Writing Bazel Rules: Module Extensions
This article originally appeared on jayconrod.com as part of the Writing Bazel Rules series.
It's been almost six years since the previous entry in this series was originally published, and there are many new topics to discuss!
The biggest change in the last few years was the introduction of Bazel modules (also known as Bzlmod) and the deprecation of WORKSPACE mode. I've updated all the previous articles to be compatible with Bazel modules, but today we'll explore newly introduced functionality: how to write a module extension, and why you'd want to do so.
What is a module extension?
Bazel's new module system is analogous to package management systems used by other build tools like Go modules, npm packages, Maven artifacts, or Cargo crates. A module author first releases a version of their module on the Bazel Central Registry (BCR). A Bazel user can then depend on that module by listing it in their MODULE.bazel file. Bazel evaluates each MODULE.bazel file when it starts, beginning with the root module containing the current directory and recursing for all dependencies. It then uses minimal version selection to pick a version of each dependency to use.
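To make minimal version selection concrete, here is a minimal sketch, not Bazel's actual implementation. It walks a made-up module graph and keeps the highest version that anything in the graph requires of each module. The module names, the tuple encoding of versions, and the select_versions function are all hypothetical, and the sketch ignores overrides, compatibility levels, and registries. It is written in the subset of Starlark that is also valid Python, so it can be run outside Bazel.

```python
def select_versions(root, requirements):
    """Returns {module name: selected version} for a module graph.

    root is a (name, version) pair. requirements maps a (name, version)
    pair to the list of (name, version) pairs it depends on.
    """

    # Starlark has no while loop, so iterate up to a bound that is large
    # enough: the root plus one module version per dependency edge.
    bound = 1
    for deps in requirements.values():
        bound += len(deps)

    queue = [root]
    seen = {root: True}
    selected = {}
    for i in range(bound):
        if i >= len(queue):
            break
        name, version = queue[i]

        # Keep the highest version that any module in the graph asked for.
        if name not in selected or version > selected[name]:
            selected[name] = version
        for dep in requirements.get((name, version), []):
            if dep not in seen:
                seen[dep] = True
                queue.append(dep)
    return selected

# Hypothetical graph: app needs lib_a 1.1.0, but lib_b needs lib_a 1.2.0.
REQUIREMENTS = {
    ("app", (1, 0, 0)): [("lib_a", (1, 1, 0)), ("lib_b", (2, 0, 0))],
    ("lib_b", (2, 0, 0)): [("lib_a", (1, 2, 0))],
}

SELECTED = select_versions(("app", (1, 0, 0)), REQUIREMENTS)
```

In this example lib_a is selected at 1.2.0: the highest version that some module requires, which is also the lowest version that satisfies every requirement.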
Not all dependencies are Bazel modules though, and not all package management systems are compatible with minimal version selection. Unlike the other systems I listed, Bazel aims to support multiple languages, and to do so, it needs to integrate with other systems, not replace them. For example, if you have a Go dependency like OpenTelemetry, we need to select the same versions of its transitive dependencies that Go would select, we need to download the appropriate archives from a Go module proxy, and we need to generate BUILD files to compile from source.
We can accomplish all that with a module extension, basically a plugin for Bazel's module system, written in Starlark.
A module extension has two parts: a set of tag classes that users can call from MODULE.bazel files to declare metadata, and an implementation function that interprets the tags and instantiates repository rules to download and install BUILD files for selected dependencies. Bazel gathers tags from all MODULE.bazel files included in the build that use the extension. It then calls each extension's implementation function once, globally. The repos created by module extensions are not specific to any Bazel module.
Concretely, let's say we have a Bazel project with some Go code. The Bazel dependencies are declared in a MODULE.bazel file, and pure Go dependencies are declared in a go.mod file. Suppose one of our Bazel dependencies also has some Go code and has its own go.mod file. Each Bazel module would use Gazelle's go_deps module extension to read the go.mod files, as below.
go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")
go_deps.from_file(go_mod = "//:go.mod")
use_repo(
    go_deps,
    "io_opentelemetry_go_otel_sdk",
    "com_google_cloud_go_storage",
)
The use_extension call loads the go_deps extension. Bazel requires this syntax instead of a load statement because it needs to read and evaluate every MODULE.bazel file in the module version graph before selecting the module versions to build with. Bazel doesn't actually download the source code for any module until it makes that selection, which means it can't evaluate load statements. So we call use_extension instead: this creates a proxy object for the module extension. Any interactions with that object are deferred until after version selection is complete.
The go_deps.from_file call creates a tag using a tag class to register some information: a label pointing to the go.mod file with our Go dependencies.
Finally, use_repo makes repos created by go_deps available to the current module. This list should include all repos directly referenced by the module's BUILD and .bzl files. bazel mod tidy can add and remove repo names from this list automatically.
Toolchainization
We aren't going to implement all of go_deps in rules_go_simple, our simplified example repository. But there is a smaller, more immediate problem we can solve.
In previous articles, we created a go_download repository rule that downloaded a Go archive with a specific version, OS, and architecture, then installed a BUILD file. We also defined a go_toolchain rule. We used these together like this:
go_download = use_repo_rule("//:go.bzl", "go_download")

# Download distributions for macOS and Linux.
go_download(
    name = "go_darwin_arm64",
    goarch = "arm64",
    goos = "darwin",
    sha256 = "544932844156d8172f7a28f77f2ac9c15a23046698b6243f633b0a0b00c0749c",
    urls = ["https://go.dev/dl/go1.25.0.darwin-arm64.tar.gz"],
)
go_download(
    name = "go_linux_amd64",
    goarch = "amd64",
    goos = "linux",
    sha256 = "2852af0cb20a13139b3448992e69b868e50ed0f8a1e5940ee1de9e19a123b613",
    urls = ["https://go.dev/dl/go1.25.0.linux-amd64.tar.gz"],
)

# Register toolchains from both versions. Unfortunately, this means we'll
# actually download and extract both archives, even though we only need one.
# We'll fix that in the next version.
register_toolchains(
    "@go_darwin_arm64//:toolchain",
    "@go_linux_amd64//:toolchain",
)
This approach has several problems.
- Because the toolchain targets are defined within each go_download repo, the register_toolchains call forces Bazel to download both archives in order to read their constraints, even though we only need one.
- If another Bazel module also calls go_download and registers toolchains at a different version, those toolchains are also available for toolchain resolution, and we may not get the version we expect. The toolchain resolution order subtly depends on where register_toolchains was called in the module graph.
- Every user needs to list all the platforms they might build for. If they want to build for a different platform, they need to add another go_download call and register a new toolchain.
We can solve this using a module extension that follows the Toolchainization pattern. Mike Bland coined this term and showed how rules_scala follows this pattern in Migrating to Bazel Modules (a.k.a. Bzlmod) - Toolchainization. rules_go implements this in the go_sdk extension.
- We'll create a new go module extension.
- It will provide a download tag class that lets users declare which version they want. They don't need to set the OS, architecture, URL, or SHA-256 sum.
- The implementation reads all of the download tags in all selected MODULE.bazel files, then picks the highest version.
- It downloads a manifest file from https://go.dev that lists the archive URLs and SHA-256 sums for each platform.
- It then instantiates the go_download repository rule for each platform that rules_go_simple supports.
- Finally, it calls the go_toolchains repository rule, which creates a single BUILD file that declares toolchains for all platforms. These declarations were previously in go_download.
Once this extension is implemented, rules_go_simple can declare toolchains like this:
go = use_extension("//:go.bzl", "go")
go.download(version = "1.25.0")
use_repo(
    go,
    "go_toolchains",
)
register_toolchains("@go_toolchains//:all")
A user in another module can use a simpler declaration, only specifying the tag with their desired version. Other modules don't need to repeat the register_toolchains declaration.
This solves all the problems we identified earlier. Bazel only needs to materialize the @go_toolchains repo to select a toolchain; it only needs to download a go_download repo when the toolchain is actually needed. The @go_toolchains repo is shared across all selected Bazel modules, and it only includes declarations for the highest required version of Go, so we won't unexpectedly get a lower version. And @go_toolchains includes declarations for all supported platforms, so we don't need to explicitly list what we want.
go_toolchains repository rule
Let's start with the last step and implement go_toolchains. The declaration looks like this:
go_toolchains = repository_rule(
    implementation = _go_toolchains_impl,
    attrs = {
        "repos": attr.string_list(
            doc = "List of go_download repo names, used for label generation.",
        ),
        "goos_goarchs": attr.string_list(
            doc = "goos_goarch pair (like 'linux_amd64') for each repo in repos.",
        ),
        "_build_tpl": attr.label(
            default = "//internal:BUILD.bazel.go_toolchains.tpl",
            doc = "Build file template",
        ),
    },
    doc = """Internal repository rule that declares toolchain and go_toolchain
targets for all supported platforms. The go module extension calls this.""",
)
The implementation writes a BUILD file by expanding a template string for each toolchain.
_GOOS_TO_CONSTRAINT = {
    "darwin": "@platforms//os:macos",
    "linux": "@platforms//os:linux",
    "windows": "@platforms//os:windows",
}

_GOARCH_TO_CONSTRAINT = {
    "amd64": "@platforms//cpu:x86_64",
    "arm64": "@platforms//cpu:aarch64",
}

_TOOLCHAIN_BUILD_HEADER = """# Generated by go_toolchains in @rules_go_simple//internal:repo.bzl
load("@rules_go_simple//:def.bzl", "go_toolchain")
"""

_TOOLCHAIN_BUILD_TEMPLATE = """
toolchain(
    name = "{toolchain_name}",
    exec_compatible_with = [
        {exec_constraints},
    ],
    target_compatible_with = [
        {exec_constraints},
    ],
    toolchain = ":{toolchain_name}_impl",
    toolchain_type = "@rules_go_simple//:toolchain_type",
)
go_toolchain(
    name = "{toolchain_name}_impl",
    builder = "{builder}",
    tools = ["{tools}"],
    stdlib = "{stdlib}",
)
"""

def _go_toolchains_impl(ctx):
    lines = [_TOOLCHAIN_BUILD_HEADER]
    for exec_idx, exec_goos_goarch in enumerate(ctx.attr.goos_goarchs):
        repo_name = ctx.attr.repos[exec_idx]
        exec_goos, exec_goarch = exec_goos_goarch.split("_")
        exec_constraints = [
            _GOOS_TO_CONSTRAINT[exec_goos],
            _GOARCH_TO_CONSTRAINT[exec_goarch],
        ]
        exec_constraints_str = ", ".join(['"{}"'.format(c) for c in exec_constraints])
        builder = "@{}//:builder".format(repo_name)
        tools = "@{}//:tools".format(repo_name)
        stdlib = "@{}//:stdlib".format(repo_name)
        lines.append(_TOOLCHAIN_BUILD_TEMPLATE.format(
            toolchain_name = exec_goos_goarch,
            exec_constraints = exec_constraints_str,
            builder = builder,
            tools = tools,
            stdlib = stdlib,
        ))
    ctx.file("BUILD.bazel", content = "\n".join(lines))
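To see what the template expansion produces for one platform, here is a trimmed-down sketch that can be run outside Bazel. It is not the rule itself: render_toolchain is a hypothetical helper, the template is shortened to the toolchain target only, and the constraint maps are copied from the code above. It uses only the subset of Starlark that is also valid Python.

```python
GOOS_TO_CONSTRAINT = {
    "darwin": "@platforms//os:macos",
    "linux": "@platforms//os:linux",
}

GOARCH_TO_CONSTRAINT = {
    "amd64": "@platforms//cpu:x86_64",
    "arm64": "@platforms//cpu:aarch64",
}

# Shortened stand-in for the article's template (toolchain target only).
TEMPLATE = """
toolchain(
    name = "{toolchain_name}",
    exec_compatible_with = [{constraints}],
    target_compatible_with = [{constraints}],
    toolchain = ":{toolchain_name}_impl",
    toolchain_type = "@rules_go_simple//:toolchain_type",
)
"""

def render_toolchain(goos_goarch):
    """Expands the template for a goos_goarch pair like 'linux_amd64'."""
    goos, goarch = goos_goarch.split("_")

    # Quote each constraint label and join them into a BUILD list body.
    constraints = ", ".join([
        '"{}"'.format(c)
        for c in [GOOS_TO_CONSTRAINT[goos], GOARCH_TO_CONSTRAINT[goarch]]
    ])
    return TEMPLATE.format(toolchain_name = goos_goarch, constraints = constraints)

RENDERED = render_toolchain("linux_amd64")
```

For linux_amd64, the rendered text declares a toolchain named linux_amd64 whose constraint lists contain @platforms//os:linux and @platforms//cpu:x86_64, and which points at :linux_amd64_impl.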
We moved the toolchain and go_toolchain targets out of the BUILD file generated by go_download, so there's not actually much new here.
go module extension
Now we can implement our module extension. We start by declaring the download tag using tag_class.
_download_tag = tag_class(
    attrs = {
        "version": attr.string(),
    },
    doc = """
Specifies the desired version of Go to download.
The go module extension selects the highest listed version in any module.
""",
)
Next we declare go using module_extension. We set implementation to our implementation function, and tag_classes to a dictionary containing our tag. We also set os_dependent and arch_dependent to False, since our extension declares the same repos and generates the same build files regardless of which platform Bazel runs on. This consolidates the data recorded in the MODULE.bazel.lock file, ensuring that Linux, macOS, and Windows users all rely on the same metadata.
go = module_extension(
    implementation = _go_impl,
    tag_classes = {
        "download": _download_tag,
    },
    os_dependent = False,
    arch_dependent = False,
    doc = """
Selects and downloads Go toolchain archives from go.dev and registers
appropriate Bazel toolchains. Archives are downloaded lazily, only for the
toolchains that Bazel selects at build time.
""",
)
Our implementation function accepts a module_ctx parameter. Like the repository_ctx parameter for repository rules, this lets you download files, access the file system, and execute commands. In addition, through the modules field, you can access a list of bazel_module objects, which expose metadata for the list of selected module versions. In particular, we're interested in the tags field, which has an entry for each of our module extension's tags. We use this to find the highest version in a go.download tag.
def _go_impl(ctx):
    ctx.report_progress("selecting a version")
    highest_version = None
    for module in ctx.modules:
        for tag in module.tags.download:
            version = _parse_version(tag.version)
            if version == None:
                # Report the raw tag value; the parsed version is None here.
                fail("module {} has download tag with invalid version '{}'".format(
                    module.name,
                    tag.version,
                ))
            if highest_version == None or _compare_versions(version, highest_version) > 0:
                highest_version = version
    if highest_version == None:
        fail("go extension used without specifying a version. Declare a go.download tag with your desired version.")
    go_highest_version = "go{}.{}.{}".format(highest_version.major, highest_version.minor, highest_version.patch)
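The code above calls _parse_version and _compare_versions without showing them. What follows is one possible sketch of such helpers, not the definitions from rules_go_simple. The names parse_version and compare_versions are stand-ins, and the sketch assumes versions always have exactly three numeric components. It returns plain tuples rather than objects with major, minor, and patch fields so that it stays within the subset of Starlark that is also valid Python.

```python
def parse_version(s):
    """Parses a string like "1.25.0" into (1, 25, 0).

    Returns None if s does not have exactly three numeric components.
    """
    parts = s.split(".")
    if len(parts) != 3:
        return None
    for part in parts:
        if not part.isdigit():
            return None
    return (int(parts[0]), int(parts[1]), int(parts[2]))

def compare_versions(a, b):
    """Returns -1, 0, or 1 if a is lower than, equal to, or higher than b."""
    if a == b:
        return 0

    # Tuples compare element by element, so (1, 25, 0) > (1, 9, 3).
    return 1 if a > b else -1
```

Comparing parsed integers rather than raw strings matters here: as strings, "1.9.3" sorts after "1.25.0", which would select the wrong toolchain.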
After that, we fetch a list of all downloadable files from https://go.dev/dl, filtering for the version we picked.
    download_index_url = "https://go.dev/dl/?mode=json&include=all"
    ctx.report_progress("checking available files at {}".format(download_index_url))
    ctx.download(
        url = [download_index_url],
        output = "versions.json",
    )
    data = ctx.read("versions.json")
    releases = json.decode(data)
    files = [
        file
        for release in releases
        if release["version"] == go_highest_version
        for file in release["files"]
        if file["kind"] == "archive"
    ]
    if len(files) == 0:
        fail("selected Go version '{}' but no files found at {}".format(go_highest_version, download_index_url))
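To illustrate the filtering step, here is a small sketch that applies the same comprehension to hand-written sample data. The sample is an assumption modeled only on the fields the code above reads (version, files, kind, filename); it is not a copy of the real index, and archive_files is a hypothetical helper name. The code stays within the subset of Starlark that is also valid Python.

```python
# Hand-written sample in the shape the extension expects; not real index data.
SAMPLE_RELEASES = [
    {
        "version": "go1.25.0",
        "files": [
            {"filename": "go1.25.0.src.tar.gz", "kind": "source"},
            {"filename": "go1.25.0.linux-amd64.tar.gz", "kind": "archive"},
            {"filename": "go1.25.0.windows-amd64.zip", "kind": "archive"},
            {"filename": "go1.25.0.windows-amd64.msi", "kind": "installer"},
        ],
    },
    {
        "version": "go1.24.6",
        "files": [
            {"filename": "go1.24.6.linux-amd64.tar.gz", "kind": "archive"},
        ],
    },
]

def archive_files(releases, go_version):
    """Returns the archive entries of the release named go_version."""
    return [
        file
        for release in releases
        if release["version"] == go_version
        for file in release["files"]
        if file["kind"] == "archive"
    ]

FILENAMES = [f["filename"] for f in archive_files(SAMPLE_RELEASES, "go1.25.0")]
```

Only the two archive entries of go1.25.0 survive the filter. An unknown version yields an empty list, which is the case the fail call above guards against.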
Now that we have a list of files for the version we want, we use the go_download repository rule to create a repo for each platform that we might want to build on. Some platforms might have more than one downloadable file (for example, both .tar.gz and .zip). In this case, we just pick the first file.
    ctx.report_progress("declaring toolchains")
    download_repo_names = []
    for (goos, goarch) in _PLATFORMS:
        compatible_files = [
            file
            for file in files
            if file["os"] == goos and
               file["arch"] == goarch and
               any([file["filename"].endswith(ext) for ext in _ALLOWED_ARCHIVE_EXTS])
        ]
        if len(compatible_files) == 0:
            fail("no files found for Go version {} compatible with {}/{}".format(go_highest_version, goos, goarch))
        url = "https://go.dev/dl/{}".format(compatible_files[0]["filename"])
        sha256 = compatible_files[0]["sha256"]
        name = "go_{}_{}".format(goos, goarch)
        download_repo_names.append(name)
        go_download(
            name = name,
            urls = [url],
            sha256 = sha256,
            goos = goos,
            goarch = goarch,
        )
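The per-platform selection can be tried outside Bazel with a sketch like the one below. The values of PLATFORMS and ALLOWED_ARCHIVE_EXTS are assumptions, since the real definitions are not shown in this article, and pick_downloads is a hypothetical helper. Instead of calling fail and go_download, it returns the repo names and URLs it would use, plus the platforms it could not match. The sample file list is hand-written, and the code stays within the subset of Starlark that is also valid Python.

```python
# Assumed values; the real _PLATFORMS and _ALLOWED_ARCHIVE_EXTS may differ.
PLATFORMS = [("darwin", "arm64"), ("linux", "amd64"), ("windows", "amd64")]
ALLOWED_ARCHIVE_EXTS = [".tar.gz", ".zip"]

def pick_downloads(files, platforms):
    """Returns ([(repo_name, url)], [unmatched "goos/goarch" strings])."""
    downloads = []
    missing = []
    for (goos, goarch) in platforms:
        compatible = [
            f
            for f in files
            if f["os"] == goos and
               f["arch"] == goarch and
               any([f["filename"].endswith(ext) for ext in ALLOWED_ARCHIVE_EXTS])
        ]
        if len(compatible) == 0:
            missing.append("{}/{}".format(goos, goarch))
            continue

        # Several files may match a platform; take the first one.
        downloads.append((
            "go_{}_{}".format(goos, goarch),
            "https://go.dev/dl/{}".format(compatible[0]["filename"]),
        ))
    return downloads, missing

# Hand-written sample entries; not real index data.
SAMPLE_FILES = [
    {"filename": "go1.25.0.linux-amd64.tar.gz", "os": "linux", "arch": "amd64"},
    {"filename": "go1.25.0.windows-amd64.msi", "os": "windows", "arch": "amd64"},
    {"filename": "go1.25.0.windows-amd64.zip", "os": "windows", "arch": "amd64"},
]

DOWNLOADS, MISSING = pick_downloads(SAMPLE_FILES, PLATFORMS)
```

With this sample, the .msi entry is skipped because of its extension, and darwin/arm64 is reported as missing because no sample entry matches it.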
Finally, we use our go_toolchains repository rule to declare the toolchains for all the platforms.
    go_toolchains(
        name = "go_toolchains",
        repos = download_repo_names,
        goos_goarchs = ["{}_{}".format(*platform) for platform in _PLATFORMS],
    )
At the end, we return a special metadata value to say this module extension is not reproducible. An extension is reproducible if its behavior is completely determined by its tag inputs in MODULE.bazel files and other local files. That's normally a property we'd want, but we downloaded the version manifest file from go.dev that determines the download URLs and SHA-256 sums, and that could theoretically change at any time. Marking the extension as non-reproducible forces Bazel to record the URLs and SHA-256 sums in MODULE.bazel.lock so that a security error is reported if something does change.
(I left out the definition of _PLATFORMS and a few other details from the code samples above. See go_ext.bzl to read through the whole implementation.)
Other uses for module extensions
In this article, we made our users' MODULE.bazel files simpler and skipped unnecessary downloads by implementing a module extension to download and configure the Go toolchain.
There are many other use cases for module extensions. The most immediate use is to integrate other package management tools with Bazel's module system so that you can pull in dependencies from Maven, Cargo, NPM, or wherever else. The module extension need not follow the same minimal version selection rules that Bazel does; it can decide what versions to select, given tags in MODULE.bazel or in other files like go.mod or pom.xml. You can implement version selection in Starlark, or you can run external commands. Afterward, the module extension can delegate downloads and build file generation to repository rules.
In general, module extensions are useful any time you need to collect information or resolve any kind of version conflict across all Bazel modules in your dependency graph. Repository rules in WORKSPACE mode didn't have any capability like this, so module extensions are a very welcome improvement to Bazel.