Writing Bazel Rules: Module Extensions

This article originally appeared on jayconrod.com as part of the Writing Bazel Rules series.

It's been almost six years since the previous entry in this series was originally published, and there are many new topics to discuss!

The biggest change in the last few years was the introduction of Bazel modules (also known as Bzlmod) and the deprecation of WORKSPACE mode. I've updated all the previous articles to be compatible with Bazel modules, but today we'll explore newly introduced functionality: how to write a module extension, and why you'd want to do so.

What is a module extension?

Bazel's new module system is analogous to package management systems used by other build tools like Go modules, npm packages, Maven artifacts, or Cargo crates. A module author first releases a version of their module on the Bazel Central Registry (BCR). A Bazel user can then depend on that module by listing it in their MODULE.bazel file. Bazel evaluates each MODULE.bazel file when it starts, beginning with the root module containing the current directory and recursing for all dependencies. It then uses minimal version selection to pick a version of each dependency to use.
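To make minimal version selection concrete, here is a rough sketch written in the subset shared by Starlark and Python. `select_versions`, its tuple-based input, and the module names `foo` and `bar` are hypothetical and only for illustration; none of them are Bazel APIs. The sketch also ignores things real Bazel handles, such as compatibility levels, overrides, and dropping modules that are no longer reachable after selection.

```python
def select_versions(requirements):
    """Picks one version per module: the highest version any module requires.

    requirements is a list of (module_name, version) pairs, where version is
    a tuple of ints such as (1, 2, 0). Each requirement is treated as a
    minimum, so the highest requested version satisfies every requester.
    Hypothetical helper, not a Bazel API.
    """
    selected = {}
    for name, version in requirements:
        if name not in selected or version > selected[name]:
            selected[name] = version
    return selected

# Two modules in the graph ask for different versions of "foo".
EXAMPLE_SELECTION = select_versions([
    ("foo", (1, 0, 0)),
    ("bar", (2, 1, 0)),
    ("foo", (1, 2, 0)),
])
```

Note that nothing newer than the highest requested version is ever picked, which is what makes the result stable as new releases appear in the registry.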

Not all dependencies are Bazel modules though, and not all package management systems are compatible with minimal version selection. Unlike the other systems I listed, Bazel aims to support multiple languages, and to do so, it needs to integrate with other systems, not replace them. For example, if you have a Go dependency like OpenTelemetry, we need to select the same versions of its transitive dependencies that Go would select, we need to download the appropriate archives from a Go module proxy, and we need to generate BUILD files to compile from source.

We can accomplish all that with a module extension, basically a plugin for Bazel's module system, written in Starlark.

A module extension has two parts: a set of tag classes that users can call from MODULE.bazel files to declare metadata, and an implementation function that interprets the tags and instantiates repository rules, which download the selected dependencies and install BUILD files for them. Bazel gathers tags from all MODULE.bazel files included in the build that use the extension. It then calls each extension's implementation function once, globally. The repos created by module extensions are not specific to any Bazel module.

Concretely, let's say we have a Bazel project with some Go code. The Bazel dependencies are declared in a MODULE.bazel file, and pure Go dependencies are declared in a go.mod file. Suppose one of our Bazel dependencies also has some Go code and has its own go.mod file. Each Bazel module would use Gazelle's go_deps module extension to read the go.mod files, as below.

MODULE.bazel
go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")

go_deps.from_file(go_mod = "//:go.mod")

use_repo(
    go_deps,
    "io_opentelemetry_go_otel_sdk",
    "com_google_cloud_go_storage",
)

The use_extension call loads the go_deps extension. Bazel requires this syntax instead of a load statement because it needs to read and evaluate every MODULE.bazel file in the module version graph before selecting the module versions to build with. Bazel doesn't actually download the source code for any module until it makes that selection, which means it can't evaluate load statements. So we call use_extension instead: this creates a proxy object for the module extension. Any interactions with that object are deferred until after version selection is complete.

The go_deps.from_file call creates a tag using a tag class to register some information: a label pointing to the go.mod file with our Go dependencies.

Finally, use_repo makes repos created by go_deps available to the current module. This list should include all repos directly referenced by the module's BUILD and .bzl files. bazel mod tidy can add and remove repo names from this list automatically.

Toolchainization

We aren't going to implement all of go_deps in rules_go_simple, our simplified example repository. But there is a smaller, more immediate problem we can solve.

In previous articles, we created a go_download repository rule that downloaded a Go archive with a specific version, OS, and architecture, then installed a BUILD file. We also defined a go_toolchain rule. We used these together like this:

MODULE.bazel
go_download = use_repo_rule("//:go.bzl", "go_download")

# Download distributions for macOS and Linux.
go_download(
    name = "go_darwin_arm64",
    goarch = "arm64",
    goos = "darwin",
    sha256 = "544932844156d8172f7a28f77f2ac9c15a23046698b6243f633b0a0b00c0749c",
    urls = ["https://go.dev/dl/go1.25.0.darwin-arm64.tar.gz"],
)

go_download(
    name = "go_linux_amd64",
    goarch = "amd64",
    goos = "linux",
    sha256 = "2852af0cb20a13139b3448992e69b868e50ed0f8a1e5940ee1de9e19a123b613",
    urls = ["https://go.dev/dl/go1.25.0.linux-amd64.tar.gz"],
)

# Register toolchains for both platforms. Unfortunately, this means we'll
# actually download and extract both archives, even though we only need one.
# We'll fix that in the next version.
register_toolchains(
    "@go_darwin_arm64//:toolchain",
    "@go_linux_amd64//:toolchain",
)

This approach has several problems.

  1. Because the toolchain targets are defined within each go_download repo, the register_toolchains call forces Bazel to download both archives in order to read their constraints, even though we only need one.
  2. If another Bazel module also calls go_download and registers toolchains at a different version, those toolchains are also available for toolchain resolution, and we may not get the version we expect. The toolchain resolution order subtly depends on where register_toolchains was called in the module graph.
  3. Every user needs to list all the platforms they might build for. If they want to build for a different platform, they need to add another go_download call and register a new toolchain.

We can solve this using a module extension that follows the Toolchainization pattern. Mike Bland coined this term and showed how rules_scala follows this pattern in Migrating to Bazel Modules (a.k.a. Bzlmod) - Toolchainization. rules_go implements this in the go_sdk extension.

  1. We'll create a new go module extension.
  2. It will provide a download tag class that lets users declare which version they want. They don't need to set the OS, architecture, URL, or SHA-256 sum.
  3. The implementation reads all of the download tags in all selected MODULE.bazel files, then picks the highest version.
  4. It downloads a manifest file from https://go.dev that lists the archive URLs and SHA-256 sums for each platform.
  5. It then instantiates the go_download repository rule for each platform that rules_go_simple supports.
  6. Finally, it calls the go_toolchains repository rule, which creates a single BUILD file that declares toolchains for all platforms. These declarations were previously in go_download.

Once this extension is implemented, rules_go_simple can declare toolchains like this:

MODULE.bazel
go = use_extension("//:go.bzl", "go")

go.download(version = "1.25.0")

use_repo(
    go,
    "go_toolchains",
)

register_toolchains("@go_toolchains//:all")

A user in another module can use a simpler declaration, only specifying the tag with their desired version. Other modules don't need to repeat the register_toolchains declaration.

MODULE.bazel
go = use_extension("@rules_go_simple//:go.bzl", "go")

go.download(version = "1.25.3")

This solves all the problems we identified earlier. Bazel only needs to materialize the @go_toolchains repo to select a toolchain; it only needs to download a go_download repo when the toolchain is actually needed. The @go_toolchains repo is shared across all selected Bazel modules, and it only includes declarations for the highest required version of Go, so we won't unexpectedly get a lower version. And @go_toolchains includes declarations for all supported platforms, so we don't need to explicitly list what we want.

go_toolchains repository rule

Let's start with the last step and implement go_toolchains. The declaration looks like this:

Bazel
go_toolchains = repository_rule(
    implementation = _go_toolchains_impl,
    attrs = {
        "repos": attr.string_list(
            doc = "List of go_download repo names, used for label generation.",
        ),
        "goos_goarchs": attr.string_list(
            doc = "goos_goarch pair (like 'linux_amd64') for each repo in repos.",
        ),
        "_build_tpl": attr.label(
            default = "//internal:BUILD.bazel.go_toolchains.tpl",
            doc = "Build file template",
        ),
    },
    doc = """Internal repository rule that declares toolchain and go_toolchain
targets for all supported platforms. The go module extension calls this.""",
)

The implementation writes a BUILD file by expanding a template string for each toolchain.

Bazel
_GOOS_TO_CONSTRAINT = {
    "darwin": "@platforms//os:macos",
    "linux": "@platforms//os:linux",
    "windows": "@platforms//os:windows",
}

_GOARCH_TO_CONSTRAINT = {
    "amd64": "@platforms//cpu:x86_64",
    "arm64": "@platforms//cpu:aarch64",
}

_TOOLCHAIN_BUILD_HEADER = """# Generated by go_toolchains in @rules_go_simple//internal:repo.bzl

load("@rules_go_simple//:def.bzl", "go_toolchain")
"""

_TOOLCHAIN_BUILD_TEMPLATE = """
toolchain(
    name = "{toolchain_name}",
    exec_compatible_with = [
        {exec_constraints},
    ],
    target_compatible_with = [
        {exec_constraints},
    ],
    toolchain = ":{toolchain_name}_impl",
    toolchain_type = "@rules_go_simple//:toolchain_type",
)

go_toolchain(
    name = "{toolchain_name}_impl",
    builder = "{builder}",
    tools = ["{tools}"],
    stdlib = "{stdlib}",
)
"""

def _go_toolchains_impl(ctx):
    lines = [_TOOLCHAIN_BUILD_HEADER]
    for exec_idx, exec_goos_goarch in enumerate(ctx.attr.goos_goarchs):
        repo_name = ctx.attr.repos[exec_idx]
        exec_goos, exec_goarch = exec_goos_goarch.split("_")
        exec_constraints = [
            _GOOS_TO_CONSTRAINT[exec_goos],
            _GOARCH_TO_CONSTRAINT[exec_goarch],
        ]
        exec_constraints_str = ", ".join(['"{}"'.format(c) for c in exec_constraints])
        builder = "@{}//:builder".format(repo_name)
        tools = "@{}//:tools".format(repo_name)
        stdlib = "@{}//:stdlib".format(repo_name)
        lines.append(_TOOLCHAIN_BUILD_TEMPLATE.format(
            toolchain_name = exec_goos_goarch,
            exec_constraints = exec_constraints_str,
            builder = builder,
            tools = tools,
            stdlib = stdlib,
        ))

    ctx.file("BUILD.bazel", content = "\n".join(lines))

We moved the toolchain and go_toolchain targets out of the BUILD file generated by go_download, so there's not actually much new here.

go module extension

Now we can implement our module extension. We start by declaring the download tag using tag_class.

Bazel
_download_tag = tag_class(
    attrs = {
        "version": attr.string(),
    },
    doc = """
Specifies the desired version of Go to download.

The go module extension selects the highest listed version in any module.
""",
)

Next we declare go using module_extension. We set implementation to our implementation function, and tag_classes to a dictionary containing our tag. We also set os_dependent and arch_dependent to False, since our extension declares the same repos and generates the same build files regardless of which platform Bazel runs on. This consolidates the data recorded in the MODULE.bazel.lock file, ensuring that Linux, macOS, and Windows users all rely on the same metadata.

Bazel
go = module_extension(
    implementation = _go_impl,
    tag_classes = {
        "download": _download_tag,
    },
    os_dependent = False,
    arch_dependent = False,
    doc = """
Selects and downloads Go toolchain archives from go.dev and registers
appropriate Bazel toolchains. Archives are downloaded lazily, only for the
toolchains that Bazel selects at build time.
""",
)

Our implementation function accepts a module_ctx parameter. Like the repository_ctx parameter for repository rules, this lets you download files, access the file system, and execute commands. In addition, through the modules field, you can access a list of bazel_module objects, which expose metadata for the list of selected module versions. In particular, we're interested in the tags field, which has an entry for each of our module extension's tags. We use this to find the highest version in a go.download tag.

Bazel
def _go_impl(ctx):
    ctx.report_progress("selecting a version")
    highest_version = None
    for module in ctx.modules:
        for tag in module.tags.download:
            version = _parse_version(tag.version)
            if version == None:
                fail("module {} has download tag with invalid version '{}'".format(
                    module.name,
                    tag.version,
                ))
            if highest_version == None or _compare_versions(version, highest_version) > 0:
                highest_version = version
    if highest_version == None:
        fail("go extension used without specifying a version. Declare a go.download tag with your desired version.")
    go_highest_version = "go{}.{}.{}".format(highest_version.major, highest_version.minor, highest_version.patch)

After that, we fetch a list of all downloadable files from https://go.dev/dl, filtering for the version we picked.

Bazel
download_index_url = "https://go.dev/dl/?mode=json&include=all"
ctx.report_progress("checking available files at {}".format(download_index_url))
ctx.download(
    url = [download_index_url],
    output = "versions.json",
)
data = ctx.read("versions.json")
releases = json.decode(data)
files = [
    file
    for release in releases
    if release["version"] == go_highest_version
    for file in release["files"]
    if file["kind"] == "archive"
]
if len(files) == 0:
    fail("selected Go version '{}' but no files found at {}".format(go_highest_version, download_index_url))

Now that we have a list of files for the version we want, we use the go_download repository rule to create a repo for each platform that we might want to build on. Some platforms might have more than one downloadable file (for example, both .tar.gz and .zip). In this case, we just pick the first file.

Bazel
ctx.report_progress("declaring toolchains")
download_repo_names = []
for (goos, goarch) in _PLATFORMS:
    compatible_files = [
        file
        for file in files
        if file["os"] == goos and
            file["arch"] == goarch and
            any([file["filename"].endswith(ext) for ext in _ALLOWED_ARCHIVE_EXTS])
    ]
    if len(compatible_files) == 0:
        fail("no files found for Go version {} compatible with {}/{}".format(go_highest_version, goos, goarch))
    url = "https://go.dev/dl/{}".format(compatible_files[0]["filename"])
    sha256 = compatible_files[0]["sha256"]

    name = "go_{}_{}".format(goos, goarch)
    download_repo_names.append(name)
    go_download(
        name = name,
        urls = [url],
        sha256 = sha256,
        goos = goos,
        goarch = goarch,
    )

Finally, we use our go_toolchains repository rule to declare the toolchains for all the platforms.

Bazel
go_toolchains(
    name = "go_toolchains",
    repos = download_repo_names,
    goos_goarchs = ["{}_{}".format(*platform) for platform in _PLATFORMS],
)

At the end, we return a special metadata value to say this module extension is not reproducible. An extension is reproducible if its behavior is completely determined by its tag inputs in MODULE.bazel files and other local files. That's normally a property we'd want, but we downloaded the version manifest file from go.dev that determines the download URLs and SHA-256 sums, and that could theoretically change at any time. Marking the extension as non-reproducible forces Bazel to record the URLs and SHA-256 sums in MODULE.bazel.lock so that a security error is reported if something does change.

Bazel
return ctx.extension_metadata(
    reproducible = False,
)

(I left out the definition of _PLATFORMS and a few other details from the code samples above. See go_ext.bzl to read through the whole implementation.)

Other uses for module extensions

In this article, we made our users' MODULE.bazel files simpler and skipped unnecessary downloads by implementing a module extension to download and configure the Go toolchain.

There are many other use cases for module extensions. The most immediate use is to integrate other package management tools with Bazel's module system so that you can pull in dependencies from Maven, Cargo, npm, or wherever else. The module extension need not follow the same minimal version selection rules that Bazel does; it can decide what versions to select, given tags in MODULE.bazel or in other files like go.mod or pom.xml. You can implement version selection in Starlark, or you can run external commands. Afterward, the module extension can delegate downloads and build file generation to repository rules.

In general, module extensions are useful any time you need to collect information or resolve any kind of version conflict across all Bazel modules in your dependency graph. Repository rules in WORKSPACE mode didn't have any capability like this, so module extensions are a very welcome improvement to Bazel.