Migrating to Bazel Modules (a.k.a. Bzlmod) - Module Extensions¶
So far, we've covered relatively easy Bzlmod fixes, how to hold runfiles and
pkg_tar
correctly, and how to access canonical repo names when absolutely
necessary. Now we'll discuss replacing WORKSPACE
statements with your own
module extensions. You can use them to wrap your project's setup macros, and to
adapt external repositories that aren't Bzlmod compatible to work with your
Bzlmodified project.
This article is part of the series "Migrating to Bazel Modules (a.k.a. Bzlmod)":
- Migrating to Bazel Modules (a.k.a. Bzlmod) - The Easy Parts
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and Runfiles
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and rules_pkg
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names, Macros, and Variables
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Module Extensions
I occasionally update these blogs based on feedback, noting the changes in an Updates section at the bottom whenever I do. So don't forget to check the earlier blog posts every so often for new and improved information! (I published updates to The Easy Parts and Repo Names, Macros, and Variables at the same time as I published this post.)
Shameless self-promotion¶
By the way, if you've become a total Bzlmod nerd like me, check out my appearance on the Aspect Insights livestream:
Big thanks to Alex Eagle for having me on and allowing me to indulge in a Bzlmod geek fest! I hope it proves useful and informative to others performing Bzlmod migrations.
Prerequisites¶
As usual with this series, it's important to be acquainted with the following concepts pertaining to external repositories:
Working around dependencies instead of fixing them upstream¶
The previous posts in this series covered fixes for Bzlmod compatibility problems in your own project. In contrast, most of this post and the next two posts will show how to work around Bzlmod compatibility (or other) problems in your dependencies.
Part of the promise of Open Source Software is that you can contribute changes to projects you depend on and improve life for everybody. However, tradeoffs exist between waiting for a dependency update, attempting to contribute your own fix upstream, and making progress on your own work.
So sometimes you may choose to work around—or reach in and change—an external dependency to unblock your Bzlmod migration. Ideally you won't need to maintain your changes forever, but even if you do, Bazel makes it possible to do so indefinitely.
Despite this reasoning, and the time it takes, if you can contribute to Open Source, you should. It's always worthwhile to contribute changes upstream to Open Source projects—for you personally and professionally, for your company, and for the broader community.
I delayed writing this post for months because Jay Conrod nerd sniped me into
working on Bzlmod compatibility for rules_scala. Jay suggested this because I'd
migrated our EngFlow/example repo to Bzlmod by writing local rules_scala module
extensions, patches, and a custom MODULE.bazel configuration stanza. I accepted
the challenge, and have successfully adapted rules_scala to use Bzlmod. (This
post draws examples from my work on EngFlow/example, and the next two will draw
from EngFlow/example and rules_scala.)
We're still working on getting all the changes landed upstream, but the
rules_scala
maintainers, Simonas Pinevičius and Vaidas Pilkauskas, have
been very receptive and thoughtful. The work and the response to it have been
personally rewarding, and it will help EngFlow customers and others throughout
the Bazel community. This is how Open Source is supposed to work!
But it does take more time. So if you don't have the time to contribute right now, these next few posts will provide ideas on how to move forward. Perhaps, like me, you'll find that moving forward with your own migration will prepare you well for contributing upstream eventually.
Module extensions¶
The primary goal of Bzlmod (as I understand it) is to create a well-defined
graph of external dependencies as largely self-configuring Bazel modules. This
makes Bzlmod dependencies easier to reason about and maintain over time than
WORKSPACE dependencies. Module extensions contain Starlark code for configuring
specific Bazel modules before the build begins. Though they may reuse code also
used by WORKSPACE, they execute under a very different model, and have access
to the module_ctx interface.
While the Module extensions page provides a fairly complete technical
description, it may help to compare the differences between module extensions
and WORKSPACE
directly. Here are some of the key differences between the two
systems arising from Bzlmod's design goals.
| WORKSPACE | MODULE.bazel |
|---|---|
| Bazel only evaluates the WORKSPACE file in the main repository. | The MODULE.bazel file in the main repository defines the "root module." Bazel then evaluates the entire graph of MODULE.bazel files defined by bazel_dep() declarations reachable from the root module. |
| Evaluates and executes statements in order as they're encountered. | Evaluates the module/repository dependency graph, then executes module extensions lazily, when their repos are referenced in a build. |
| Allows load() to appear anywhere. | Only allows load() to appear at the top of module extension implementation files. Calling load() in MODULE.bazel raises an error. |
| Does not automatically load a repository's dependencies, configure the repository, or register its toolchains. Repositories must provide *_deps(), *_setup(), and *_toolchains() macros for users to do so explicitly. | Automatically loads all module dependencies after resolving the module graph, ignoring dependencies marked as dev_dependency = True unless they appear in the root module's MODULE.bazel. Each module can automatically configure itself using module extensions, and can register its own toolchains. |
| Extremely confusing behavior when multiple versions of a repo are declared, as the version selected depends upon the order of WORKSPACE statements. Repos can use maybe() to avoid reloading a dependency. | Performs minimal version selection for bazel_dep() modules, and allows overrides in the root module only. Repositories instantiated within a module extension are visible only within that extension's namespace, unless explicitly exported in MODULE.bazel via use_repo(), avoiding version conflicts. |
| Can import macros and constants from a .bzl file. | Can only import module extension objects from a .bzl file via use_extension(), or repository rules via use_repo_rule(). As of Bazel 7.2.0, the include directive allows breaking MODULE.bazel statements from the root module into separate files. |
| Macros can take configurable parameters. Only those specified in the main repository's WORKSPACE apply. | Extensions can declare tag classes, which aggregate configuration values across all modules before the extension executes. The module extension itself defines its own semantics for consuming configuration information from all modules. |
| Repository rules add repos to the global namespace, whether imported directly in WORKSPACE or executed within a macro. | The root module's MODULE.bazel must bring repositories into its namespace explicitly using bazel_dep, a module extension (via use_repo()), or a repository rule proxy (via use_repo_rule()). Each module and module extension maintains its own repository namespace. This allows modules and extensions to import their own repository dependencies without polluting any other module's or extension's namespace. |
| All toolchain dependency repos must be instantiated by the main repository's WORKSPACE file, making them visible within the global namespace. This is why packages provide *_toolchains() macros that consumers must call. | Modules can instantiate their toolchain dependency repositories via module extensions, rendering them visible only within the namespace of the extension defining the toolchain by default. Consumers of a module need not import toolchain dependency repositories of that module in their own MODULE.bazel files, unless they want to define custom toolchains. (This is a very important detail driving the design of module extensions that encapsulate toolchain configurations.) |
| Allows native.register_toolchains() calls, including in *_toolchains() macros. The main repository's WORKSPACE file must contain calls that register all necessary toolchains. | Requires register_toolchains() in MODULE.bazel; native.register_toolchains() in an extension raises an error. Each module's MODULE.bazel file can register its own toolchains automatically. The root module need only call register_toolchains() for a dependency's toolchains to customize toolchain resolution. |
| Allows bind() and native.bind() calls. | bind() and native.bind() raise an error. Requires removing bind() targets and updating dependents to depend upon apparent repository name labels or alias() targets instead. |
| Statements can depend upon any repo introduced earlier in the file, including those implicitly created via macros. | Module extensions can only load() files from repositories introduced explicitly in MODULE.bazel. They cannot instantiate a repository and load() items from it in the same extension. Attempting to do so results in a Circular definition of repositories error. Bazel enforces this restriction across all files implementing an extension. |
The Bzlmod migration guide has lots of concrete examples showing how to
replace blocks of WORKSPACE statements with equivalent module extensions.
Please review some of them if you haven't yet; they will make the following
examples clearer.
Defining repositories using load()ed constants¶
This section addresses what's more likely an internal dependency issue, though it may apply to external dependencies as well.
EngFlow's original WORKSPACE file contained a stanza of load() statements for
importing specific versions of binary archives that we repackage for our
deployments using rules_pkg. We used a list comprehension to generate the
http_archive() calls, building each URL from these version constants.
However, load()
statements are forbidden in MODULE.bazel
files. There's no
official documentation describing this (yet), but including one results in the
following error:
[Listing: Error from calling load() in MODULE.bazel]
So our WORKSPACE
implementation ran afoul of two of the above Bzlmod
constraints. However, those constraints also contain the seeds of the solution.
| WORKSPACE | MODULE.bazel |
|---|---|
| Allows load() to appear anywhere. | Only allows load() to appear at the top of module extension implementation files. Calling load() in MODULE.bazel raises an error. |
| Can import macros and constants from a .bzl file. | Can only import module extension objects from a .bzl file via use_extension(), or repository rules via use_repo_rule(). As of Bazel 7.2.0, the include directive allows breaking MODULE.bazel statements from the root module into separate files. |
The solution was to create a module extension file, which can load()
the
constants and invoke http_archive()
to create each repo.
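
As a minimal sketch of that approach, the extension file might look like the following. The file paths, the BINARY_ARCHIVES constant, and its fields are hypothetical names chosen for illustration; they aren't taken from EngFlow's actual repository.

```starlark
"""Hypothetical module extension file, e.g. //tools:binary_deps.bzl."""

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Hypothetical constants file. Each entry maps a repo name to a dict with
# "version", "url_template", and "sha256" keys.
load("//tools:versions.bzl", "BINARY_ARCHIVES")

def _binary_deps_impl(module_ctx):
    # Create one repo per archive, building each URL from its version constant.
    for name, info in BINARY_ARCHIVES.items():
        http_archive(
            name = name,
            urls = [info["url_template"].format(version = info["version"])],
            sha256 = info["sha256"],
            # Assumes the archives ship without BUILD files.
            build_file_content = 'exports_files(glob(["**"]))',
        )

    # Require the root module to use_repo() every repo created above.
    return module_ctx.extension_metadata(
        root_module_direct_deps = "all",
        root_module_direct_dev_deps = [],
    )

binary_deps = module_extension(implementation = _binary_deps_impl)
```

Unlike MODULE.bazel, the extension file may contain load() statements, as long as they appear at the top of the file.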
Then we import the repos created by the extension in the MODULE.bazel
file.
The root_module_direct_deps = "all"
attribute of
module_ctx.extension_metadata will cause an error if MODULE.bazel
doesn't import all of them.
[Listing: Using repos from the module extension in MODULE.bazel]
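
A MODULE.bazel stanza along these lines would import such repos. This is only a sketch; the extension path and the repo names are hypothetical placeholders.

```starlark
# MODULE.bazel

# The extension label and name are hypothetical; substitute your own.
binary_deps = use_extension("//tools:binary_deps.bzl", "binary_deps")

# With root_module_direct_deps = "all", every repo the extension creates
# must appear here. These repo names are placeholders.
use_repo(
    binary_deps,
    "example_tool_linux_amd64",
    "example_tool_macos_arm64",
)
```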
Avoiding circular repository definitions¶
rules_scala, which is not Bzlmod compatible—yet!—posed
a challenge while migrating EngFlow/example to Bzlmod. Several people had
started trying to make rules_scala
Bzlmod compatible, but efforts had been
halting due to the complexity of the task.
- bazelbuild/bazel-central-registry: wanted: bazelbuild/rules_scala #522
- bazelbuild/rules_scala: Support Bzlmod and add rules_scala to bazel-central-registry #1482
Despite this fact, I was able to get rules_scala
to work with our repo by
writing custom module extensions and a couple of small patches. I'll discuss the
role of patches in the next post in the Bzlmod series.
As mentioned earlier, Bzlmod is very strict about not defining and using
repositories in the same module extension. This produced the first challenge
to overcome with rules_scala.
| WORKSPACE | MODULE.bazel |
|---|---|
| Statements can depend upon any repo introduced earlier in the file, including those implicitly created via macros. | Module extensions can only load() files from repositories introduced explicitly in MODULE.bazel. They cannot instantiate a repository and load() items from it in the same extension. Attempting to do so results in a Circular definition of repositories error. Bazel enforces this restriction across all files implementing an extension. |
Bazel modules can have circular dependencies
While repositories cannot have circular definitions, and BUILD
targets
can't have circular dependencies, Bazel modules can have circular
dependencies. One prevalent example of such a circular dependency is between
rules_go and gazelle. I didn't realize this until Fabian
Meumertzheim corrected my misunderstanding in a #bzlmod thread in the Bazel
Slack workspace.
This is our original rules_scala
configuration in WORKSPACE
(modulo some
formatting tweaks), as seen in EngFlow/example: Migrate rules_scala to bzlmod,
delete WORKSPACE #317.
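
As a rough sketch, assuming the standard rules_scala 6.x setup macros named in this post, such a WORKSPACE configuration resembles the following. The http_archive() call that fetches @io_bazel_rules_scala is omitted, and the exact stanza in the pull request may differ.

```starlark
# WORKSPACE (sketch; assumes @io_bazel_rules_scala was declared earlier)

load("@io_bazel_rules_scala//:scala_config.bzl", "scala_config")

# Generates the @io_bazel_rules_scala_config repo.
scala_config()

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_repositories")

scala_repositories()

load("@io_bazel_rules_scala//scala:toolchains.bzl", "scala_register_toolchains")

scala_register_toolchains()

load(
    "@io_bazel_rules_scala//testing:scalatest.bzl",
    "scalatest_repositories",
    "scalatest_toolchain",
)

scalatest_repositories()

scalatest_toolchain()
```

Note how each load() depends on a repo made available by an earlier statement. That ordering is exactly what WORKSPACE permits and Bzlmod does not.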
The problem is that importing rules_scala into your project is a multistep
process that generates the many repositories required by the rule set.
(Specifically, the many repositories required by the toolchains provided by
the rule set; more about this below.)
- scala_config() generates the @io_bazel_rules_scala_config repo, which
  contains a generated config file.
- The files implementing scala_repositories() and scalatest_repositories()
  call load() on @io_bazel_rules_scala_config//:config.bzl to determine which
  repositories they create.
native.register_toolchains() was a minor problem, too.
Both scala_register_toolchains() and scalatest_toolchain() call
native.register_toolchains(). In this case, the solution proved very easy; we
replaced those calls in our WORKSPACE file with a single register_toolchains()
call in MODULE.bazel. The next blog post in the Bzlmod series, on applying
patches, will describe other native.register_toolchains() workarounds.
First, I tried to apply scala_config() in MODULE.bazel by importing it via
use_repo_rule().
[Listing: Attempt at calling scala_config() from MODULE.bazel]
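
One plausible shape of such an attempt is sketched below. It is a guess for illustration, not the original listing, and it assumes @io_bazel_rules_scala is already visible to the root module.

```starlark
# MODULE.bazel (sketch of an approach that does NOT work)

# use_repo_rule() expects a repository rule, but scala_config is a macro
# that wraps one.
scala_config = use_repo_rule(
    "@io_bazel_rules_scala//:scala_config.bzl",
    "scala_config",
)

scala_config(name = "io_bazel_rules_scala_config")
```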
That broke in an interesting way. scala_config()
is a macro wrapping a
repository rule, but is not itself a repository rule. I'm not entirely sure why
it breaks this way; the code path leading to
BzlmodRepoCycleReporter.maybeReportCycle()
requires further study. But
this was the error message:
So right off the bat, we have to call scala_config() from a module extension.
As it turns out, we have to call it in a separate extension from
scala_repositories() and scalatest_repositories(), lest we encounter another
circular repository definition error.
In the following error output, we're calling both scala_config()
and
scala_repositories()
from our custom //scala:deps.bzl
module extension file.
As the dependency chain above implies, the repositories.bzl file, part of the
scala_repositories() implementation, contains a load() statement depending on
@io_bazel_rules_scala_config//:config.bzl. Hence the scala_config() and
scala_repositories() calls must reside in separate extensions.
Local rules_scala module extensions, Mark I¶
These are the original module extensions I developed to solve this problem in
EngFlow/example: Migrate rules_scala to bzlmod, delete WORKSPACE #317.
These correspond to rules_scala
version 6.4.0, relying on its default Scala
version.
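
A minimal sketch of such a pair of extensions follows, assuming files named //scala:config.bzl and //scala:deps.bzl and the rules_scala 6.x macro locations. How @io_bazel_rules_scala itself becomes visible to the root module is outside the scope of this sketch, and the original files may differ in detail.

```starlark
"""//scala:config.bzl: wraps scala_config() in its own extension."""

# Alias the macro so the extension below can use the scala_config name.
load("@io_bazel_rules_scala//:scala_config.bzl", _scala_config = "scala_config")

def _scala_config_impl(_module_ctx):
    # Creates @io_bazel_rules_scala_config with the default Scala version.
    _scala_config()

scala_config = module_extension(implementation = _scala_config_impl)
```

```starlark
"""//scala:deps.bzl: instantiates the toolchain dependency repos."""

# These files load() @io_bazel_rules_scala_config//:config.bzl, which is
# why this extension must be separate from the one that creates that repo.
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_repositories")
load("@io_bazel_rules_scala//testing:scalatest.bzl", "scalatest_repositories")

def _scala_deps_impl(_module_ctx):
    scala_repositories()
    scalatest_repositories()

scala_deps = module_extension(implementation = _scala_deps_impl)
```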
Check out modules.as_extension from bazel-skylib
I only learned about modules.as_extension from bazel_skylib while watching
Fabian Meumertzheim's Aspect Insights interview, after I'd already applied the
technique described here. Definitely check out that utility and use it if you
can. However, if you'd prefer to roll your own module extension, consider the
following a gentle introduction to the craft.
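
For comparison, here is a sketch of how modules.as_extension could wrap the rules_scala repository macros. It assumes your project already depends on a bazel_skylib release that includes lib/modules.bzl; the _scala_deps helper is a hypothetical name.

```starlark
"""Sketch of a deps extension built with bazel_skylib's modules.as_extension."""

load("@bazel_skylib//lib:modules.bzl", "modules")
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_repositories")
load("@io_bazel_rules_scala//testing:scalatest.bzl", "scalatest_repositories")

def _scala_deps():
    # as_extension() expects a macro that it can call without arguments.
    scala_repositories()
    scalatest_repositories()

scala_deps = modules.as_extension(
    _scala_deps,
    doc = "Instantiates the rules_scala toolchain repositories.",
)
```

As I understand it, the generated extension also reports all of its repos as direct dependencies of the root module, so MODULE.bazel still needs a matching use_repo() call.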
Here's the resulting MODULE.bazel
stanza. Since rules_scala
doesn't
yet—yet!—have a module extension configuring its toolchain
repositories, we import them in our own MODULE.bazel
file, and call
register_toolchains()
ourselves. These toolchain repos correspond to Scala
2.12.18, the default version configured by rules_scala
as of v6.4.0.
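
A sketch of what such a stanza looks like appears below. The use_repo() list is illustrative and incomplete, and the toolchain labels are the ones I believe rules_scala 6.4.0 registers in its own macros; verify both against your rules_scala version.

```starlark
# MODULE.bazel (sketch)

scala_config = use_extension("//scala:config.bzl", "scala_config")
use_repo(scala_config, "io_bazel_rules_scala_config")

scala_deps = use_extension("//scala:deps.bzl", "scala_deps")
use_repo(
    scala_deps,
    "io_bazel_rules_scala_scala_compiler",
    "io_bazel_rules_scala_scala_library",
    "io_bazel_rules_scala_scala_reflect",
    # ...plus every other repo the toolchains require.
)

# Replaces the scala_register_toolchains() and scalatest_toolchain() calls.
register_toolchains(
    "@io_bazel_rules_scala//scala:default_toolchain",
    "@io_bazel_rules_scala//testing:scalatest_toolchain",
)
```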
Local rules_scala module extensions, Mark II¶
I updated EngFlow/example to use rules_scala
v6.6.0, which updated
toolchain artifact repository naming to include the Scala version. I
updated MODULE.bazel
to accommodate this naming scheme, and moved the
config.bzl
and deps.bzl
files to //scala/extensions
while I was at it.
In the next pull request, I updated the scala_config
module extension to
allow the selection of Scala versions. It's an example of a basic extension
that gathers information from its tag classes across all modules, giving
precedence to the main repository's root_settings
. (The scala_deps
implementation remained the same.)
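
The following sketch shows one way to write such an extension. The settings tag class and root_settings come from this post, but the scala_version attribute name, the default value, and the fallback policy for non-root modules are assumptions made for illustration.

```starlark
"""//scala/extensions:config.bzl (sketch): selects the Scala version."""

load("@io_bazel_rules_scala//:scala_config.bzl", _scala_config = "scala_config")

# Assumed default, matching the rules_scala default mentioned earlier.
_DEFAULT_SCALA_VERSION = "2.12.18"

_settings = tag_class(
    attrs = {
        "scala_version": attr.string(default = _DEFAULT_SCALA_VERSION),
    },
)

def _scala_config_impl(module_ctx):
    root_settings = None
    other_settings = []

    # Gather settings tags from every module that uses this extension.
    for mod in module_ctx.modules:
        for tag in mod.tags.settings:
            if mod.is_root:
                root_settings = tag  # The last root tag wins.
            else:
                other_settings.append(tag)

    # The root module takes precedence; otherwise fall back to the first
    # tag from any other module, then to the default.
    version = _DEFAULT_SCALA_VERSION
    if root_settings != None:
        version = root_settings.scala_version
    elif other_settings:
        version = other_settings[0].scala_version

    _scala_config(scala_version = version)

scala_config = module_extension(
    implementation = _scala_config_impl,
    tag_classes = {"settings": _settings},
)
```

The key point is that the extension, not Bazel, decides how to reconcile tags from different modules. A stricter policy could call fail() when modules disagree.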
The MODULE.bazel
stanza now instantiates toolchain repositories specific to
the SCALA_VERSION
specified in the scala_config.settings
tag. The list
comprehension at the end performs this operation, and also registers any Scala
version-specific toolchains.
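
A sketch of such a stanza follows. The version value, the version suffix format, and the repo list are assumptions for illustration, and the registration of version-specific toolchains is left out because it depends on the rules_scala release.

```starlark
# MODULE.bazel (sketch)

# Example value; use a version your rules_scala release supports.
SCALA_VERSION = "2.13.12"

# Assumes repo names end with the version, with dots replaced by underscores.
SCALA_VERSION_SUFFIX = "_" + SCALA_VERSION.replace(".", "_")

scala_config = use_extension("//scala/extensions:config.bzl", "scala_config")
scala_config.settings(scala_version = SCALA_VERSION)
use_repo(scala_config, "io_bazel_rules_scala_config")

scala_deps = use_extension("//scala/extensions:deps.bzl", "scala_deps")

# MODULE.bazel forbids for statements, so a list comprehension does the
# iteration. The repo list is illustrative and incomplete.
[
    use_repo(scala_deps, repo + SCALA_VERSION_SUFFIX)
    for repo in [
        "io_bazel_rules_scala_scala_compiler",
        "io_bazel_rules_scala_scala_library",
        "io_bazel_rules_scala_scala_reflect",
    ]
]
```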
Checking your work¶
You can perform initial validation of your module extensions by running bazel
mod deps. This will force evaluation of the entire Bazel module dependency
graph, and report any errors it finds. (You may not want to check in the
resulting MODULE.bazel.lock, however.) Then you can run bazel build and bazel
test as desired to ensure everything works as intended.
A glimpse of the future¶
Once the Bzlmodification of rules_scala
is complete, the MODULE.bazel
stanza above will look something more like this:
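
The following is a purely illustrative sketch of that shape. The version is a placeholder, and the extension label and tag name are assumptions, since the final interface was still under discussion when this was written.

```starlark
# MODULE.bazel (hypothetical future form)

# "X.Y.Z" is a placeholder for whichever release first supports Bzlmod.
bazel_dep(name = "rules_scala", version = "X.Y.Z")

# Assumed extension location and tag name; the real interface may differ.
scala_config = use_extension(
    "@rules_scala//scala/extensions:config.bzl",
    "scala_config",
)
scala_config.settings(scala_version = "2.13.12")
```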
This is because rules_scala
will automatically configure its toolchains in its
own module extension, and register them all in its own MODULE.bazel
file. You
can see discussion around this interface in bazelbuild/rules_scala:
Toolchainize //scala:toolchain_type #1633.
Try my rules_scala working branches in the meanwhile.
I've committed to keeping the bzlmod and bzlmod-bazel-8 branches
of my rules_scala
fork in working condition until all their changes land
upstream. You're welcome to use either to prototype your own Bzlmod
migration as appropriate; see my 2025-01-07 comment on
rules_scala#1482 for compatibility details for each branch. See also
how Yun Peng from the Bazel Open Source team used git_override
with
the bzlmod
branch in bazelbuild/rules_webtesting#478 (as mentioned
in bazelbuild/rules_scala#1652).
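
A sketch of using git_override with such a branch follows. The remote URL and commit are placeholders rather than real values, and the repo_name is an assumption that keeps existing @io_bazel_rules_scala labels working.

```starlark
# MODULE.bazel (sketch)

# No version is needed here because the override below supplies the source.
bazel_dep(name = "rules_scala", repo_name = "io_bazel_rules_scala")

git_override(
    module_name = "rules_scala",
    # Placeholders: point these at the fork and a commit on its bzlmod branch.
    remote = "https://github.com/<fork-owner>/rules_scala.git",
    commit = "<commit-sha>",
)
```

Pinning a commit rather than tracking a moving branch keeps builds reproducible while the upstream changes land.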
Conclusion¶
Writing my own MODULE.bazel
files, module extensions, and patches for external
dependencies has helped me complete EngFlow's Bzlmod migration. I'm hopeful that
sharing these insights and techniques will help others make progress on their
own migrations without waiting for all their dependencies to migrate.
The original plan was to describe all these techniques in one post, but it was too much information. The next two posts will cover:
- Bzlmod incompatibility problems you must fix in your own project or patch a dependency to resolve (if an upstream fix isn't forthcoming).
- More examples of repo name dependencies and how to resolve them. Technically, these are also patchable problems, but there's enough of them that they warrant yet another dedicated post.
I've mostly prioritized getting our Bzlmod migration done and publishing about
the process first, but I'm also now trying to contribute changes upstream. I'm
already contributing to rules_scala
, and I'm planning to submit pull requests
to other projects based on the changes described in this series. It takes a lot
more time, but helps a lot more people, and is rewarding in ways that transcend
working only on our own repo. Even so, these contributions wouldn't've been
possible without having gone through this process, as I learned a lot and had
working code in hand.
As always, I'm open to questions, suggestions, corrections, and updates relating to this series of Bzlmodification posts. It's easiest to find me lurking in the #bzlmod channel of the Bazel Slack workspace. I'd love to hear how your own Bzlmod migration is going—especially if these blog posts or Open Source contributions have helped!