Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names, Macros, and Variables¶
The previous two posts in this series showed how to use runfiles mechanisms
and rules_pkg
mechanisms to avoid dealing with canonical repository names
under Bzlmod. However, one special case remains: when you need to depend on the
name of a repository directory, either at build time or runtime. This post
explains how to access canonical repository names in a portable way to solve
such problems. We'll use a macro when we can, and a custom Make
Variable when we can't, including when dealing with alias
targets.
This article is part of the series "Migrating to Bazel Modules (a.k.a. Bzlmod)":
- Migrating to Bazel Modules (a.k.a. Bzlmod) - The Easy Parts
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and Runfiles
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and rules_pkg
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names, Macros, and Variables
Prerequisites¶
As usual with this series, it's important to be acquainted with the following concepts pertaining to external repositories:
If you wish to follow along with the examples, clone the EngFlow/example repository and change to the project directory like so:
Clone EngFlow/example and change to the example directory | |
---|---|
Examples of specific problems solved by macros or custom Make variables¶
The techniques covered throughout this post come from solutions to two specific problems in our codebase.
Custom JavaScript module loader for rules_nodejs's @npm//:node_modules¶
The first post in this series mentioned that we're migrating our JavaScript rules from the deprecated rules_nodejs to aspect_rules_js. We also use esbuild to bundle our artifacts, and use a custom plugin to resolve our JavaScript module imports.
We'll talk about this later.
I'll cover how we got rules_nodejs to work with Bzlmod in a later blog
post. It turns out that it can exist in the same project as
aspect_rules_js, which is good news as far as avoiding a big bang
migration. There are other tricky aspects to completing the migration, but
thankfully, that's not one of them.
The problem is that this module loader plugin needs to scan the node_modules
directory in the @npm repo exported by rules_nodejs (i.e.,
@npm//:node_modules). This plugin runs during the build, and there's no env
or other attribute we can use to pass the correct repository path through. So we
need to inject this path, including the canonical repo name for @npm, into the
plugin's source.
The //:genrule-constants target described in this post models our solution to
this problem. We're using a genrule and a custom Make variable to
generate a JavaScript module containing an exported constant with the correct
path.
This problem doesn't exist in aspect_rules_js.
aspect_rules_js creates a node_modules directory under bazel-bin,
avoiding this specific problem. But this approach is still good to know in
case you're facing a similar problem with another external repository.
Updating an environment variable in a cmake build target¶
We have a vendored dependency that normally builds with cmake, which we
build using the cmake
rule from rules_foreign_cc (currently version
0.10.1). This target also depends on GNU Bison, which we use via
rules_bison.
Bison requires access to some data files at build time, at a location defined by
the BISON_PKGDATADIR
environment variable. rules_bison
normally manages this
variable, but relying on the default produces the following error for this
particular target:
bison error before setting BISON_PKGDATADIR properly | |
---|---|
The actual BISON_PKGDATADIR files reside within the runfiles directory for the
@bison_v3.3.2//bin:bison target created by rules_bison. However, this
target, which users depend upon, is an alias to a target in another repository
generated by rules_bison that users should not depend upon. The runfiles
actually reside in that generated repository, as seen in the error output above,
not the @bison_v3.3.2 repository.
To solve this problem, we use a custom Make variable generated by the
custom repo_name_variable
Rule (described below) to create the correct
BISON_PKGDATADIR
path.
Look what I found after sending this post for review...
You can see that we prefix BISON_PKGDATADIR
with the (not well documented)
EXT_BUILD_ROOT
variable from rules_foreign_cc
. While waiting for
final approval for this post, I idly searched around in the
rules_foreign_cc
code some more for EXT_BUILD_ROOT
references. I
happened to find the private _expand_locations_in_string()
function,
used to replace "$(execpath "
in env
values with
"$$EXT_BUILD_ROOT$$/$(execpath "
. Removing $$EXT_BUILD_ROOT/
from the
env
values of our actual target did work. So, if you see EXT_BUILD_ROOT
in your cmake()
or other rules_foreign_cc
rules, you may be able to
remove it. Try it and see.
Do not hardcode canonical repo names¶
In many cases, it may seem easy and expedient to hardcode the canonical repo name. But as always, we must remember this warning from Bazel modules: Repository names and strict deps (emphasis theirs):
Note that the canonical name format is not an API you should depend on and is subject to change at any time. Instead of hard-coding the canonical name, use a supported way to get it directly from Bazel...
It's also worth remembering that the canonical name format has changed recently, and it will change again soon. We'll also see in this post examples of more complex canonical names that are even trickier to hardcode.
Look closely at the quoted documentation...
The Bazel documentation quoted above actually already uses the new, upcoming
canonical repo name format, with + replacing the use of ~. The new
format isn't yet the default, which is why the examples in this post still
use ~. This inconsistency between the current default and the current
documentation underscores the very point made here to not hardcode
canonical repo names.
Runfiles libraries do not resolve directory paths¶
The macro and custom Make variable methods are necessary because, unfortunately,
we can't use runfiles libraries to help us resolve external repository
directories. We'll use our example program from the Repo Names and Runfiles
post to illustrate this. (EXAMPLE_DIR
in the output below refers to the
cloned location of the runfiles
directory.)
Even though the runfiles library successfully found the actual file, it could not find its parent directory specifically.
You might be able to hack a runfile path to get a repo name...
Of course, the runfiles paths returned from runfiles libraries follow a
pattern, as do the paths returned from rlocationpath. It would be
possible to parse the canonical repo name from these paths in a somewhat
portable way. However, it's not really worth the effort. The other
approaches covered in this post are easier to implement and apply, while
being more future proof as well.
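To make that hack concrete, here is a minimal JavaScript sketch. It assumes that the first segment of a runfiles path (such as an rlocationpath value) is the repository's directory name. The helper name and the sample path are hypothetical and don't come from the example project.

```javascript
// Hypothetical helper, not part of any runfiles library.
// Assumes the path looks like "<repo directory name>/<path within repo>".
function repoNameFromRunfilesPath(runfilesPath) {
  const firstSlash = runfilesPath.indexOf('/');
  if (firstSlash <= 0) {
    // No repo segment (bare file name) or an absolute path.
    throw new Error(`not a repo-relative runfiles path: ${runfilesPath}`);
  }
  return runfilesPath.slice(0, firstSlash);
}

// Made-up path, for illustration only.
console.log(repoNameFromRunfilesPath('some_repo~1.2.3/pkg/data.txt'));
```

Even this small helper bakes in an assumption about the path layout, which is one reason the macro and custom Make variable approaches are the safer choice.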
JavaScript currently has no compatible runfiles libraries¶
In my Repo Names and Runfiles post, I listed runfiles libraries for several common programming languages. JavaScript was not one of them.
- rules_nodejs is a deprecated JavaScript rule set. It contains the
  @bazel/runfiles npm package, but that package hasn't been updated for Bzlmod.
- aspect_rules_js is an updated, actively maintained JavaScript rule set.
  However, it currently doesn't have a runfiles library, and doesn't yet
  export RUNFILES_DIR. It does provide JS_BINARY__RUNFILES to its js_binary
  and js_test targets, but does not provide a repo mapping mechanism.
If you're using Bazel to build JavaScript, it seems doubtful you're using runfiles libraries anyway. The lack of a Bzlmod compatible library seems to indicate a lack of demand. That means hacking a path returned by a Bzlmod-aware runfiles library isn't an option to begin with.
Accessing canonical repo names via macros or custom Make variables¶
We can use either macros or custom Make variables (generated via custom Rules) to access canonical repo names in a portable way. We'll see the differences between the two approaches, and when you must use a custom rule instead of a macro.
I'll use the bzlmod/canonical-repo-name-injection project in the
EngFlow/example repo to demonstrate and compare these approaches. The
example code in that project is directly inspired by the code we use to solve
the specific problems described earlier in this post.
Copy the repo-names.bzl file (or parts of it) into your own code base.
The repo-names.bzl
file within the project directory contains all of the
macros and rules described in this post. If you find these macros and rules
useful, you're welcome to copy this file, in whole or in part, into your own
code base. Either way, just make sure to preserve the copyright notice at
the top of the file.
Using macros¶
For many (most?) external repository issues, these very straightforward Bazel macros will do the trick.
To see these macros in action, run:
Executing //:repo-macros and examining the macro output | |
---|---|
This executes the following genrule that converts the value of the
repo_target
variable using the above macros:
Any repo name from MODULE.bazel generated by one of the following functions is
fair game:
- module() (i.e., the main repo's name)
- bazel_dep()
- use_repo()
- http repository rules
- local repository rules
- git repository rules
- any other repository rule imported via use_repo_rule
Macros require only a repo name
These macros only need a valid apparent repository name from MODULE.bazel,
not an existing BUILD target. This is different from the custom Make
variable approach below, which requires an existing target, since Bazel will
resolve it during the analysis phase.
Here's the Frobozz Magic Remote Caching and Execution Platform Company's
MODULE.bazel
file.
Here are the results of running the same command for other values of
repo_target using the other repos from MODULE.bazel:
Note that:
- The main repository, @frobozz, returns the empty string in both cases.
- The assigned repo name @rules_js resolved to the canonical repo name of the underlying @aspect_rules_js repo.
- For every repository other than the main one, the workspace_root is the canonical repo name prefixed with external/. However, it's probably best to use workspace_root where possible, as it seems more future proof than relying upon the external/ path prefix.
- @bison_v3.3.2//bin:bison is an alias to a target in a private, generated repo. The macros produce values from the alias, not from the target to which it points, since the alias isn't resolved during the loading phase.
Macros do not evaluate underlying alias targets¶
The macro method may be perfect for your use case. There is one way in which it
might break down, however: If a target is an alias
to a target in another
repository. This is the case with the @bison_v3.3.2//bin:bison
target in our
example.
In this case, there's no way a Label constructed in a macro during the
loading phase can know the specified target is an alias. This is because
an alias is a Rule, and rules aren't executed until the analysis
phase. You need access to the actual Target provided by the alias
rule during analysis. This means you need to write a custom Rule that
depends on the alias target.
From the documentation for attributes of type attr.label:
At analysis time (within the rule's implementation function), when retrieving the attribute value from ctx.attr, labels are replaced by the corresponding Targets. This allows you to access the providers of the current target's dependencies.
Using custom Make variables¶
Unlike macros, custom Make variables are generated by custom Rules during the analysis phase. They'll work in genrules, and in any rule attribute marked as "Subject to 'Make variable' substitution".
The following rules define custom Make variables corresponding to the
canonical_repo() and workspace_root() macros.
To see these variables in action, run:
Executing //:repo-vars and examining the custom variables output | |
---|---|
This looks exactly the same as the //:repo-macros
output so far, but the
implementation of the rule is quite different.
Running this command with different repo_target
values produces similar
output—except we must specify a fully accessible target, not just an
apparent repo name. This is because the repo_target
is evaluated during the
analysis phase; if it doesn't exist, the build will break.
//:repo-vars output | |
---|---|
Technically, all of the results above come from the //:repo-target alias,
whose target is set to the repo_target variable. This confirms
that the custom Make variable rules return the repository values for the
underlying target.
For a more complex example, here's the output for the @bison_v3.3.2//bin:bison
alias target.
//:repo-vars output for the bison alias | |
---|---|
Several points to notice:
- In this case, :repo-target is an alias to an alias. The rules have access to the actual target at the end of the alias chain.
- @bison_v3.3.2//bin:bison is an alias to a target in a private, generated repo. As opposed to the macros, the custom Make variable rules return the correct repository information.
- Hardcoding the canonical name of the repository to which the bison alias refers would be an extra nightmare, given the fingerprint suffix. This fingerprint can and will change based on the state of the main repository.
Injecting canonical repo names via generated source files¶
The next step is to inject these values into a program, either via environment variables, command line arguments, or generated source files.
The runfiles post covered using environment variables and command line
arguments, including using the env
and args
attributes of builtin test
and binary targets. The js_binary
rule from aspect_rules_js
provides several attributes for encoding environment variables and command line
arguments. We'll cover generating source files here.
Generating source files with runfiles paths usually isn't necessary.
Generating a source file is also an option for injecting runfiles paths for
actual files (not directories), but it's usually unnecessary in that case.
This is because runfiles libraries will interpret the first path segment as
the apparent repository name, so such paths can be safely hardcoded. In
JavaScript, this currently isn't an option. However, you can still pass
rlocationpath
values via environment variables or command line arguments
and join them to JS_BINARY__RUNFILES
(or whatever environment variable is
available).
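As a rough sketch of that last suggestion, the helper below joins an rlocationpath value to the runfiles root. It assumes the program runs as a js_binary or js_test target, so that JS_BINARY__RUNFILES is set. The function name and the way the rlocationpath value reaches the program (here, a command line argument) are hypothetical.

```javascript
const path = require('node:path');

// Hypothetical helper: resolves an rlocationpath value (passed in via an
// environment variable or command line argument) against the runfiles root.
function resolveRunfile(rlocationpath, env = process.env) {
  const runfilesRoot = env.JS_BINARY__RUNFILES;
  if (!runfilesRoot) {
    throw new Error('JS_BINARY__RUNFILES is not set');
  }
  return path.join(runfilesRoot, rlocationpath);
}

// Example: use the first command line argument, if one was given.
if (process.argv[2] && process.env.JS_BINARY__RUNFILES) {
  console.log(resolveRunfile(process.argv[2]));
}
```

Taking the environment as a parameter keeps the helper easy to exercise outside of Bazel.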
The example project has a target to generate a JavaScript source file, which is then imported into a small example program. Running the example program produces the output below, with these details elided:
- OUTPUT_BASE represents the result of bazel info output_base, e.g., /home/mbland/.cache/bazel/_bazel_mbland/1234567890abcdef.
- ARCH represents the build architecture output path component, e.g., k8-fastbuild.
- RUNFILES_DIR in the PWD: path is the value of the runfiles: path shown just above.
Here's the breakdown of what these output fields are for:
Field | Description |
---|---|
rule name | BUILD rule used to build the program |
target | value of repo_target from the BUILD file |
location | rlocationpath of :repo-target (alias of repo_target ) |
macroName | result of the canonical_repo(repo_target) macro call |
macroDir | result of the workspace_root(repo_target) macro call |
repoName | repo-name custom Make variable or rule dependency target |
repoDir | repo-dir custom Make variable or rule dependency target |
runfiles | JS_BINARY__RUNFILES env var value, set by js_binary |
PWD | working directory of the running repo-dir-check program |
binDir | $(BINDIR) Make variable or ctx.bin_dir.path |
result | actual repository directory path, including canonical name |
In the above output, we can see that:
- The program runs within the _main directory of its runfiles tree when run via bazel run, hence the value of PWD.
- The macros and the custom Make variables produce the same values, since @pnpm isn't an alias target.
- The repo_target directory itself resides directly within PWD, so binDir isn't required to locate it.
- The @pnpm repository path in this case is equivalent to PWD/macroDir or PWD/repoDir.
We'll return to this program shortly to explain how it's constructed, and then
run it with different repo_target
values.
Using genrule() to generate a source file¶
The //:genrule-constants target converts both macro and custom Make variable
values into JavaScript constants. Its output file, genrule-constants.js, is
renamed to constants.js and then included in the data attribute of the
//:repo-dir-check target. This program can also run independently of
bazel build or bazel run invocations.
This is a comprehensive example, but your genrule
may be simpler.
This genrule
is somewhat complex since it illustrates how to apply a
mixture of different Make variables and macros. Your own genrules need not
be so complex; take from this example only what you need.
This rule produces the following bazel-bin/genrule-constants.js
module from
the @pnpm
repo target. This file is renamed to bazel-bin/constants.js
by the
//:constants-impl
rule, which //:repo-dir-check
depends on directly. (We'll
see why this rule exists later; your own builds need not implement such an
intermediary rule.)
The repo-dir-check.mjs example program validates the location of the
repo_target directory, both when run via bazel run and when run directly from
the repository root. It builds directory paths using the various constant values
and returns one that actually exists.
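The general shape of such a check can be sketched as follows. This is not the actual repo-dir-check.mjs source. The function name is hypothetical, and the candidate paths would come from whatever constants your generated module exports.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: returns the first candidate (resolved against
// baseDir) that is an existing directory, or null if none exists.
function findRepoDir(baseDir, candidates) {
  for (const candidate of candidates) {
    const dir = path.resolve(baseDir, candidate);
    if (fs.existsSync(dir) && fs.statSync(dir).isDirectory()) {
      return dir;
    }
  }
  return null;
}
```

A caller might pass process.cwd() as baseDir, along with candidates built from the repository directory constant, both with and without the bin directory constant prepended. Returning null for a missing directory leaves the error handling to the caller.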
The next batch of example output results from executing
node bazel-bin/repo-dir-check.mjs directly. In this output:
- EXAMPLE_DIR is where I've cloned the example repository, plus the parent of the project directory (i.e., $HOME/example/bzlmod).
- BINDIR in the PWD: path is the value of the binDir: path shown just above.
We can see from this output that when running the program directly:
- PWD is our project root, since this is where we execute node.
- runfiles: is undefined, because the program isn't run via bazel run.
- The @pnpm path in this example is equivalent to either PWD/BINDIR/macroDir or PWD/BINDIR/repoDir.
Observing the difference between macros and custom Make variables¶
To show the difference between macros and custom Make variables in the context
of our genrule, we'll set the repo_target
to the @bison_v3.3.2//bin:bison
alias
target.
First we'll run it via bazel run:
Here we see that, as opposed to running it with @pnpm:
- The macro values and the custom Make variable values differ.
- The program finds PWD/repoDir, not PWD/macroDir.
And by running the program directly via node:
Extra credit: Using a custom Rule to generate a source file¶
In many (most?) cases, if you have to generate a source file, writing a single bespoke genrule will be all you need to do. However, if you find yourself writing multiple such genrules, you may consider writing a reusable custom Rule to generate source files instead.
In this case, defining custom Make variables isn't necessary if the rule receives repo targets via attributes of type:
Recall that this is because attr.label
values resolve to Target instances
during the analysis phase, when the Rule executes. The same holds true for
attr.label
values included in instances of these other attributes.
I won't copy all of the code into this post, but here is what using
gen_js_constants()
from repo-names.bzl
looks like:
The intermediate //:constants-impl
rule plays a role in switching between the
genrule-based and custom rule-based constants.js
generators using a custom
command line flag. Running the following will select the
//:custom-rule-constants
implementation, which produces (almost) exactly the
same output as the //:genrule-constants
implementation.
Running the example program with //:custom-rule-constants | |
---|---|
You probably don't need a custom command line flag.
You do not need a custom command line flag in your own project. I've
included it here for ease of comparison between the genrule
and custom
Rule approaches. That, and it's a neat trick to know about—but you do
not need to use it yourself.
Conclusion¶
This post concludes our series regarding how to repair broken repository paths after enabling Bzlmod. Hopefully they provide enough information to overcome such problems in your own build.
Broken repository paths aren't the only class of Bzlmod challenges requiring
hands-on intervention, of course. Next we'll see how to replace some of our
WORKSPACE
stanzas with module extensions, particularly for dependencies that
haven't yet been adapted to handle Bzlmod. We'll learn how to make sense
of—and ultimately avoid—circular dependency errors in the process.
Credit where it's due¶
Ricard Solé developed the esbuild loader and the original genrule,
which he used to inject $(BINDIR). I piggybacked on top of that existing
genrule to inject the canonical name of the @npm repository from
rules_nodejs.
I got the idea for writing a custom rule to generate a source file containing
constants from Bazel's tools/python/gen_runfile_constants.bzl.