Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names, Again…¶
The apparent and canonical repository name schema under Bzlmod is the gift that keeps on giving. Much of what it has to give is quite good—once you learn how to really hold it right. Which is to say, to avoid holding canonical repo names at all. That's what the three previous "Repo Names..." posts in this series were all about.
Those previous posts, however, pertained to using BUILD rules, or when accessing runfiles. In those situations, solutions exist to avoid handling canonical repo names directly as a consumer.
If you maintain a Bazel rule set, or need to fix a rule set upon which your project depends, this is the post for you. We'll see how improper repo name usages sneak into rule implementations, and how to shoo them out. Examples include removing canonical repo names from embedded resource paths, filtering lists of target labels, and generating default repository target names. We also discuss removing internal references to your project's own apparent repository name to avoid minor yet preventable issues.
This article is part of the series "Migrating to Bazel Modules (a.k.a. Bzlmod)":
- Migrating to Bazel Modules (a.k.a. Bzlmod) - The Easy Parts
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and Runfiles
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and rules_pkg
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names, Macros, and Variables
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Module Extensions
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Fixing and Patching Breakages
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names, Again…
I occasionally update these blogs based on feedback, noting the changes in the Updates section at the bottom whenever I do. So don't forget to check the earlier blog posts every so often for new and improved information!
Prerequisites¶
As always, please acquaint yourself with the following concepts pertaining to external repositories if you've yet to do so:
Prime Directive¶
Only refer to a repository by its apparent name, not by its canonical name. The purpose of the canonical name is to allow Bazel to store repositories as a flat list of directories with unique names. This is an implementation detail, and the name format is subject to change at any time.
This applies as much to rules producers as it does consumers. Deviating from the Prime Directive and directly manipulating canonical repo names in any way is a recipe for future pain. This is especially so if you maintain a rule set upon which others beyond your own team depend.
Subprime Directive¶
Do not include your project's own apparent repository name in internal target labels. This goes for load() statements as well as all other internal usages.
As we'll see, this isn't a critical problem, and relatively straightforward workarounds exist; hence why it's a "Subprime" Directive. However, removing a repository's own apparent name from internal target labels is super easy, and makes your code smaller and more future proof.
Concrete examples from rules_scala¶
This post draws examples from the following pull requests, from my work on making rules_scala Bzlmod compatible (bazelbuild/rules_scala#1482):
- bazelbuild/rules_scala#1696: Remove @io_bazel_rules_scala or replace with Label removed internal references to the project's own apparent repo name.
- bazelbuild/rules_scala#1621: Update repo name handling for Bzlmod compatibility introduced three fixes for canonical repo name related breakages.
- bazelbuild/rules_scala#1650: Replace apparent_repo_name with repo rule wrappers superseded one of those fixes with a better solution.
- bazelbuild/rules_scala#1694: Use rctx.original_name if available provided a superior solution to repo rule wrapper macros. repository_ctx.original_name first appeared in Bazel 8.1.0.
These examples still fall under the category of patchable solutions, which I covered in the previous post. Still, they warrant their own post, given how pervasive the class of repo name dependency problems happens to be. Hopefully these examples may inspire changes in your own project and/or patches for your dependencies.
Removing internal references to the repo's own name¶
Hardcoded references to a project's own apparent repository name might've been required under WORKSPACE at one time, but haven't been for a while. Today, these references are never necessary, and only add friction when using or maintaining a repository or module. Without these hardcoded references, users can choose any name for a repository or module, and maintainers can change its standard name without breaking anything.
To avoid any potential issues entirely, without breaking existing users at all, bazelbuild/rules_scala#1696 made the following changes:
- It replaced all load("@io_bazel_rules_scala//...", ...) statements with load("//...", ...).
- Everywhere else, it replaced "@io_bazel_rules_scala//..." target strings with Label("//...") or str(Label("//...")). (Many, but not all, of these Label wrappers were overkill, as noted in When not to wrap target strings in a Label below.)
There's no downside for existing users, but plenty of upside for all
This change has no immediate impact on existing users, other than allowing WORKSPACE users to use a different repo name if and when they so choose. All users' projects will continue to work without changing a thing on their end. But they'll appreciate the freedom, and you'll no longer be committed to supporting a specific repo name as a maintainer.
Also, while this produced a somewhat large pull request, it was a purely mechanical change that was easy to produce and review. It also didn't have to happen all at once—in fact, I'd been making similar changes in earlier pull requests when convenient. This is to say that the fix is really easy and low risk, though it may require a bit of grunt work.
Why internal apparent repository name references can cause (minor) trouble¶
When migrating to Bzlmod, it's good to maintain WORKSPACE compatibility and reduce friction for WORKSPACE users to the extent possible. This makes it easier for users to adopt your improved implementation sooner rather than later, and to eventually migrate to Bzlmod when it's convenient for them. Even when introducing breaking changes to your WORKSPACE API to accommodate a Bzlmod implementation, maintaining WORKSPACE compatibility gets them one step closer to Bzlmod adoption.
With that in mind, internal repository name references introduce unnecessary friction for WORKSPACE users. These references force users to import your repository using the same name with http_archive or other rules. This is because WORKSPACE loads all repositories within a globally shared namespace. Alternatively, users could supply a repo_mapping argument to http_archive or other rules to translate internal repo name references to another repo name.
For example, rules_scala has always been the official name, but the recommended standard repo name was io_bazel_rules_scala. The motivation for the longer name, as with many other rule sets following common code packaging conventions, was to prevent namespace collisions with other implementations. Importing the repository as rules_scala would've required injecting a repo_mapping into any other repository using the longer name, including rules_scala itself:
Example repo_mapping for http_archive
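As a rough sketch of what such a WORKSPACE declaration could look like: the repo_mapping attribute of http_archive is real, but the archive attributes are deliberately left out here, so this fragment is illustrative rather than copy-and-paste ready.

```starlark
# WORKSPACE (sketch; archive location and checksum intentionally omitted)
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_scala",
    # urls, sha256, and strip_prefix would go here for a real release.
    repo_mapping = {
        # Resolve internal @io_bazel_rules_scala references to this repo.
        "@io_bazel_rules_scala": "@rules_scala",
    },
)
```

Every other repository that still refers to @io_bazel_rules_scala would need the same mapping, which is the friction described above.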
repo_mapping isn't available within module extensions.
repo_mapping arguments are not supported within module extensions; only setting the name to match the internal references would work in that situation.
Under Bzlmod, these longer names are no longer necessary, as Bzlmod itself prevents namespace collisions by design. At the same time, internal repo name references won't affect Bzlmod users, but they may require declaring a repo_name in your module declaration. Here's what it would've looked like within rules_scala:
rules_scala Bazel module import with hardcoded name
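A minimal sketch of such a declaration, assuming the module is named rules_scala and its internal labels still use the longer name. The repo_name parameter of module() is real; the version is left out rather than guessed.

```starlark
# MODULE.bazel of the rule set itself (sketch; version intentionally omitted)
module(
    name = "rules_scala",
    # Lets internal @io_bazel_rules_scala//... labels keep resolving to
    # this module, even though the module name is shorter.
    repo_name = "io_bazel_rules_scala",
)
```

Once the internal references are gone, the repo_name line can be dropped.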
Finally, changing the apparent repository name would require updating all internal instances. On top of that, WORKSPACE users would need to update their name or repo_mapping values to fix the resulting build breakage.
Though these are relatively minor issues, they are issues all the same that we can prevent entirely by eliminating internal apparent repository name references altogether.
When to wrap target strings in a Label¶
Bazel evaluates build target strings in the context of the repository containing the assignment of the string to a Label attribute, argument, or constructor. When in doubt, it never hurts to wrap target string literals within .bzl files in a Label, for internal and external targets alike.
Always wrap target string literals within macro implementations, repository rules that inject them into generated files, and calls to functions from other packages. Otherwise, the build may break when Bazel evaluates the target string in the context of another repository.
When expanding a macro, Bazel interprets target strings as relative to the repository containing the BUILD file invoking the macro. This is usually the right thing to do, but it can break target string literals that should act as constants within the macro implementation.
By the same token, target strings injected by a repository rule into a BUILD or .bzl file may break unless properly encoded using Label. The same is true of target strings passed as arguments to functions from an external repository. Without applying a Label wrapper first, target strings beginning with // will appear to belong to the generated or external repo. Strings containing the originating repository's apparent repo name may work, but will perpetuate the problem of unnecessary apparent repo name references.
For example, rules_scala calls toolchains.use_toolchain from rules_proto with its own protocol compiler toolchain_type. PROTOC_TOOLCHAIN_TYPE is already wrapped in a Label:
Passing a rules_scala target to use_toolchain
This is the rules_proto 7.1.0 implementation of the toolchains.use_toolchain function (implemented as the private _use_toolchain function):
rules_proto function using a target label
Removing the Label wrapper from PROTOC_TOOLCHAIN_TYPE causes the following build error. This is because evaluating the target string in the context of toolchains.use_toolchain yields a target that doesn't exist in rules_proto:
In contrast, the Label constructor always evaluates its target string in the context of the repository containing the .bzl file in which it appears. Wrapping internal target strings in a Label ensures they are treated as constant values within the context of their own repository. Wrapping string literals representing targets in other repositories can be beneficial as well, to ensure the correct repo mapping applies. See the Label constructor and Label resolution in (legacy) macros documentation for more details.
Handling macro arguments as Label objects
If you need to handle a target string argument as a Label within a macro implementation, don't wrap it in a Label. Doing so converts it to a target in the repo containing the macro definition, not the repo containing the macro call. Instead, when using Bazel 8, consider using symbolic macros. They automatically convert target strings applied to their Label arguments in the context of the package (and repository) invoking the macro. Or, when defining legacy macros (the only kind available before Bazel 8), use native.package_relative_label to achieve the same effect.
When not to wrap target strings in a Label¶
Wrapping target strings in a Label isn't necessary if they aren't evaluated within macros, injected into generated repositories, or passed to functions from other repositories. In fact, subsequent experimentation confirmed that many of the Label instances added in bazelbuild/rules_scala#1696 weren't technically required.
Within your repository's .bzl files, target string literals suffice in the following contexts (provided they aren't direct arguments to external function calls in the process):
- rule() expressions, which Bazel evaluates during the loading phase (during load() evaluation, before BUILD macro expansion), even when wrapped in a publicly exported function
- Expressions initializing file level objects containing attr.label and related attribute types during .bzl file initialization (also during load() evaluation, including, but not limited to, rule())
- Expressions within rule implementation functions, such as ctx.toolchains, which Bazel evaluates during the analysis phase (after BUILD macro expansion)
- Arguments to native.register_toolchains() statements invoked by WORKSPACE macros
Wrapping target strings with Label still works, as evidenced by the rules_scala tests continuing to pass. Using Label everywhere out of an abundance of caution isn't a bad idea, but is ultimately a matter of taste, not a requirement.
Removing external/$REPO_NAME from embedded resource paths¶
bazelbuild/rules_scala#1621 removed external/$REPO_NAME from embedded resource paths. Before making this change, turning on Bzlmod produced errors like the following (formatted for readability):
This was due to code trying to access resources using paths like:
/external/test_new_local_repo/resource.txt
The fix updated the _target_path_by_default_prefixes macro to trim the external/$REPO_NAME prefix from dependency paths, matching existing processing for paths beginning with resources/ or java/.
(string.split() could've also worked here, but string.partition() fit the pattern of the surrounding code.)
For example, passing <output_base>/external/<repo_name>/resource.txt to string.partition() would yield:
- dir_1: <output_base>/
- dir_2: external/
- rel_path: <repo_name>/resource.txt
Consequently, rel_path[rel_path.index("/"):] yields /resource.txt. This required updating code accessing such resources like so:
A breaking change from the previous rules_scala release
This was technically a breaking change, but as rules_scala maintainer Simonas Pinevičius pointed out, the original behavior mistakenly leaked build details into the code.
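To make the string.partition() step described above concrete, here's a minimal sketch. It is not the rules_scala implementation: the function name strip_external_repo_prefix is hypothetical, and the only assumption carried over from the text is that the separator is "external/". It sticks to the subset of Starlark that also runs as plain Python.

```python
def strip_external_repo_prefix(path):
    """Returns path without a leading .../external/REPO_NAME component."""
    dir_1, dir_2, rel_path = path.partition("external/")
    if not dir_2:
        # No "external/" component was found; leave the path alone.
        return path

    # rel_path now starts with the repo name; drop the name, keep the slash.
    return rel_path[rel_path.index("/"):]
```

Because the repo name is discarded rather than inspected, the result is the same whether that directory carries an apparent name (WORKSPACE) or a canonical one (Bzlmod).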
Avoiding canonical repo name handling when filtering targets¶
bazelbuild/rules_scala#1621 updated a filtering mechanism for including and excluding targets using the dependency_tracking_strict_deps_patterns attribute of scala_toolchain():
The problem was that the underlying _phase_dependency function, which several scala_* rule implementations use, compared these plain string patterns against stringified ctx.label values. ctx.label, provided to a rule implementation function during the analysis phase, will contain a canonical repository name under Bzlmod. Hence, none of the filters containing apparent repo names would ever match the intended Label values.
My first attempt at a solution involved using macros to transform repo names in two places:
- In the _partition_patterns helper of the scala_toolchain rule
- In the _phase_dependency function, which is invoked much later
Fabian Meumertzheim's line of questioning and suggestion-ing led me to abandon this approach in favor of wrapping the original scala_toolchain rule using a macro. This involved:
- Renaming the scala_toolchain rule to _scala_toolchain
- Adding a new scala_toolchain macro to wrap the call to the _scala_toolchain rule
- Adding the _expand_patterns macro to expand every target pattern using native.package_relative_label
I then restored _phase_dependency to its original state. This preserved the original API; works under Bazel 6, 7 and 8; and works under WORKSPACE and Bzlmod.
This still allows partial package prefix matches.
This implementation doesn't completely address the concern Fabian raised about matching plain string prefixes against Label values. It will still allow partial matching of package prefixes, e.g., "//foo/bar" will match "//foo/bar:baz" and "//foo/barquux:xyzzy". It could be updated easily to disallow such matches, but that would be a breaking change from existing API behavior. Ending the filters with : would work around this, e.g., "//foo/bar:" would match "//foo/bar:baz", but not "//foo/barquux:xyzzy".
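The partial matching behavior is easy to see in isolation. The sketch below assumes the filter is a plain startswith() check over stringified labels, as the note describes; matches_any_prefix is a hypothetical helper, not a function from rules_scala, and the code is valid as both Starlark and Python.

```python
def matches_any_prefix(label_str, patterns):
    """Returns True if label_str starts with any of the given patterns."""
    for pattern in patterns:
        if label_str.startswith(pattern):
            return True
    return False

# A bare package prefix also matches sibling packages sharing the prefix.
loose = ["//foo/bar"]

# Ending the pattern with ":" pins it to exactly one package.
strict = ["//foo/bar:"]
```

With these definitions, matches_any_prefix("//foo/barquux:xyzzy", loose) is true, while the same call with strict is false.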
Wrapping repository_rules to generate default target names¶
The final case we'll cover is generating default target names for repository_rules. This is based on the fact that Label parses the default top level target from a repo name:
rules_scala depends upon this behavior for its Maven artifact repository generation schema, which allows targets to depend on the @artifact_repo_name label instead of @artifact_repo_name//:artifact_repo_name.
However, under Bzlmod, repository_ctx.name is the canonical repo name, not the value provided as the name parameter of the repository_rule invocation. This breaks existing repository_rule implementations that use repository_ctx.name to generate the default target name in the resulting repo.
At the time, the officially sanctioned workaround was to add another attr.string to the repository_rule, and assign it the same value as name. Passing an unmangled name as a separate attribute sidesteps the dependency on canonical repo names completely. The Bazel modules: Repository names and strict deps documentation states (emphasis theirs):
Note that the canonical name format is not an API you should depend on and is subject to change at any time. Instead of hard-coding the canonical name, use a supported way to get it directly from Bazel...
Using a different default target name is a possibility.
Providing a different default target name, either hardcoding it or providing it as an attr.string default value, would be another approach. However, updating all existing target labels in that case might prove prohibitively time consuming.
I decided to introduce macro wrappers around the original repository rules to duplicate the name attribute. The steps to do so were:
- Like with the scala_toolchain macro above, renaming the original repository_rule definition to start with an underscore.
- Adding a new attr.string for the duplicate name attribute, if one wasn't already available.
- Using dict.pop() and dict.get() to preserve the original repository_rule API while duplicating the name parameter if required.
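The steps above can be sketched as follows. All names here are hypothetical: _example_maven_repo stands in for the underscore-renamed repository_rule (a real one would be created with repository_rule() and return nothing), and default_target_name stands in for the duplicate attr.string. The stand-in simply returns its arguments so the wrapper's behavior is visible; the code runs as either Starlark or Python.

```python
def _example_maven_repo(**kwargs):
    # Stand-in for the renamed repository_rule; echoes what it receives.
    return kwargs

def example_maven_repo(**kwargs):
    """Wrapper macro that keeps the original name for callers."""

    # Honor an explicit value if the caller passed one; otherwise copy name.
    target_name = kwargs.pop("default_target_name", kwargs.get("name"))
    return _example_maven_repo(default_target_name = target_name, **kwargs)
```

Existing callers keep passing only name, and the rule implementation reads the duplicated attribute instead of repository_ctx.name.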
Here's an example from bazelbuild/rules_scala#1650: Replace apparent_repo_name with repo rule wrappers:
As noted in the pull request, there are trade offs with this approach:
- The macro can't be imported into a MODULE.bazel file directly via use_repo_rule. You have to write a module extension to call it, or use modules.as_extension from bazel_skylib. Either solution enables remapping the apparent repo name via use_repo, provided the macro or custom extension provides a default name.
- The documentation from the original repository_rule doesn't automatically convey to the repo wrapper. For internal repository_rules, like those in bazelbuild/rules_scala#1650, this may not be much of a concern, but it's worth being aware of.
Using repository_ctx.original_name in Bazel 8.1.0 and later¶
While the macro wrapper works, compelling users of a repository_rule to duplicate name information is a bit clumsy and potentially error prone. Further discussion of the default repo target name use case ensued in the Bazel Slack workspace, and the Bazel maintainers decided to support it. I filed bazelbuild/bazel#24467 to follow up.
Shortly thereafter, the Bazel maintainers added repository_ctx.original_name in Bazel 8.1.0.
I then filed bazelbuild/rules_scala#1694: Use rctx.original_name if available to make use of it. The still macro-wrapped implementation contains lines such as (where rctx is a repository_ctx object):
Bazel 8.1.0 contains a small original_name bug under WORKSPACE.
I discovered immediately after the Bazel 8.1.0 release that repository_ctx.original_name was the empty string under WORKSPACE. Fabian Meumertzheim fixed this in bazelbuild/bazel#25296, released in Bazel 8.1.1.
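A minimal sketch of the "use original_name if available" idea, which also tolerates the empty string seen under WORKSPACE in Bazel 8.1.0. The helper name and the default_target_name attribute are hypothetical, and this is not the code from the pull request. getattr() with a default exists in both Starlark and Python, so the snippet runs as either.

```python
def default_target_name(rctx):
    """Picks the name to use for a generated repo's default target."""

    # repository_ctx.original_name exists from Bazel 8.1.0 onward.
    original_name = getattr(rctx, "original_name", None)

    # Fall back to the duplicated attribute when original_name is missing
    # (older Bazel) or empty (Bazel 8.1.0 under WORKSPACE).
    return original_name or rctx.attr.default_target_name
```

The fallback keeps the macro wrapper useful on older Bazel versions while newer ones ignore the duplicated attribute.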
Criminal activity: Parsing apparent repo names from canonical repo names¶
This final technique, parsing an apparent repository name from a canonical repository name, is for historical consideration only. It was a temporary measure I employed when first Bzlmodifying rules_scala, which I've since undone using the techniques above and throughout this blog series.
Though it seems to do the right thing, and enabled me to move on and solve many other Bzlmodification problems, consider this a last resort. This is why I'm covering it last, after all.
Here's the function, originally from bazelbuild/bazel-skylib#548: Add apparent_repo_name utility to modules.bzl. It mirrors Bazel's repo name parsing to identify the apparent repo name component of the canonical repository_ctx.name value, and is backwards compatible with WORKSPACE builds.
In a comment on bazelbuild/bazel#24467, I noted a more succinct implementation that I'd discovered:
Both of these examples are less brittle than depending on exact canonical repository names, or the exact canonical repository name format. If you're in a pinch, either technique can help you make progress until you find a better solution to a specific problem. By identifying every location in your code that depends on it, it can help you find everywhere you need to apply a better solution later.
It does technically depend on the canonical repo name format to some degree. By definition, that makes it potentially brittle in the face of future possible changes to the format. It's still best to avoid parsing the canonical repo name if you can help it.
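For illustration only, here's a rough sketch of the general idea. It is neither of the implementations mentioned above. It assumes the canonical name separates its components with + or ~, which is precisely the kind of format detail that may change, and both the helper name and the sample repo names are hypothetical.

```python
def apparent_name_from_canonical(canonical_name):
    """Returns the text after the last separator, or the whole name."""

    # rfind() returns -1 when a separator is absent, so a plain apparent
    # name (as under WORKSPACE) falls through unchanged.
    last_separator = max(
        canonical_name.rfind("+"),
        canonical_name.rfind("~"),
    )
    return canonical_name[last_separator + 1:]
```

A name that ends in a separator yields an empty string here, one more reason to treat this as a stopgap rather than a solution.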
Even so, I feel no guilt for my past transgressions. By applying this solution at the time, I accepted a limited, calculated risk to remove a significant source of friction from the Bzlmod migration process. I was able to solve many more problems before replacing applications of the above function with superior solutions. In fact, moving on to gain more experience helped me solicit, understand, and accept advice to adopt better techniques. This experience also helped me contribute to the case for adding first class support to Bazel for repository_ctx.original_name.
Moral #1
The first moral of this story is not to let the perfect be the enemy of the good—but don't give up the quest. This function, while meeting with disapproval from the official Bazel maintainers, enabled me to make progress until such time that a better way became apparent. However, when the better way did become apparent, I replaced it as soon as I could.
Moral #2
The second moral of the story is that good tests are critical to making forward progress and making improvements over time. The rules_scala test suite has been a tremendous security blanket, catching many problems throughout the Bzlmodification effort. I couldn't've applied either this initial solution or the improved one with such confidence without this test suite.
And now for something completely different...(but still on theme)¶
In the Bazel Slack workspace, I also got pulled into a new canonical repo name related challenge while using rules_python. The challenge involved identifying PyPI dependencies and extracting their package names to generate a py_wheel(requires=...) list. The previous solution parsed the PyPI names from canonical repo names embedded in py_library input file paths.
mbland/rules_python_pip_name contains several experiments resulting from the discussion. The two most promising are:
- //:lists: The pip_and_srcs_lists rule makes use of a PypiInfo provider from rules_python_pypi_info.patch. The patch parses pypi_* tags from a py_library target to create the provider.
- //:requirements_file: The requirements rule uses ctx.actions.run_shell to extract names from *.dist-info/METADATA files with a shell pipeline. This output would be suitable for the requires_file parameter, as opposed to requires.
This latter solution looks like:
Conclusion¶
As mentioned in the two previous posts, writing my own module extensions and patches for external dependencies helped me complete EngFlow's Bzlmod migration. Contributing to the Bzlmodification of rules_scala has yielded even more important insights and techniques, to the benefit of the broader Bazel ecosystem. Hopefully sharing these ideas helps others make progress on their own migrations, and those of their users, without waiting for all their dependencies to migrate.
Just as hopefully, perhaps foolishly so, I won't have to write about resolving repository name issues again. Other topics I'm considering for the next Bzlmod related posts include:
- "Toolchainization", or packaging toolchain dependencies in a rule set behind a convenient Bzlmod API
As always, I'm open to questions, suggestions, corrections, and updates relating to this series of Bzlmodification posts. It's easiest to find me lurking in the #bzlmod channel of the Bazel Slack workspace. I'd love to hear how your own Bzlmod migration is going—especially if these blog posts have helped!