
Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names, Again…

The apparent and canonical repository name schema under Bzlmod is the gift that keeps on giving. Much of what it has to give is quite good—once you learn how to really hold it right. Which is to say, to avoid holding canonical repo names at all. That's what the three previous "Repo Names..." posts in this series were all about.

Those previous posts, however, pertained to using BUILD rules or accessing runfiles. In those situations, solutions exist to avoid handling canonical repo names directly as a consumer.

If you maintain a Bazel rule set, or need to fix a rule set upon which your project depends, this is the post for you. We'll see how improper repo name usages sneak into rule implementations, and how to shoo them out. Examples include removing canonical repo names from embedded resource paths, filtering lists of target labels, and generating default repository target names. We also discuss removing internal references to your project's own apparent repository name to avoid minor yet preventable issues.

This article is part of the series "Migrating to Bazel Modules (a.k.a. Bzlmod)":

I occasionally update these blogs based on feedback, noting the changes in the Updates section at the bottom whenever I do. So don't forget to check the earlier blog posts every so often for new and improved information!

Prerequisites

As always, please acquaint yourself with the following concepts pertaining to external repositories if you've yet to do so:

Prime Directive

Only refer to a repository by its apparent name, not by its canonical name. The purpose of the canonical name is to allow Bazel to store repositories as a flat list of directories with unique names. This is an implementation detail, and the name format is subject to change at any time.

This applies as much to rules producers as it does consumers. Deviating from the Prime Directive and directly manipulating canonical repo names in any way is a recipe for future pain. This is especially so if you maintain a rule set upon which others beyond your own team depend.

Subprime Directive

Do not include your project's own apparent repository name in internal target labels. This goes for load() statements as well as all other internal usages.

As we'll see, this isn't a critical problem, and relatively straightforward workarounds exist, hence the "Subprime" Directive. However, removing a repository's own apparent name from internal target labels is super easy, and makes your code smaller and more future-proof.

Concrete examples from rules_scala

This post draws examples from the following pull requests, from my work on making rules_scala Bzlmod compatible (bazelbuild/rules_scala#1482):

These examples still fall under the category of patchable solutions, which I covered in the previous post. Still, they warrant their own post, given how pervasive the class of repo name dependency problems happens to be. Hopefully these examples will inspire changes in your own project and/or patches for your dependencies.

Removing internal references to the repo's own name

Hardcoded references to a project's own apparent repository name might've been required under WORKSPACE at one time, but haven't been for a while. Today, these references are never necessary, and only add friction when using or maintaining a repository or module. Without these hardcoded references, users can choose any name for a repository or module, and maintainers can change its standard name without breaking anything.

To avoid any potential issues entirely, without breaking existing users at all, bazelbuild/rules_scala#1696 made the following changes:

  • It replaced all load("@io_bazel_rules_scala//...", ...) statements with load("//...", ...).

  • Everywhere else, it replaced "@io_bazel_rules_scala//..." target strings with Label("//...") or str(Label("//...")). (Many, but not all, of these Label wrappers were overkill, as noted in When not to wrap target strings in a Label below.)

There's no downside for existing users, but plenty of upside for all

This change has no immediate impact on existing users, other than allowing WORKSPACE users to use a different repo name if and when they so choose. All users' projects will continue to work without changing a thing on their end. But they'll appreciate the freedom, and you'll no longer be committed to supporting a specific repo name as a maintainer.

Also, while this produced a somewhat large pull request, it was a purely mechanical change that was easy to produce and review. It also didn't have to happen all at once—in fact, I'd been making similar changes in earlier pull requests when convenient. This is to say that the fix is really easy and low risk, though it may require a bit of grunt work.
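Because the change is purely mechanical, its core rule fits in a few lines. The following is a hypothetical helper (not code from the pull request), written in Python syntax that stays within what Starlark accepts. It strips a repository's own apparent name from the front of a target string and leaves references to other repositories alone:

```python
# Hypothetical helper illustrating the mechanical rewrite: only the exact
# "@<own_repo>//" prefix changes; labels for other repos are left as-is.
OWN_REPO = "io_bazel_rules_scala"

def strip_own_repo(target, own_repo = OWN_REPO):
    """Rewrites "@own_repo//pkg:tgt" to "//pkg:tgt"; returns others as-is."""
    prefix = "@" + own_repo + "//"
    if target.startswith(prefix):
        return "//" + target[len(prefix):]
    return target
```

For example, strip_own_repo("@io_bazel_rules_scala//scala:scala.bzl") returns "//scala:scala.bzl", while "@rules_proto//proto:defs.bzl" comes back unchanged. A project-wide search and replace accomplishes the same thing; the function just makes the rule explicit.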

Why internal apparent repository name references can cause (minor) trouble

When migrating to Bzlmod, it's good to maintain WORKSPACE compatibility and reduce friction for WORKSPACE users to the extent possible. This makes it easier for users to adopt your improved implementation sooner rather than later, and to eventually migrate to Bzlmod when it's convenient for them. Even when introducing breaking changes to your WORKSPACE API to accommodate a Bzlmod implementation, maintaining WORKSPACE compatibility gets them one step closer to Bzlmod adoption.

With that in mind, internal repository name references introduce unnecessary friction for WORKSPACE users. These references force users to import your repository using the same name with http_archive or other rules. This is because WORKSPACE loads all repositories within a globally shared namespace.

Alternatively, users could supply a repo_mapping argument to http_archive or other rules to translate internal repo name references to another repo name. For example, rules_scala has always been the official name, but the recommended standard repo name was io_bazel_rules_scala. The motivation for the longer name, as with many other rule sets following common code packaging conventions, was to prevent namespace collisions with other implementations.

Importing the repository as rules_scala would've required injecting a repo_mapping into any other repository using the longer name—including rules_scala itself:

Example repo_mapping for http_archive
repo_mapping = {
  "@io_bazel_rules_scala": "@rules_scala",
},
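As a rough model of what that mapping does (an assumption for illustration, not Bazel's actual implementation), a repo_mapping rewrites the @repo portion of each label the remapped repository mentions, and ignores labels that begin with //:

```python
# Toy model of a WORKSPACE repo_mapping, not Bazel's implementation:
# rewrite the "@repo" part of a label if the mapping has an entry for it.
def apply_repo_mapping(label, repo_mapping):
    """Returns `label` with its "@repo" prefix translated by `repo_mapping`."""
    if not label.startswith("@"):
        return label  # Repo-relative labels like "//pkg:tgt" need no mapping.
    repo, sep, rest = label.partition("//")
    return repo_mapping.get(repo, repo) + sep + rest
```

With {"@io_bazel_rules_scala": "@rules_scala"}, the label "@io_bazel_rules_scala//scala:scala.bzl" becomes "@rules_scala//scala:scala.bzl", while "//scala:scala.bzl" passes through untouched. That second case is the point: labels without an internal repo name never need a mapping at all.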

repo_mapping isn't available within module extensions.

repo_mapping arguments are not supported within module extensions; only setting the name to match the internal references would work in that situation.

Under Bzlmod, these longer names are no longer necessary, as Bzlmod itself prevents namespace collisions by design. At the same time, internal repo name references won't affect Bzlmod users, but they may require declaring a repo_name in your module declaration. Here's what it would've looked like within rules_scala:

rules_scala Bazel module import with hardcoded name
module(
    name = "rules_scala",
    version = "7.0.0",
    bazel_compatibility = [">=7.1.0"],
    compatibility_level = 7,
    # NOTE: This repo_name argument WILL NOT appear in the official repository.
    repo_name = "io_bazel_rules_scala",
)

Finally, changing the apparent repository name would require updating all internal instances. On top of that, WORKSPACE users would need to update their name or repo_mapping values to fix the resulting build breakage.

Though these are relatively minor issues, they are issues all the same that we can prevent entirely by eliminating internal apparent repository name references altogether.

When to wrap target strings in a Label

Bazel evaluates build target strings in the context of the repository containing the assignment of the string to a Label attribute, argument, or constructor. When in doubt, it never hurts to wrap target string literals within .bzl files in a Label, for internal and external targets alike.

Always wrap target string literals within macro implementations, repository rules that inject them into generated files, and calls to functions from other repositories. Otherwise, the build may break when Bazel evaluates the target string in the context of another repository.

When expanding a macro, Bazel interprets target strings as relative to the repository containing the BUILD file invoking the macro. This is usually the right thing to do, but it can break target string literals that should act as constants within the macro implementation.

By the same token, target strings injected by a repository rule into a BUILD or .bzl file may break unless properly encoded using Label. The same is true of target strings passed as arguments to functions from an external repository. Without applying a Label wrapper first, target strings beginning with // will appear to belong to the generated or external repo. Strings containing the originating repository's apparent repo name may work, but will perpetuate the problem of unnecessary apparent repo name references.

For example, rules_scala calls toolchains.use_toolchain from rules_proto with its own protocol compiler toolchain_type. PROTOC_TOOLCHAIN_TYPE is already wrapped in a Label:

Passing a rules_scala target to use_toolchain
PROTOC_TOOLCHAIN_TYPE = Label("//protoc:toolchain_type")

# ...snip PROTOC_ATTR definition...
PROTOC_TOOLCHAINS = toolchains.use_toolchain(PROTOC_TOOLCHAIN_TYPE)

This is the rules_proto 7.1.0 implementation of the toolchains.use_toolchain function (implemented as the private _use_toolchain function):

rules_proto function using a target label
def _use_toolchain(toolchain_type):
    if _incompatible_toolchains_enabled():
        return [config_common.toolchain_type(toolchain_type, mandatory = False)]
    else:
        return []

Removing the Label wrapper from PROTOC_TOOLCHAIN_TYPE causes the following build error. This is because evaluating the target string in the context of toolchains.use_toolchain yields a target that doesn't exist in rules_proto:

Build error when PROTOC_TOOLCHAIN_TYPE is a string
$ bazel build //test/...

# [ ...snip... ]

ERROR: no such package '@@rules_proto+//protoc':
    BUILD file not found in directory 'protoc' of external repository
    @@rules_proto+. Add a BUILD file to a directory to mark it as a package.

ERROR: .../test/proto/custom_generator/BUILD.bazel:63:22:
    While resolving toolchains for target
    //test/proto/custom_generator:scala_proto_toolchain_def (096dcc8):
        com.google.devtools.build.lib.packages.BuildFileNotFoundException:
    no such package '@@rules_proto+//protoc':
    BUILD file not found in directory 'protoc' of external repository
    @@rules_proto+. Add a BUILD file to a directory to mark it as a package.

ERROR: Analysis of target
    '//test/proto/custom_generator:scala_proto_toolchain_def' failed;
    build aborted

In contrast, the Label constructor always evaluates its target string in the context of the repository containing the .bzl file in which it appears. Wrapping internal target strings in a Label ensures they are treated as constant values within the context of their own repository. Wrapping string literals representing targets in other repositories can be beneficial as well, to ensure the correct repo mapping applies. See the Label constructor and Label resolution in (legacy) macros documentation for more details.
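A deliberately simplified model can make this context dependence concrete. In the sketch below, a hypothetical resolve() function treats a //pkg:target string as belonging to whichever repository resolves it. It ignores repo mapping and relative labels such as :target, and the canonical names rules_proto+ and rules_scala+ are example values:

```python
# Simplified model, not Bazel's implementation: a "//pkg:target" string has
# no repository of its own, so whoever resolves it supplies one. Labels that
# already name a repository are returned unchanged.
def resolve(target, context_repo):
    """Resolves `target` against the canonical repo name `context_repo`."""
    if target.startswith("//"):
        return "@@" + context_repo + target
    return target
```

Passing the bare string to toolchains.use_toolchain behaves like resolve("//protoc:toolchain_type", "rules_proto+"), which yields the @@rules_proto+//protoc package from the error above. Label("//protoc:toolchain_type") behaves like resolve("//protoc:toolchain_type", "rules_scala+"), because the .bzl file containing it fixes the context.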

Handling macro arguments as Label objects

If you need to handle a target string argument as a Label within a macro implementation, don't wrap it in a Label. This will convert it to a target in the repo containing the macro definition, not the repo containing the macro call. Instead, when using Bazel 8, consider using symbolic macros. They automatically convert target strings applied to their Label arguments in the context of the package (and repository) invoking the macro. Or, when defining legacy macros (the only kind available before Bazel 8), use native.package_relative_label to achieve the same effect.

When not to wrap target strings in a Label

Wrapping target strings in a Label isn't necessary if they aren't evaluated within macros, injected into generated repositories, or passed to functions from other repositories. In fact, subsequent experimentation confirmed that many of the Label instances added in bazelbuild/rules_scala#1696 weren't technically required.

Within your repository's .bzl files, target string literals suffice in the following contexts (provided they aren't direct arguments to external function calls in the process):

  • rule() expressions, which Bazel evaluates during the loading phase (during load() evaluation, before BUILD macro expansion), even when wrapped in a publicly exported function

  • Expressions initializing file-level objects containing attr.label and related attribute types during .bzl file initialization (also during load() evaluation, including, but not limited to, rule())

  • Expressions within rule implementation functions, such as ctx.toolchains, which Bazel evaluates during the analysis phase (after BUILD macro expansion)

  • Arguments to native.register_toolchains() statements invoked by WORKSPACE macros

Wrapping target strings with Label still works, as evidenced by the rules_scala tests continuing to pass. Using Label everywhere out of an abundance of caution isn't a bad idea, but is ultimately a matter of taste, not a requirement.

Removing external/$REPO_NAME from embedded resource paths

bazelbuild/rules_scala#1621 removed external/$REPO_NAME from embedded resource paths. Before making this change, turning on Bzlmod produced errors like the following (formatted for readability):

Error due to a broken embedded resource path
$ bazel test //test/src/main/scala/scalarules/test/resources:all

1) Scala library depending on resources from external resource-only
  jar::allow to load resources(scalarules.test.resources
    .ScalaLibResourcesFromExternalDepTest)
  java.lang.NullPointerException
    at scalarules.test.resources.ScalaLibResourcesFromExternalDepTest
      .get(ScalaLibResourcesFromExternalDepTest.scala:17)
    at scalarules.test.resources.ScalaLibResourcesFromExternalDepTest
      .$anonfun$new$3(ScalaLibResourcesFromExternalDepTest.scala:11)
    at scalarules.test.resources.ScalaLibResourcesFromExternalDepTest
      .$anonfun$new$2(ScalaLibResourcesFromExternalDepTest.scala:11)

This was due to code trying to access resources using paths like:

  • /external/test_new_local_repo/resource.txt

The fix updated the _target_path_by_default_prefixes macro to trim the external/$REPO_NAME prefix from dependency paths, matching existing processing for paths beginning with resources/ or java/. (string.split() could've also worked here, but string.partition() fit the pattern of the surrounding code.)

From scala/private/resources.bzl
    # Looking inside an external repository. Trim off both the "external/" and
    # the repository name components. Especially important under Bzlmod, because
    # the canonical repository name may change between versions.
    (dir_1, dir_2, rel_path) = path.partition("external/")
    if rel_path:
        return rel_path[rel_path.index("/"):]
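Here's a standalone variant of that trimming logic that runs outside Bazel. The function name is hypothetical, and unlike the excerpt it uses find() instead of index(), so a path with nothing after the repository name returns None instead of failing:

```python
def trim_external_repo_prefix(path):
    """Returns the part of `path` after "external/<repo_name>", keeping the
    leading "/", or None if `path` isn't inside an external repository."""
    rel_path = path.partition("external/")[2]
    if rel_path:
        # Unlike index(), find() returns -1 instead of failing when nothing
        # follows the repository name.
        slash = rel_path.find("/")
        if slash != -1:
            return rel_path[slash:]
    return None
```

Because the repository name component is discarded whatever it looks like, the result is the same whether that component is test_new_local_repo or a canonical name in any format.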

For example, passing <output_base>/external/<repo_name>/resource.txt to string.partition() would yield:

  • dir_1: <output_base>/
  • dir_2: external/
  • rel_path: <repo_name>/resource.txt

Consequently, rel_path[rel_path.index("/"):] yields /resource.txt. This required updating code accessing such resources like so:

Diff of code accessing resources in external repos
diff --git a/test/src/main/scala/scalarules/test/resources/ScalaLibResourcesFromExternalScalaTest.scala b/test/src/main/scala/scalarules/test/resources/ScalaLibResourcesFromExternalScalaTest.scala
index 99a9e8e7..599ca7ed 100644
--- a/test/src/main/scala/scalarules/test/resources/ScalaLibResourcesFromExternalScalaTest.scala
+++ b/test/src/main/scala/scalarules/test/resources/ScalaLibResourcesFromExternalScalaTest.scala
@@ -6,7 +6,7 @@ class ScalaLibResourcesFromExternalScalaTest extends AnyFunSuite {

   test("Scala library depending on resources from external resource-only jar should allow to load resources") {
     val expectedString = String.format("A resource%n"); //Using platform dependent newline (%n)
-    assert(get("/external/test_new_local_repo/resource.txt") === expectedString)
+    assert(get("/resource.txt") === expectedString)
   }

   private def get(s: String): String =

A breaking change from the previous rules_scala release

This was technically a breaking change, but as rules_scala maintainer Simonas Pinevičius pointed out, the original behavior mistakenly leaked build details into the code.

Avoiding canonical repo name handling when filtering targets

bazelbuild/rules_scala#1621 updated a filtering mechanism for including and excluding targets using the dependency_tracking_strict_deps_patterns attribute of scala_toolchain():

Target implementing an include/exclude target filter
# From //test_expect_failure/missing_direct_deps/filtering:BUILD
scala_toolchain(
    name = "plus_one_strict_deps_filter_a_impl",
    dependency_mode = "plus-one",
    dependency_tracking_method = "ast",
    dependency_tracking_strict_deps_patterns = [
        "@//test_expect_failure/missing_direct_deps/filtering",
        "-@//test_expect_failure/missing_direct_deps/filtering:a",
    ],
    strict_deps_mode = "error",
    visibility = ["//visibility:public"],
)

The problem was that the underlying _phase_dependency, which several scala_* rule implementations use, compared these plain string patterns against stringified ctx.label values. ctx.label, provided to a rule implementation function during the analysis phase, will contain a canonical repository name under Bzlmod. Hence, none of the filters containing apparent repo names would ever match the intended Label values.
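A hypothetical, simplified model of that comparison shows the failure. It assumes a label is selected when it starts with some include pattern and with no "-" exclude pattern, and that str(ctx.label) renders main repository targets as @//pkg:target under WORKSPACE but @@//pkg:target under Bzlmod. rules_scala's actual filtering logic may differ in its details:

```python
# Hypothetical, simplified model of prefix-based include/exclude filtering.
# A label is selected if it starts with an include pattern and with no
# exclude ("-"-prefixed) pattern.
def is_selected(label_str, patterns):
    included = False
    for p in patterns:
        if p.startswith("-"):
            if label_str.startswith(p[1:]):
                return False
        elif label_str.startswith(p):
            included = True
    return included

PATTERNS = [
    "@//test_expect_failure/missing_direct_deps/filtering",
    "-@//test_expect_failure/missing_direct_deps/filtering:a",
]
```

Under the assumed WORKSPACE rendering, is_selected("@//test_expect_failure/missing_direct_deps/filtering:b", PATTERNS) returns True. Under Bzlmod, the same target stringifies as "@@//test_expect_failure/missing_direct_deps/filtering:b", which never starts with "@//", so the call returns False and the filter silently stops applying.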

My first attempt at a solution involved using macros to transform repo names in two places:

  • In the _partition_patterns helper of the scala_toolchain rule
  • In the _phase_dependency function, which is invoked much later

Fabian Meumertzheim's line of questioning and suggestion-ing led me to abandon this approach in favor of wrapping the original scala_toolchain rule using a macro. This involved:

  1. Renaming the scala_toolchain rule to _scala_toolchain
  2. Adding a new scala_toolchain macro to wrap the call to the _scala_toolchain rule
  3. Adding the _expand_patterns macro to expand every target pattern using native.package_relative_label

From scala/scala_toolchain.bzl
def _expand_patterns(patterns):
    """Expands string patterns to match actual Label values."""
    result = []

    for p in patterns:
        exclude = p.startswith("-")
        p = p.lstrip("-")
        expanded = str(native.package_relative_label(p)) if p else ""

        # If the original pattern doesn't contain ":", match any target
        # beginning with the pattern prefix.
        if expanded and ":" not in p:
            expanded = expanded[:expanded.rindex(":")]

        result.append(("-" if exclude else "") + expanded)

    return result

def scala_toolchain(**kwargs):
    """Creates a Scala toolchain target."""
    strict = kwargs.pop("dependency_tracking_strict_deps_patterns", [""])
    unused = kwargs.pop("dependency_tracking_unused_deps_patterns", [""])
    _scala_toolchain(
        dependency_tracking_strict_deps_patterns = _expand_patterns(strict),
        dependency_tracking_unused_deps_patterns = _expand_patterns(unused),
        **kwargs
    )
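To see what _expand_patterns produces, here's a version that runs outside Bazel. Since native.package_relative_label only exists inside Bazel, a hypothetical stand-in replaces it. The stand-in only handles main repository patterns and assumes Bzlmod-style @@// output; the loop body is otherwise the same as _expand_patterns above:

```python
# Hypothetical stand-in for str(native.package_relative_label(p)). It handles
# main repository patterns ("//pkg" or "@//pkg") only, and assumes Bzlmod
# renders them as "@@//pkg:target".
def fake_package_relative_label_str(p):
    p = p.removeprefix("@")  # "@//pkg" and "//pkg" name the main repo here.
    pkg, sep, name = p.partition(":")
    if not sep:
        name = pkg.split("/")[-1]  # "//foo/bar" is shorthand for "//foo/bar:bar".
    return "@@" + pkg + ":" + name

def expand_patterns(patterns):
    """Same loop as _expand_patterns, calling the stand-in above."""
    result = []
    for p in patterns:
        exclude = p.startswith("-")
        p = p.lstrip("-")
        expanded = fake_package_relative_label_str(p) if p else ""

        # Without ":", keep only the package part so it matches any target
        # whose label begins with it.
        if expanded and ":" not in p:
            expanded = expanded[:expanded.rindex(":")]

        result.append(("-" if exclude else "") + expanded)
    return result
```

In this model, the earlier scala_toolchain patterns expand to "@@//test_expect_failure/missing_direct_deps/filtering" and "-@@//test_expect_failure/missing_direct_deps/filtering:a", which share a prefix with the stringified ctx.label values under Bzlmod. The default [""] stays [""].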

I then restored _phase_dependency to its original state. This preserved the original API; works under Bazel 6, 7 and 8; and works under WORKSPACE and Bzlmod.

This still allows partial package prefix matches.

This implementation doesn't completely address the concern Fabian raised about matching plain string prefixes against Label values. It will still allow partial matching of package prefixes, e.g., "//foo/bar" will match "//foo/bar:baz" and "//foo/barquux:xyzzy". It could be updated easily to disallow such matches, but that would be a breaking change from existing API behavior. Ending the filters with : would work around this, e.g., "//foo/bar:" would match "//foo/bar:baz", but not "//foo/barquux:xyzzy".
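The difference between the existing prefix match, the trailing ":" workaround, and a stricter package boundary match is easy to see side by side. matches_package() below is a hypothetical example of the stricter (and breaking) behavior, not something rules_scala implements:

```python
def matches_prefix(label_str, pattern):
    """The existing behavior: a plain string prefix match."""
    return label_str.startswith(pattern)

def matches_package(label_str, pattern):
    """Hypothetical stricter behavior: a pattern without ":" only matches at
    a package boundary (the package itself or its subpackages)."""
    if ":" in pattern:
        return label_str.startswith(pattern)
    return (label_str.startswith(pattern + ":") or
            label_str.startswith(pattern + "/"))
```

matches_prefix("//foo/barquux:xyzzy", "//foo/bar") is True, while matches_prefix("//foo/barquux:xyzzy", "//foo/bar:") is False. matches_package("//foo/barquux:xyzzy", "//foo/bar") is False, yet it still accepts "//foo/bar:baz" and subpackage targets such as "//foo/bar/sub:x".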

Wrapping repository_rules to generate default target names

The final case we'll cover is generating default target names for repository_rules. This is based on the fact that Label parses the default top level target from a repo name:

Default Label target names in the Bazel source
/**
  * Parses a raw label string into parts. The logic can be summarized by the
  * following table:
  *
  * <pre>{@code
  *  raw                  | repo   | repoIs-   | ...snip... | target
  *                       |        | Canonical |            |
  * ----------------------+--------+-----------+------------+-----------
  * ...snip...            |        |           |            |
  * "@repo"               | "repo" | false     |            | "repo"
  * "@@repo"              | "repo" | true      |            | "repo"

rules_scala depends upon this behavior for its Maven artifact repository generation schema, which allows targets to depend on the @artifact_repo_name label instead of @artifact_repo_name//:artifact_repo_name.
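The shorthand can be sketched as a small string expansion. This is a simplified illustration of the table above (it skips validation and relative labels like :target), not Bazel's parser:

```python
# Simplified illustration of the label shorthand; not Bazel's parser.
def expand_label_shorthand(raw):
    _, sep, path = raw.partition("//")
    if not sep:
        # A bare "@repo" or "@@repo": the target is named after the repo.
        return raw + "//:" + raw.lstrip("@")
    if ":" in path:
        return raw  # Already names its target explicitly.
    # "//pkg/sub" is shorthand for "//pkg/sub:sub".
    return raw + ":" + path.split("/")[-1]
```

expand_label_shorthand("@artifact_repo_name") returns "@artifact_repo_name//:artifact_repo_name". The expansion reuses whatever name follows the @, so the repository rule has to name its top-level target after the apparent name the user wrote.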

However, under Bzlmod, repository_ctx.name is the canonical repo name, not the value provided as the name parameter of the repository_rule invocation. This breaks existing repository_rule implementations that use repository_ctx.name to generate the default target name in the resulting repo.

At the time, the officially sanctioned workaround was to add another attr.string to the repository_rule, and assign it the same value as name. Passing an unmangled name as a separate attribute sidesteps the dependency on canonical repo names completely. The Bazel modules: Repository names and strict deps documentation states (emphasis theirs):

Note that the canonical name format is not an API you should depend on and is subject to change at any time. Instead of hard-coding the canonical name, use a supported way to get it directly from Bazel...

Using a different default target name is a possibility.

Providing a different default target name, either hardcoding it or providing it as an attr.string default value, would be another approach. However, updating all existing target labels in that case might prove prohibitively time consuming.

I decided to introduce macro wrappers around the original repository rules to duplicate the name attribute. The steps to do so were:

  1. Like with the scala_toolchain macro above, renaming the original repository_rule definition to start with an underscore.
  2. Adding a new attr.string for the duplicate name attribute, if one wasn't already available.
  3. Using dict.pop() and dict.get() to preserve the original repository_rule API while duplicating the name parameter if required.

Here's an example from bazelbuild/rules_scala#1650: Replace apparent_repo_name with repo rule wrappers:

Wrapping a repository_rule in a macro to duplicate the name
def jvm_import_external(**kwargs):
    """Wraps `_jvm_import_external` to pass `name` as `generated_rule_name`.

    If `generated_rule_name` is specified already, this is a noop.
    """
    generated_rule_name = kwargs.pop("generated_rule_name", kwargs.get("name"))
    _jvm_import_external(generated_rule_name = generated_rule_name, **kwargs)
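Because the wrapper only moves keyword arguments around, its behavior is easy to check outside Bazel. In this sketch, a hypothetical stub stands in for the real _jvm_import_external repository_rule and returns whatever it receives:

```python
# Hypothetical stub standing in for the real `_jvm_import_external`
# repository_rule (which only exists inside Bazel). It returns its keyword
# arguments so the wrapper's behavior can be inspected.
def _jvm_import_external(**kwargs):
    return kwargs

def jvm_import_external(**kwargs):
    """Same body as the wrapper above, returning the stub's result."""
    generated_rule_name = kwargs.pop("generated_rule_name", kwargs.get("name"))
    return _jvm_import_external(generated_rule_name = generated_rule_name, **kwargs)
```

Calling jvm_import_external(name = "scalatest") passes generated_rule_name = "scalatest" through, while an explicit generated_rule_name takes precedence over name. Popping the key before forwarding **kwargs also prevents a duplicate keyword argument error.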

As noted in the pull request, there are trade-offs with this approach:

  • The macro can't be imported into a MODULE.bazel file directly via use_repo_rule. You have to write a module extension to call it, or use modules.as_extension from bazel_skylib. Either solution enables remapping the apparent repo name via use_repo, provided the macro or custom extension provides a default name.

  • The documentation from the original repository_rule doesn't automatically carry over to the macro wrapper. For internal repository_rules, like those in bazelbuild/rules_scala#1650, this may not be much of a concern, but it's worth being aware of.

Using repository_ctx.original_name in Bazel 8.1.0 and later

While the macro wrapper works, compelling users of a repository_rule to duplicate name information is a bit clumsy and potentially error prone. Further discussion of the default repo target name use case ensued in the Bazel Slack workspace, and the Bazel maintainers decided to support it. I filed bazelbuild/bazel#24467 to follow up.

Shortly thereafter, the Bazel maintainers added repository_ctx.original_name in Bazel 8.1.0. I then filed bazelbuild/rules_scala#1694: Use rctx.original_name if available to make use of it. The implementation, still wrapped in a macro, contains lines such as the following (where rctx is a repository_ctx object):

repository_ctx.original_name usage in rules_scala
def _alias_repository_impl(rctx):
    # ...snip...

    # Replace with rctx.original_name once all supported Bazels have it
    "name": getattr(rctx, "original_name", rctx.attr.default_target_name),

# ...snip...
_alias_repository = repository_rule(
    implementation = _alias_repository_impl,
    attrs = {
        # Remove once all supported Bazels have repository_ctx.original_name
        "default_target_name": attr.string(mandatory = True),
        "target": attr.string(mandatory = True),
    },
)

# Remove this macro and use `_alias_repository` directly once all supported
# Bazel versions support `repository_ctx.original_name`.
def _alias_repository_wrapper(**kwargs):
    # ...snip...

Bazel 8.1.0 contains a small original_name bug under WORKSPACE.

I discovered immediately after the Bazel 8.1.0 release that repository_ctx.original_name was the empty string under WORKSPACE. Fabian Meumertzheim fixed this in bazelbuild/bazel#25296, released in Bazel 8.1.1.
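Code that falls back via getattr(rctx, "original_name", ...) still receives that empty string on Bazel 8.1.0 under WORKSPACE, because the attribute exists. One way to guard against it is to treat an empty value like a missing one. Here's a minimal sketch with a hypothetical helper name, not code from rules_scala:

```python
def default_target_name(original_name, fallback_name):
    """Picks the default target name for a generated repository.

    Callers would pass getattr(rctx, "original_name", None) as original_name
    and something like rctx.attr.default_target_name as fallback_name. The
    `or` treats both a missing attribute (None, before Bazel 8.1.0) and the
    empty string (Bazel 8.1.0 under WORKSPACE) as "use the fallback".
    """
    return original_name or fallback_name
```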

Criminal activity: Parsing apparent repo names from canonical repo names

This final technique, parsing an apparent repository name from a canonical repository name, is for historical consideration only. It was a temporary measure I employed when first Bzlmodifying rules_scala, which I've since undone using the techniques above and throughout this blog series.

Though it seems to do the right thing, and enabled me to move on and solve many other Bzlmodification problems, consider this a last resort. This is why I'm covering it last, after all.

Here's the function, originally from bazelbuild/bazel-skylib#548: Add apparent_repo_name utility to modules.bzl. It mirrors Bazel's repo name parsing to identify the apparent repo name component of the canonical repository_ctx.name value, and is backwards compatible with WORKSPACE builds.

E-ville function to parse a canonical repo name
def _apparent_repo_name(repository_ctx):
    """Generates a repository's apparent name from a repository_ctx object.

    Useful when generating the default top level `BUILD` target for the
    repository.

    Example:
    ```starlark
    _ALIAS_TARGET_TEMPLATE = \"\"\"alias(
        name = "{name}",
        actual = "@{target_repo_name}",
        visibility = ["//visibility:public"],
    )
    \"\"\"

    def _alias_repository_impl(rctx):
        rctx.file("BUILD", _ALIAS_TARGET_TEMPLATE.format(
            name = apparent_repo_name(rctx),
            target_repo_name = rctx.attr.target_repo_name,
        ))
    ```

    Args:
        repository_ctx: a repository_ctx object

    Returns:
        An apparent repo name derived from repository_ctx.name
    """
    repo_name = repository_ctx.name

    # Based on this pattern from the Bazel source:
    # com.google.devtools.build.lib.cmdline.RepositoryName.VALID_REPO_NAME
    for i in range(len(repo_name) - 1, -1, -1):
        c = repo_name[i]
        if not (c.isalnum() or c in "_-."):
            return repo_name[i + 1:]

    return repo_name
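To see both what the scan does and why it remains fragile, here's the same loop over a plain string. The example canonical names follow formats recent Bazel releases happen to produce (~ separators in Bazel 7, + in Bazel 8), which is exactly the kind of detail the Prime Directive says not to rely on:

```python
def apparent_name_from_canonical(repo_name):
    """The same backward scan as _apparent_repo_name, over a plain string."""
    # Walk backward to the last character that isn't a letter, digit, "_",
    # "-", or ".", and keep everything after it. With no such character,
    # keep the whole name (the WORKSPACE case).
    for i in range(len(repo_name) - 1, -1, -1):
        c = repo_name[i]
        if not (c.isalnum() or c in "_-."):
            return repo_name[i + 1:]
    return repo_name
```

Both "rules_scala++scala_deps+scalatest" and "rules_scala~~scala_deps~scalatest" yield "scalatest", and a WORKSPACE name passes through unchanged. A canonical name that ends in a separator, like a module's own rules_proto+, yields an empty string: the fragility in miniature.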

In a comment on bazelbuild/bazel#24467, I noted a more succinct implementation that I'd discovered:

E-ville canonical repo name separator usage
# https://gitlab.arm.com/bazel/toolchain_utils/-/blob/main/toolchain/separator.bzl#L3
SEPARATOR = Label("@local").workspace_name.removesuffix("local")[-1]

# https://gitlab.arm.com/bazel/toolchain_utils/-/blob/main/toolchain/resolved/repository.bzl#L37
target = rctx.attr.target or rctx.attr.name.rsplit(SEPARATOR, 1)[1]

Both of these examples are less brittle than depending on exact canonical repository names, or the exact canonical repository name format. If you're in a pinch, either technique can help you make progress until you find a better solution to a specific problem. And because every call site marks a spot that depends on canonical repo names, they also show you everywhere you'll need to apply a better solution later.

Each still technically depends on the canonical repo name format to some degree. By definition, that makes it potentially brittle in the face of future changes to the format. It's still best to avoid parsing the canonical repo name if you can help it.

Even so, I feel no guilt for my past transgressions. By applying this solution at the time, I accepted a limited, calculated risk to remove a significant source of friction from the Bzlmod migration process. I was able to solve many more problems before replacing applications of the above function with superior solutions. In fact, moving on to gain more experience helped me solicit, understand, and accept advice to adopt better techniques. This experience also helped me contribute to the case for adding first class support to Bazel for repository_ctx.original_name.

Moral #1

The first moral of this story is not to let the perfect be the enemy of the good—but don't give up the quest. This function, while meeting with disapproval from the official Bazel maintainers, enabled me to make progress until such time that a better way became apparent. However, when the better way did become apparent, I replaced it as soon as I could.

Moral #2

The second moral of the story is that good tests are critical to making forward progress and making improvements over time. The rules_scala test suite has been a tremendous security blanket, catching many problems throughout the Bzlmodification effort. I couldn't've applied either this initial solution or the improved one with such confidence without this test suite.

And now for something completely different...(but still on theme)

In the Bazel Slack workspace, I also got pulled into a new canonical repo name related challenge while using rules_python. The challenge involved identifying PyPI dependencies and extracting their package names to generate a py_wheel(requires=...) list. The previous solution parsed the PyPI names from canonical repo names embedded in py_library input file paths.

mbland/rules_python_pip_name contains several experiments resulting from the discussion. The two most promising are:

  • //:lists: The pip_and_srcs_lists rule makes use of a PypiInfo provider from rules_python_pypi_info.patch. The patch parses pypi_* tags from a py_library target to create the provider.

  • //:requirements_file: The requirements rule uses ctx.actions.run_shell to extract names from *.dist-info/METADATA files with a shell pipeline. This output would be suitable for the requires_file parameter, as opposed to requires.

This latter solution looks like:

Example rule to parse PyPI info from *.dist-info/METADATA files
def _collect_runfiles(deps):
    return depset(transitive = [
        dep[DefaultInfo].default_runfiles.files
        for dep in deps
    ])

def _requirements_impl(ctx):
    files = [
        f
        for f in _collect_runfiles(ctx.attr.deps).to_list()
        if f.basename == "METADATA" and f.dirname.endswith(".dist-info")
    ]
    outfile = ctx.outputs.requirements_file

    ctx.actions.run_shell(
        inputs = files,
        outputs = [outfile],
        command = (
            "grep \"^Name: \" \"%s\" | " % (
                "\" \"".join([f.path for f in files])
            ) +
            "cut -d\" \" -f2 | " +
            "tr \"[:upper:]\" \"[:lower:]\" | " +
            "sort >> " + outfile.path
        ),
    )

requirements = rule(
    implementation = _requirements_impl,
    attrs = {
        "deps": attr.label_list(),
        "requirements_file": attr.output(mandatory = True),
    }
)
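For a quick check of what the pipeline should write, here's the same extraction over METADATA file contents rather than files. This is a sketch; the helper name and the sample contents in the usage note are made up for illustration:

```python
def names_from_metadata(contents):
    """Returns lowercase, sorted package names from METADATA file contents.

    Mirrors the pipeline above: grep "^Name: " | cut -d" " -f2 |
    tr "[:upper:]" "[:lower:]" | sort.
    """
    names = []
    for text in contents:
        for line in text.splitlines():
            if line.startswith("Name: "):
                names.append(line.split(" ")[1].lower())
    return sorted(names)
```

Given one METADATA file containing "Name: PyYAML" and another containing "Name: requests", it returns ["pyyaml", "requests"], the same lines the requirements rule would write.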

Conclusion

As mentioned in the two previous posts, writing my own module extensions and patches for external dependencies helped me complete EngFlow's Bzlmod migration. Contributing to the Bzlmodification of rules_scala has yielded even more important insights and techniques, to the benefit of the broader Bazel ecosystem. Hopefully sharing these ideas helps others make progress on their own migrations, and those of their users, without waiting for all their dependencies to migrate.

Just as hopefully, perhaps foolishly so, I won't have to write about resolving repository name issues again. Other topics I'm considering for the next Bzlmod related posts include:

As always, I'm open to questions, suggestions, corrections, and updates relating to this series of Bzlmodification posts. It's easiest to find me lurking in the #bzlmod channel of the Bazel Slack workspace. I'd love to hear how your own Bzlmod migration is going—especially if these blog posts have helped!