Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and Runfiles
The first post in our Bzlmod migration series explained many of the problems that may arise when migrating your project. These next three posts will explore various solutions to problems arising from changes in how Bazel handles repository names under Bzlmod, beginning with runfiles paths. After applying the techniques in this post, your project should be well insulated from runfiles-path-related breakages, now and well into the future.
This article is part of the series "Migrating to Bazel Modules (a.k.a. Bzlmod)":
- Migrating to Bazel Modules (a.k.a. Bzlmod) - The Easy Parts
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and Runfiles
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and rules_pkg
- Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names, Macros, and Variables
Prerequisites
The following advice assumes that you're familiar with the runfiles concept, as well as the concepts presented in the following Bazel documents:
- External Dependencies Overview
- Canonical repository name
- Apparent repository name
- Bazel Modules
- Output Directory Layout
It also assumes that you've correctly declared all dependency targets in the deps and/or data attributes of your BUILD rules.
Translating file paths using runfiles libraries
If you enable Bzlmod and your runfiles paths are immediately broken, you need to start using runfiles libraries.
Runfiles libraries help you translate runfile paths corresponding to Bazel labels to actual file system paths, so your programs can find these files at runtime. For example, from the rlocationpaths example below, in which the main repository's module name from MODULE.bazel is engflow:
- Bazel label: //data:0-foo.txt
- Runfile path: engflow/data/0-foo.txt
- Runfiles link: /home/mbland/.cache/bazel/_bazel_mbland/1234567890abcdef/execroot/_main/bazel-out/k8-fastbuild/bin/runfiles_demo.runfiles/_main/data/0-foo.txt
- Actual path: /home/mbland/src/EngFlow/example/runfiles/engflow/data/0-foo.txt
(I'll explain the patterns behind the runfile path and actual path below.)
Runfiles libraries are distributed as source code for different languages, enabling tests or other programs to locate their runfiles programmatically during execution. My colleague László Csomor originally developed them, before EngFlow existed, to make Bazel runfiles portable to Windows. Many of them now map runfiles paths to actual file system paths under Bzlmod (via the repo mapping mechanism), making them essential to using Bzlmod.
In this way, runfiles libraries prevent your code from breaking whenever there's a change in Bazel's internal repository name representation. From Bazel modules: Repository names and strict deps (emphasis theirs):
...the canonical name format is not an API you should depend on and is subject to change at any time. Instead of hard-coding the canonical name, use a supported way to get it directly from Bazel...
The canonical name format just changed again.
The change to no longer encode module versions in canonical repo names in Bazel 7.1.0 is a recent example of Bazel maintainers altering the format. The Bazel maintainers just changed the canonical repo name format again to fix build performance issues on Windows caused by the ~ characters in these names. This will land in Bazel 7.3.0 under the --incompatible_use_plus_in_repo_names flag, which implies other canonical name format changes as well. See also: bazelbuild/bazel: --incompatible_use_plus_in_repo_names #23127.
Runfiles libraries for different languages
Here's where you can find the runfiles libraries for a few common languages. The links and notes below describe how to depend on and initialize these libraries; later sections describe how to use them.
Note that the links here are to files within the latest versions at the time of writing. Make sure that you're viewing the versions matching your project's actual dependencies.
C++
- Target: @bazel_tools//tools/cpp/runfiles
- Documentation: runfiles_src.h header comment
Requires initialization using the BAZEL_CURRENT_REPOSITORY macro symbol.
Java
- Target: @bazel_tools//tools/java/runfiles
- Documentation: Runfiles.java class comment
Requires using the @AutoBazelRepository annotation to generate a class constant used during initialization.
Bash
- Target: @bazel_tools//tools/bash/runfiles
- Documentation: runfiles.bash header comment
Requires copying a preamble from the header comment to enable discovery at runtime.
Python
- Target: @rules_python//python/runfiles
- Documentation: rules_python/python/runfiles/README.md; the runfiles.py source file
Do not use the @bazel_tools Python runfiles library, as it is out of date and does not support Bzlmod.
Go
- Target: @rules_go//go/runfiles:go_default_library
- Documentation: runfiles.go (the whole file)
Rust
- Target: @rules_rust//tools/runfiles
- Documentation: runfiles.rs header comment
Using predefined source/output path variables to pass paths to rlocation()
All runfiles libraries, once properly initialized, provide a standard rlocation() or Rlocation() function. For example, see the Rlocation() function signature and docstring in the rules_python runfiles.py source file.
The intended usage is to use the rlocationpath and rlocationpaths predefined source/output path variables in your BUILD targets to generate paths to pass to rlocation():
rlocationpath: The path a built binary can pass to the Rlocation function of a runfiles library to find a dependency at runtime. ... [the result] always starts with the name of the repository. ...
The rlocationpath of a file in an external repository repo will start with repo/, followed by the repository-relative path. Passing this path to a binary and resolving it to a file system path using the runfiles libraries is the preferred approach to find dependencies at runtime.
Pass the rlocationpath results to your program as command line arguments or environment variables by:
- Specifying them in the args or env attribute of test or binary targets
- Including them in the cmd attribute of a genrule
Generating compiled modules
It's also possible to generate text files including these paths, or source files for different languages defining constants from these rlocationpath values. I've done it, but ultimately found it to be unnecessary. See the Passing known file path constants to rlocation() section below.
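On the receiving side, a program only needs to read these values back out. Here's a minimal Python sketch, assuming the BUILD target passes $(rlocationpaths ...) output both as separate command line arguments and as a single space-separated environment variable. The function name collect_rlocationpaths and the variable name MY_RUNFILES_PATHS are hypothetical, not part of any runfiles library.

```python
import os
import sys
from typing import List, Mapping, Sequence


def collect_rlocationpaths(
    argv: Sequence[str],
    environ: Mapping[str, str],
    env_var: str = "MY_RUNFILES_PATHS",  # hypothetical variable name
) -> List[str]:
    """Gathers rlocationpath values from command line args and an env var.

    Assumes env_var holds a space-separated list, as produced by expanding
    $(rlocationpaths ...) in a target's env attribute.
    """
    paths = list(argv)
    # str.split() with no argument also drops empty strings.
    paths.extend(environ.get(env_var, "").split())
    return paths


if __name__ == "__main__":
    for path in collect_rlocationpaths(sys.argv[1:], os.environ):
        print(path)
```

Each collected value can then be passed unchanged to your runfiles library's rlocation() function.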
rlocationpaths example
To illustrate, we'll use a small example project containing a py_binary that emits information about its runfiles, which are provided by rlocationpaths.
If you'd like to follow along with the example on your own machine, clone the EngFlow/example repository.
Let's examine some of the files from this repo before running the demonstration program.
First, we define our engflow module in engflow/MODULE.bazel. It depends on the frobozz module from the same git repository, using local_path_override to simulate an external repository (lines 4 and 5).
Now let's look at the example program itself. Notice that it:
- Instantiates the _RUNFILES object using the rules_python runfiles library (lines 23 and 26)
- Prints information about runfiles-related environment variables, the runfiles directory, and the current working directory (lines 50-62)
- Iterates over runfiles paths from both command line arguments and the RUNFILE_PATHS environment variable (lines 64-72)
- Prints information about individual runfiles, their runfiles directory links, and their actual file system paths (lines 36-43)
The engflow/BUILD file defines a py_binary for our demo program, with env and args attributes that will be applied during bazel run. The runfiles targets are specified in its data attribute.
It also contains a genrule that uses this binary and passes runfile paths as environment variables and command line arguments in its cmd attribute. For genrules, the runfile targets are specified in the srcs attribute.
Runfiles input attributes may vary.
Most rules have a data attribute to specify runfiles, but genrule doesn't; it uses srcs instead. Other rules may or may not also populate the runfiles directory with srcs. Check the documentation for the rule in question to learn what's included in its runfiles. When in doubt, you can modify this example project, or write your own from scratch, to get insight into how specific rules manage their runfiles.
Let's build the package and see the resulting outputs.
We can see the generated runfiles directory (runfiles_demo.runfiles), as well as the .repo_mapping (for Bzlmod) and .runfiles_manifest support files. The runfiles libraries use these artifacts to translate relative runfile paths to their actual paths within the execution environment.
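To make the role of these two files more concrete, here's a minimal Python sketch that parses them. It assumes these formats: each manifest line holds a runfile path, a space, and the actual path. Each .repo_mapping line holds a comma-separated triple of source canonical repo name, apparent repo name, and target canonical repo name. The sketch deliberately ignores edge cases that the real runfiles libraries handle, such as the escaping Bazel applies to paths containing spaces. The function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def parse_manifest(lines: Iterable[str]) -> Dict[str, str]:
    """Maps runfile paths to actual paths from .runfiles_manifest lines.

    Assumes no path contains a space (escaped paths are not handled).
    """
    manifest = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the first space only; the actual path may be empty.
        runfile_path, _, actual_path = line.partition(" ")
        manifest[runfile_path] = actual_path
    return manifest


def parse_repo_mapping(lines: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Maps (source canonical repo, apparent name) to a target canonical repo.

    The main repository appears as the empty string in the source column.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        source, apparent, target = line.split(",")
        mapping[(source, apparent)] = target
    return mapping
```

For the example project, you'd expect repo mapping entries such as ",frobozz,frobozz~" and ",engflow,_main", since the source column for the main repository is empty.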
No runfiles directory by default on Windows
If you're running this program on Windows, you likely won't see the runfiles_demo.runfiles directory. See the Enabling runfiles directories on Windows section below for details on how to enable runfiles directory generation.
Let's run the //:runfiles_demo Python binary. It receives one space-separated list of rlocationpaths paths from its command line args, and another passed in via the RUNFILES_PATHS environment variable.
In the example output below, I've elided some output as follows to collapse details specific to my local build:
- OUTPUT_BASE represents the result of bazel info output_base, e.g., /home/mbland/.cache/bazel/_bazel_mbland/1234567890abcdef.
- ARCH represents the build architecture output path component, e.g., k8-fastbuild.
- RUNFILES_DIR in the "runfiles link:" paths is the value of "runfiles dir:" at the top of the output.
- EXAMPLE_DIR is where I've cloned the example repository, plus the runfiles directory containing the example, e.g., /home/mbland/src/EngFlow/example/runfiles.
This should make the output easier to understand, and help you see the same patterns in the output when running the example locally.
There are a few interesting things to notice here:
- The paths generated by rlocationpaths already include the canonical name of the frobozz external repository, which is frobozz~. (For now, that is; it will appear as frobozz+ in the near future.)
- The actual locations for runfiles in external repositories are direct paths into the external repository storage directory under OUTPUT_BASE.
- Runfile paths for files in our own repository begin with _main. This is because we're building our engflow module as our main repository.
- bazel run runs the runfiles_demo binary inside the _main subdirectory of its runfiles dir.
- RUNFILES_MANIFEST_FILE is defined instead of RUNFILES_DIR, so the runfiles library is using the manifest file to translate paths.
- The actual runfiles dir comes from stripping _manifest from RUNFILES_MANIFEST_FILE, and we can see that corresponding runfiles links do exist for each file.
Now let's run //:runfiles_demo as part of the genrule target and inspect the output. SANDBOX stands in for the sandbox path components.
Notice that:
- RUNFILES_DIR was defined in the environment; RUNFILES_MANIFEST_FILE was not.
- The rule did not run the //:runfiles_demo binary in the RUNFILES_DIR.
- Since the //:runfiles_demo binary executed during bazel build instead of bazel run, both rlocationpath outputs resolved to a sandbox path, not the typical output path.
- The args and env attributes of //:runfiles_demo weren't used, since the genrule executed the //:runfiles_demo binary directly, not via bazel run.
Passing known file path constants to rlocation()
While using rlocationpath is the preferred way to pass runfiles paths through to the runfiles libraries, it's not strictly necessary. It can also prove inconvenient in some cases, such as:
- Writing tests that refer to one or more input data files
- Providing a library that executes a binary on behalf of a larger program
In such cases, hardcoding paths to pass as arguments to rlocation() is easier than plumbing through rlocationpath values. This is acceptable if the paths aren't going to change beyond your control. Remember:
- For runfiles in external repositories, use the repository's apparent name as the first path component. The rest of the path should be relative to that repository's root.
- For runfiles within the same repository, use your repository's module name from MODULE.bazel as the first path component. The rest of the path should be relative to your repository's root.
These differ from paths generated with rlocationpath, which begin with the canonical repository name or _main, respectively. The rlocation() runfiles library function will then translate these path constants into actual file paths at runtime, using the repo mapping mechanism.
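The translation step itself is simple. Here's a minimal Python sketch of it. It assumes a repo mapping already loaded into a dict keyed by (source canonical repo, apparent repo name), with the main repository represented by the empty string. The helper name to_rlocationpath is hypothetical. Real runfiles libraries do this internally inside rlocation(), and they also handle lookups made from code in external repositories.

```python
from typing import Dict, Tuple

# Keys are (source canonical repo, apparent repo name); values are target
# canonical repo names. "" stands for the main repository as the source.
RepoMapping = Dict[Tuple[str, str], str]


def to_rlocationpath(path: str, repo_mapping: RepoMapping, source_repo: str = "") -> str:
    """Replaces a leading apparent repo name with its canonical name.

    Returns the path unchanged if its first component has no mapping entry,
    assuming it already starts with a canonical name (as rlocationpath
    values do).
    """
    apparent, sep, rest = path.partition("/")
    canonical = repo_mapping.get((source_repo, apparent))
    if canonical is None or not sep:
        return path
    return f"{canonical}/{rest}"


if __name__ == "__main__":
    mapping = {("", "engflow"): "_main", ("", "frobozz"): "frobozz~"}
    print(to_rlocationpath("engflow/data/0-foo.txt", mapping))  # _main/data/0-foo.txt
```

Because values that already start with a canonical name pass through unchanged, the same lookup works for both hardcoded constants and rlocationpath values.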
Watch out for Windows!
If a runfile is an executable, you may need extra logic to add the .exe extension on Windows. This is unnecessary when using rlocationpath, since it will always generate the correct executable path.
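Here's a minimal Python sketch of that extra logic. It assumes that the executable's Windows output simply appends .exe to its name. Some rules may produce a different launcher extension, so check the rule you depend on. The function name and the example path are hypothetical.

```python
import sys


def executable_runfile_path(path: str, platform: str = sys.platform) -> str:
    """Appends .exe to an executable's runfile path on Windows.

    Assumes the executable's Windows output is named <path>.exe; this is
    not guaranteed for every rule.
    """
    if platform == "win32" and not path.endswith(".exe"):
        return path + ".exe"
    return path


if __name__ == "__main__":
    # Hypothetical executable in the frobozz repository.
    print(executable_runfile_path("frobozz/bin/tool"))
```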
Predefined path constants repo mapping example
To see the repo mapping mechanism in action, we'll use the example program to simulate passing a path constant to rlocation(). This time, we'll run the runfiles_demo binary directly from bazel-bin to avoid applying its args and env attributes.
Notice:
- The "runfiles link:" constructed manually for each path does not exist. This is because the first path component of each path on the command line is an apparent repository name. The actual runfiles links contain a path component for the corresponding canonical repository name.
- rlocationpath values already contain translated repository names, which do produce valid runfiles links, as we saw in the output from bazel run //:runfiles_demo.
- The runfiles library translated these runfile paths to the same actual locations as the bazel run //:runfiles_demo example.
- Even though runfiles_demo isn't running in a sandbox, and symlinks exist under bazel-bin/runfiles_demo.runfiles, the runfiles library still returns the actual absolute path.
We can inspect the runfiles links by interpolating the correct path segments for each repository, and see that they point to the same files:
Constructing runfile paths without a runfiles library
If there isn't a runfiles library available for your language, or it isn't yet Bzlmod compatible, you can still construct runfiles paths manually.
However, this requires using rlocationpath to define runfile paths, then passing them to the program as command line arguments or environment variables. With Bzlmod enabled, that's the only reliable way to get the correct path beginning with the canonical repo name or _main without a runfiles library.
Well, that's not the only way to pass canonical repo names...
Technically, you could write a genrule to emit rlocationpath output into a text file that a program could read at runtime. Or you could use a genrule, or write your own custom rule, to emit a source file defining rlocationpath constants to compile into a target. Or you could write a macro to invoke Label.repo_name on a target label and inject that. (I actually tried all of these before realizing I only needed to use the @rules_python//python/runfiles library instead of @bazel_tools//tools/python/runfiles.) You could do these things, but it's likely more work than passing an argument or an environment variable. Or you could be super cool and update the runfiles library for your language, or contribute the first implementation if one doesn't already exist.
Finding the runfiles directory
If RUNFILES_DIR is defined, that will be the location of your runfiles directory. If it isn't, and --enable_runfiles is set to true on your platform, stripping _manifest from the end of RUNFILES_MANIFEST_FILE will produce the runfiles directory path. This is how the runfiles_demo.py script determines the runfiles directory when RUNFILES_DIR is undefined.
Alternatively, assuming sys.argv[0] is the full path to your program, f'{sys.argv[0]}.runfiles' will be your runfiles directory if it exists. (After translating this Pythonic syntax to the language of your choice, of course. For example, in Bash, it would be $0.runfiles.)
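Here's a minimal Python sketch that combines these strategies. It doesn't reproduce how any particular runfiles library is implemented. The function name find_runfiles_dir is hypothetical, and the sketch only handles a manifest file whose name ends in _manifest, as described above.

```python
import os
import sys
from typing import Mapping, Optional


def find_runfiles_dir(environ: Mapping[str, str], argv0: str) -> Optional[str]:
    """Locates the runfiles directory, or returns None if none is found.

    Tries, in order: RUNFILES_DIR; RUNFILES_MANIFEST_FILE with its
    _manifest suffix stripped; and argv0 with .runfiles appended.
    """
    runfiles_dir = environ.get("RUNFILES_DIR")
    if runfiles_dir:
        return runfiles_dir

    suffix = "_manifest"
    manifest = environ.get("RUNFILES_MANIFEST_FILE", "")
    if manifest.endswith(suffix):
        candidate = manifest[: -len(suffix)]
        # The directory only exists if runfiles directories are enabled.
        if os.path.isdir(candidate):
            return candidate

    candidate = f"{argv0}.runfiles"
    if os.path.isdir(candidate):
        return candidate
    return None


if __name__ == "__main__":
    print(find_runfiles_dir(os.environ, sys.argv[0]))
```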
The Bash runfiles library bootstraps itself.
Fun fact: The runfiles.bash init code implements a minimal rlocation() lookup for the runfiles.bash file itself that demos all of these cases in five lines of Bash. (Hat tip to László for pointing this out.)
Enabling runfiles directories on Windows
Runfiles directories are disabled by default on Windows. This is because Bazel creates symlinks to actual files within the runfiles directory. Before Windows 10 Insiders build 14972, creating symlinks required using a console elevated to administrator mode. As mentioned above, making runfiles compatible with Windows in light of this restriction is what motivated the initial development of runfiles libraries.
However, on later versions of Windows 10 and on Windows 11, you can enable Developer Mode, which allows creating symlinks without admin privileges. Then explicitly set --enable_runfiles so that Bazel creates runfiles directories.
Building the runfile path
Once you have the runfiles directory, join the result of rlocationpath to the end of it to locate a specific runfile. As with using a runfiles library, it's still incumbent upon your code to check the resulting runfile path for existence before using it.
See the runfiles_demo.py program above, which manually constructs runfiles links alongside using a runfiles library, for an example of how to do this.
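For a standalone illustration, here's a minimal Python sketch of that join-and-check step. It assumes runfiles_dir came from one of the strategies in the previous section and that rlocationpath arrived as a command line argument or environment variable. The function name is hypothetical.

```python
import os
from typing import Optional


def runfile_path(runfiles_dir: str, rlocationpath: str) -> Optional[str]:
    """Joins an rlocationpath value onto the runfiles directory.

    Returns None if the resulting path doesn't exist, so callers must
    handle a missing runfile explicitly.
    """
    # rlocationpath values always use "/"; split them so os.path.join
    # uses the platform's separator.
    path = os.path.join(runfiles_dir, *rlocationpath.split("/"))
    return path if os.path.exists(path) else None
```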
Starting child processes that need runfiles
Check the documentation for your runfiles library for advice on starting child processes that also use runfiles.
Pass a manually located runfiles directory as RUNFILES_DIR
If you aren't using a runfiles library, and located the runfiles directory manually, then add its path to the child process's environment as RUNFILES_DIR.
In most cases, the library will provide a function to access runfiles-related environment variables (e.g., EnvVars(), getEnvVars(), or Env()). Add these variables to the child process's environment, as they may not have been defined in the parent process's environment.
For an example of launching a subprocess this way in Python, see rules_python/python/runfiles/README.md.
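If you located the runfiles directory manually rather than through a library, you could set up the child process as in this minimal Python sketch. The function name run_with_runfiles and the directory path are hypothetical. The sketch only sets RUNFILES_DIR. With a runfiles library, you'd merge in the variables returned by its environment variable function (such as EnvVars() in rules_python) instead.

```python
import os
import subprocess
import sys
from typing import Sequence


def run_with_runfiles(
    args: Sequence[str], runfiles_dir: str
) -> subprocess.CompletedProcess:
    """Runs a child process with RUNFILES_DIR set to runfiles_dir.

    Copies the parent's environment first, so the child still inherits
    every other variable it needs.
    """
    env = dict(os.environ)
    env["RUNFILES_DIR"] = runfiles_dir
    return subprocess.run(
        list(args), env=env, check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    result = run_with_runfiles(
        [sys.executable, "-c", "import os; print(os.environ['RUNFILES_DIR'])"],
        "/tmp/example.runfiles",  # hypothetical runfiles directory
    )
    print(result.stdout.strip())  # prints /tmp/example.runfiles
```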
Conclusion
At this point, all of your targets depending on runfiles should build and run successfully under Bzlmod. Future changes to the canonical repo name format shouldn't break your targets. They should remain portable if built as an external dependency of another repo.
In the next post, we'll learn how to use rules_pkg properly to avoid the need for file path macros when building archives under Bzlmod. Following that, we'll learn how to inject the path to an external repo into our BUILD rules when we really have to.
Postscript
If you want to learn how to write rules that make use of runfiles, see Jay Conrod's post Writing Bazel rules: data and runfiles. (You can also see that I ripped off his style of listing the links to every post in this series.)