Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and rules_pkg¶
The previous post in our Bzlmod migration series demonstrated how to make runfiles paths portable to a Bzlmod world. Another common source of Bzlmod file path breakages is misconfigured rules from rules_pkg, which contains rules for building archives from build outputs and/or external repositories. This post will explain key details of some of these rules, so you can stop "holding it wrong" and easily migrate archive targets to Bzlmod.
All posts in the "Migrating to Bazel Modules" series
- Migrating to Bazel Modules (a.k.a. Bzlmod) - The Easy Parts
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and Runfiles
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names and rules_pkg
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names, Macros, and Variables
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Module Extensions
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Fixing and Patching Breakages
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Repo Names, Again…
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Toolchainization
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Maintaining Compatibility, Part 1
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Maintaining Compatibility, Part 2
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Maintaining Compatibility, Part 3
 - Migrating to Bazel Modules (a.k.a. Bzlmod) - Maintaining Compatibility, Part 4
 
Prerequisites¶
The following advice assumes you've already imported rules_pkg into your
project and are comfortable with common archive formats and terminology. You
should also be aware of the following Bazel concepts in particular:
To review many key differences between Bzlmod and the legacy WORKSPACE model,
see the comparison table from the "Module Extensions"
post.
Finally, note that all of the links to rules_pkg files in this post refer to
rules_pkg version 1.0.1. Please take care to refer to the version currently
used in your project.
Replacing pkg_tar attributes with other rules¶
At EngFlow, we use the pkg_tar rule to build archives for deployment,
either from our own code or from external repositories. When we first enabled
Bzlmod, all of our pkg_tar archives that incorporate external repositories
broke silently.
We had defined pkg_tar attributes that relied on paths to files in external
repositories, which naturally contained the repository name. Enabling Bzlmod
rendered those paths invalid, as external repository paths now contain canonical
repository names—but the pkg_tar targets still built successfully. It
was only after a failed deployment that we began to detect the issues with our
targets.
We replaced these paths by updating our http_archive usage and applying the
pkg_files and pkg_filegroup rules from rules_pkg. This approach provides
stronger guarantees against silent breakages and other benefits, as explained
below.
Old and busted pkg_tar attributes¶
In the days before Bzlmod, we used the following pkg_tar attributes:
- strip_prefix: used to strip the top level directory from an external repository archive before repackaging, or to strip a package directory from internal files
- remap_paths: used to add a new prefix to external repository files, or to move internal files to a new location
- mode: used to set the default file permissions for all files
- modes: used to set specific permissions for specific files
We repackage some external dependencies for our EngFlow system deployments using
pkg_tar. We define these dependencies as external repositories, and inject our
own BUILD file into them via the build_file attribute of the http_archive
directive. This BUILD file originally defined a filegroup called
:files, which would bundle all the repo's files into one target.
[Listing omitted: Original filegroup in the external repo's BUILD file]
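As a minimal sketch, a filegroup matching the description in this section might look like the following. The target name and glob pattern come from the text; the visibility setting is an assumption, added so the main repo can depend on the target.

```starlark
# BUILD file injected into the external repo via http_archive's build_file.
filegroup(
    name = "files",
    # Matches the versioned or architecture-specific top level directory.
    srcs = glob(["frobozz*/**"]),
    # Assumption: public visibility so the main repo can depend on it.
    visibility = ["//visibility:public"],
)
```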
The glob(["frobozz*/**"]) expression reached into the top level directory of
the unpacked archive to include all of its files. The frobozz* pattern in
particular matched any architecture or version number encoded in the top level
directory name, e.g., frobozz_linux_x64. We download an archive for each
supported operating system and CPU architecture, so each of these repositories
receives its own copy of this filegroup target.
In our main repo, we then generated pkg_tar archives for different operating
systems and architectures, each depending upon "@frobozz_%s_%s//:files" % (os,
arch). We used the strip_prefix attribute to remove the path prefix of every
file from the archive. strip_prefix doesn't accept glob patterns, so we had to
build the exact path prefix that we needed to strip.
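To make the failure mode concrete, here is a rough sketch of what such a pkg_tar target could have looked like. The label and the strip_prefix pattern follow the text; `PLATFORMS`, `FROBOZZ_VERSION`, and the target names are hypothetical stand-ins, not our actual definitions.

```starlark
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

# Hypothetical values for illustration only.
PLATFORMS = [
    ("linux", "x64"),
    ("macos", "arm64"),
]
FROBOZZ_VERSION = "1.2.3"

[
    pkg_tar(
        name = "frobozz_%s_%s_tar" % (os, arch),
        srcs = ["@frobozz_%s_%s//:files" % (os, arch)],
        # Hard-codes the apparent repo name in a file path. Under Bzlmod
        # the real path contains the canonical repo name, so this prefix
        # no longer matches anything.
        strip_prefix = "external/frobozz_%s_%s/frobozz-%s" % (
            os,
            arch,
            FROBOZZ_VERSION,
        ),
    )
    for os, arch in PLATFORMS
]
```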
In case you haven't already spotted the problem: The strip_prefix path
external/frobozz_%s_%s/frobozz-%s broke because frobozz_%s_%s specifies
an apparent repo name. The actual location of the files now contains the
canonical repo name in its path.
strip_prefix from pkg_tar will never fail.
The archive still built, since "@frobozz_%s_%s//:files" % (os, arch) is a
perfectly valid target. pkg_tar accepts any arbitrary list of file paths,
and strip_prefix will attempt to strip a path prefix, but will never
return an error.
In our case, none of the files had the intended strip_prefix removed,
rendering the resulting archive completely broken for our purposes.
Stripping external archive prefixes with http_archive¶
Replacing frobozz_%s_%s in the strip_prefix attribute above with a pattern
fitting the canonical repo name is not a solution. Recall this advice from
Bazel modules: Repository names and strict deps (emphasis theirs):
Note that the canonical name format is not an API you should depend on and is subject to change at any time. Instead of hard-coding the canonical name, use a supported way to get it directly from Bazel...
Fortunately, we did find a proper solution. We replaced strip_prefix from
pkg_tar with a combination of strip_prefix from http_archive and
strip_prefix from pkg_files.
http_archive contains a strip_prefix attribute specifically for removing
version-specific directory prefixes from external archives (emphasis and
paragraph break mine):
A directory prefix to strip from the extracted files. Many archives contain a top-level directory that contains all of the useful files in [the] archive. Instead of needing to specify this prefix over and over in the build_file, this field can be used to strip it from all of the extracted files.

For example, suppose you are using foo-lib-latest.zip, which contains the directory foo-lib-1.2.3/ under which there is a WORKSPACE file and are src/, lib/, and test/ directories that contain the actual code you wish to build. Specify strip_prefix = "foo-lib-1.2.3" to use the foo-lib-1.2.3 directory as your top-level directory.
Perhaps most importantly (emphasis mine):
If the specified prefix does not match a directory in the archive, Bazel will return an error.
strip_prefix from http_archive does The Right Thing.
This is exactly what we might've expected to have happened earlier, when
pkg_tar silently failed instead. We've since updated our http_archive
rules to make use of strip_prefix, eliminating this complexity from our
pkg_tar targets while gaining protection from failed prefix matches.
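For illustration, an http_archive declaration using strip_prefix might look roughly like this in a MODULE.bazel file. The repo name, URL, directory name, and build_file label are all hypothetical, and the same attributes apply if you declare the repository from a module extension instead.

```starlark
http_archive = use_repo_rule(
    "@bazel_tools//tools/build_defs/repo:http.bzl",
    "http_archive",
)

http_archive(
    name = "frobozz_linux_x64",
    # Hypothetical label of the BUILD file injected into the repo.
    build_file = "//third_party/frobozz:frobozz.BUILD",
    # Bazel fails the fetch if this directory isn't in the archive.
    strip_prefix = "frobozz-1.2.3-linux-x64",
    # Hypothetical URL. Pin an integrity or sha256 value in real use.
    urls = ["https://example.com/frobozz/frobozz-1.2.3-linux-x64.tar.gz"],
)
```

With the top level directory removed at fetch time, the injected BUILD file no longer needs a frobozz* glob, and the pkg_tar target no longer needs to know the versioned directory name.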
Setting file attributes with pkg_files¶
It's possible we could've left the :files target as a builtin Bazel
filegroup, as pkg_tar will parse files from the DefaultInfo provider
of srcs targets. However, we decided to go all in on rules_pkg targets,
and replaced it with a combination of pkg_files and pkg_filegroup.
These rules, and their accompanying attribute configuration structs, are located
within @rules_pkg//:mappings.bzl.
[Listing omitted: Importing pkg_files, pkg_filegroup, etc. into a BUILD file]
This is what the BUILD file injected into the external repository archive
looks like now:
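The sketch below shows the general shape of such a file rather than our exact one. The `bin` and `lib` directory names, the target names other than `:files`, and the modes are assumptions. The load path is the one shown in the rules_pkg reference documentation.

```starlark
load(
    "@rules_pkg//pkg:mappings.bzl",
    "pkg_attributes",
    "pkg_filegroup",
    "pkg_files",
    "strip_prefix",
)

# Executables (hypothetical directory layout).
pkg_files(
    name = "bin_files",
    srcs = glob(["bin/**"]),
    attributes = pkg_attributes(mode = "0755"),
    # Keep paths relative to this package, i.e. the repo root.
    strip_prefix = strip_prefix.from_pkg(),
)

# Everything else we want to ship (hypothetical directory layout).
pkg_files(
    name = "lib_files",
    srcs = glob(["lib/**"]),
    attributes = pkg_attributes(mode = "0644"),
    strip_prefix = strip_prefix.from_pkg(),
)

# Single target for the main repo to depend on, replacing the filegroup.
pkg_filegroup(
    name = "files",
    srcs = [
        ":bin_files",
        ":lib_files",
    ],
    visibility = ["//visibility:public"],
)
```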
This is clearly more verbose than the original filegroup implementation of the
:files target, but there are important benefits:
- pkg_files provides a finer grained ability to set different permissions and other attributes on specific files via the pkg_attributes macro. This eliminates the need for the mode and modes attributes of pkg_tar.
- pkg_filegroup allows us to compose several separately configured pkg_files rules into a single target. It also accepts pkg_mkdirs, pkg_mklink, and other pkg_filegroup rules.
Stripping canonical repository names with pkg_files¶
Perhaps the most important detail to notice, however: We must specify
strip_prefix = strip_prefix.from_pkg() to preserve the underlying directory
structure of the file tree within the originating Bazel package.
From the documentation for the strip_prefix attribute of pkg_files
(emphasis mine):
Use the strip_prefix struct to define this attribute. If this attribute is not specified, all directories will be stripped from all files prior to being included in packages (strip_prefix.files_only()).
Surprising default behavior—at first...
I found this files_only default a bit surprising at first, but it
eventually made sense. It forces you to use the strip_prefix struct
to explicitly specify how much of the directory prefix to strip. If you
don't specify strip_prefix, then stripping the entire prefix is at least a
reasonable default.
Now that we're using the strip_prefix attribute of http_archive, we no
longer need to strip the version-specific archive prefix from these files.
However, we still need to strip the external repository directory prefix,
containing the canonical repository name, upon which we must not depend. We
otherwise want to preserve the same directory structure as the original archive.
In our case, strip_prefix.from_pkg() without an additional path argument was
what we needed. From the strip_prefix struct docstring:
from_pkg(path): strip all directory components up to the current package, plus what's in path, if provided.
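To compare these behaviors side by side, consider a hypothetical package containing the file docs/guide/intro.md. The target names and file layout below are invented for illustration; the comments describe the destination path each variant should produce, based on the documentation quoted above.

```starlark
load("@rules_pkg//pkg:mappings.bzl", "pkg_files", "strip_prefix")

pkg_files(
    name = "flattened",
    srcs = glob(["docs/**"]),
    # No strip_prefix, so strip_prefix.files_only() applies:
    #   docs/guide/intro.md -> intro.md
)

pkg_files(
    name = "package_relative",
    srcs = glob(["docs/**"]),
    strip_prefix = strip_prefix.from_pkg(),
    #   docs/guide/intro.md -> docs/guide/intro.md
)

pkg_files(
    name = "below_docs",
    srcs = glob(["docs/**"]),
    strip_prefix = strip_prefix.from_pkg("docs"),
    #   docs/guide/intro.md -> guide/intro.md
)
```

In all three cases the external/ directory and the repository name never appear in the destination path, which is what matters for Bzlmod.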
Bzlmod compatibility for repackaged external archives
Between the strip_prefix behaviors of both http_archive and pkg_files,
we've avoided any need to depend on the canonical repository name of
external archives. Plus, both will break the build when their guarantees are
violated.
Adding the installation prefix using pkg_filegroup¶
During the course of fixing the broken archives, I came across this interesting
detail regarding remap_paths from rules_pkg issue #85 (emphasis mine):
One thing to consider is that remap_paths from pkg_tar is going away. It is a confusing piece of technical debt in the code so [our] goal is to eliminate it.
With the new pkg_filegroup target in place for the external repository, we
defined another pkg_filegroup target in our repository to apply the prefix
we wanted. This eliminated the need for the previous remap_paths attribute
that applied the prefix by matching the empty string.
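A sketch of that arrangement, assuming a hypothetical installation prefix of opt/frobozz and hypothetical target names, could look like this:

```starlark
load("@rules_pkg//pkg:mappings.bzl", "pkg_filegroup")
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

# Re-roots everything from the external repo's pkg_filegroup under
# the installation prefix.
pkg_filegroup(
    name = "frobozz_installed",
    srcs = ["@frobozz_linux_x64//:files"],
    prefix = "opt/frobozz",  # hypothetical prefix
)

pkg_tar(
    name = "frobozz_linux_x64_tar",
    srcs = [":frobozz_installed"],
    # No strip_prefix or remap_paths needed here anymore.
)
```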
Moving specific files with pkg_files¶
We're also using pkg_files and pkg_filegroup to update paths and file
permissions while packaging individual files, not just entire external archives.
This replaces our usage of the same pkg_tar attributes mentioned above, as
well as a macro I'd written to fix Bzlmod related path breakages.
The following examples are based on other pkg_tar targets from our repo.
A straightforward example¶
The first example packages one artifact we build, and one that ships with Bazel,
placing them both in the engflow/bin directory.
It's important to know that remap_paths is an attr.string_dict, whose
keys are plain strings. pkg_tar replaces the beginning of any path matching
the key with the corresponding value.
These path keys aren't subject to make variable substitution, and neither
are they target labels. So under Bzlmod, we needed the exact paths for files
packaged from external repos to produce remap_paths keys. I originally wrote
the following macro to provide the necessary path prefixes for PROCESS_WRAPPER
as seen in the example above:
[Listing omitted: get_repo_path() implementation]
However, the aforementioned comment about remap_paths from rules_pkg
issue #85 also mentioned (emphasis mine):
...pkg_files can strip out existing paths and rebase them with new ones. I am presuming from your examples that you need to remap paths, the eventual way to write that is with an intermediate pkg_file[s] target to do that remapping.
pkg_files does this by using its strip_prefix, prefix, and renames
attributes.
Be careful with prefix—but not too careful.
Technically, the prefix attribute doesn't work exactly the same as the
corresponding pkg_filegroup attribute. In practice, it works pretty much
as you would expect. Check the pkg_files and pkg_filegroup
documentation for details.
Unlike the remap_paths attribute from pkg_tar, the renames attribute of
pkg_files is an attr.label_keyed_string_dict. This means that the keys
are interpreted as target labels. They aren't subject to strip_prefix
substitution, and the target will break if any of them expand to multiple files
or don't match a declared dependency.
From the renames attribute documentation:
This attribute allows the user to override destinations of files in pkg_files relative to the prefix attribute. Keys to the dict are source files/labels, values are destinations relative to the prefix, ignoring whatever value was provided for strip_prefix.

[ ...snip... ]

The following keys are rejected:

- Any label that expands to more than one file (mappings must be one-to-one).
- Any label or file that was either not provided or explicitly excluded.
In this case, we added a single pkg_files target, which became the sole srcs
dependency for our pkg_tar target:
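The following is a sketch reconstructed from the details discussed in this section, not a copy of our actual BUILD file. The `PROCESS_WRAPPER` and `LIBFOO` label values and the `pkg_macos_files` name are assumptions.

```starlark
load("@rules_pkg//pkg:mappings.bzl", "pkg_attributes", "pkg_files")
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

# Hypothetical labels standing in for the real ones.
PROCESS_WRAPPER = "@bazel//src/main/tools:process-wrapper"
LIBFOO = "//engflow/foo:libfoo.dll"

pkg_files(
    name = "pkg_macos_files",
    srcs = [
        LIBFOO,
        PROCESS_WRAPPER,
    ],
    attributes = pkg_attributes(mode = "0755"),
    # Both files land in engflow/bin.
    prefix = "engflow/bin",
    # Keys are labels, so a wrong label breaks the build.
    renames = {
        LIBFOO: "libfoo.dylib",
    },
    # strip_prefix is left at its default, which removes every
    # directory component, including the external repo path.
)

pkg_tar(
    name = "pkg_macos",
    srcs = [":pkg_macos_files"],
)
```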
Important things to notice:
- We're using prefix to place both binaries in engflow/bin instead of mapping them individually.
- We're using the default behavior of strip_prefix to remove the entire directory path from every file.
- Since libfoo.dll is in renames, it's unaffected by strip_prefix, and the renamed path is relative to prefix.
- renames guarantees that the rule will break if any labels are wrong.
- We explicitly set the file permissions for both files to 0755. This has ramifications discussed in the Discovering more surprising pkg_tar behavior section below.
- Since we're not composing multiple pkg_files targets, pkg_tar can depend on it directly, without an intermediate pkg_filegroup.
Bzlmod compatibility for individually repackaged external files
The most notable detail for Bzlmod purposes: The process-wrapper binary
has its entire directory path stripped by the default strip_prefix
behavior for pkg_files. We don't have to manipulate the canonical
repository path of the target at all, and can get rid of the
get_repo_path() macro.
A more complex example¶
This example is built on the same basic principles and mechanisms as the previous one, but as you can see, contains a little more complexity.
Here's what's happening in this example:
- It packages internal and external files, just like the previous example.
- However, it contains glob(["engflow-services/**"]) in its srcs, and strip_prefix = "engflow-services", to include a file and directory structure from this package's engflow-services subdirectory.
- It also sets both a default mode = "0444" and specific modes for three specific files:
  - UIMain.jar from remap_paths: 0644
  - the external LINUX_SANDBOX: 0555
  - engflow_service from the glob(), after applying strip_prefix: 0555
 
We'll take a closer look at these mode and modes values in the Discovering
more surprising pkg_tar behavior section below.
We know the drill by this point:
- Write pkg_files rules to replace the strip_prefix, remap_paths, mode, and modes attributes of pkg_tar.
- Compose one or more pkg_filegroup targets from the new pkg_files targets.
- Have pkg_tar depend on these pkg_filegroup targets in its srcs.
The trick here is to tease out targets that will preserve the existing directory structure and file attributes. (The intended file attributes, anyway...) This is what we ended up with:
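The sketch below approximates that structure using the target names discussed in this section. The label constants, the `bin` prefix on `binfiles`, the decision to place the jar and library under it, and the `libs` mode are assumptions made for illustration.

```starlark
load(
    "@rules_pkg//pkg:mappings.bzl",
    "pkg_attributes",
    "pkg_filegroup",
    "pkg_files",
    "strip_prefix",
)
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

# Hypothetical labels standing in for the real ones.
LINUX_SANDBOX = "@bazel//src/main/tools:linux-sandbox"
UI_MAIN_JAR = "//engflow/ui:UIMain_deploy.jar"
LIBFOO = "//engflow/foo:libfoo.dll"

pkg_files(
    name = "binaries",
    srcs = glob(["engflow-services/bin/**"]) + [LINUX_SANDBOX],
    attributes = pkg_attributes(mode = "0555"),
    # Default strip_prefix flattens every path, including the
    # external repo path of linux-sandbox.
)

pkg_files(
    name = "jars",
    srcs = [UI_MAIN_JAR],
    attributes = pkg_attributes(mode = "0644"),
    renames = {UI_MAIN_JAR: "UIMain.jar"},
)

pkg_files(
    name = "libs",
    srcs = [LIBFOO],
    attributes = pkg_attributes(mode = "0444"),  # assumed mode
    renames = {LIBFOO: "libfoo.so"},
)

pkg_filegroup(
    name = "binfiles",
    srcs = [
        ":binaries",
        ":jars",
        ":libs",
    ],
    prefix = "bin",  # assumed destination directory
)

pkg_files(
    name = "data",
    srcs = glob(
        ["engflow-services/**"],
        exclude = ["engflow-services/bin/**"],
    ),
    attributes = pkg_attributes(mode = "0444"),
    strip_prefix = strip_prefix.from_pkg("engflow-services"),
)

pkg_tar(
    name = "pkg_linux",
    srcs = [
        ":binfiles",
        ":data",
    ],
)
```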
This is quite a bit more code than the original, but here's what's happening:
- Because different files in engflow-services/bin have different file permissions, we need to have a different pkg_files target for each different set of permissions. This is way more verbose than using the modes attribute of pkg_tar, but we'll see why it's a good idea anyway.
- The binaries target grabs the engflow_service binary via the glob(["engflow-services/bin/**"]) expression in its srcs. This target is now what sets its file permissions to 0555.
- We use an intermediate pkg_filegroup called binfiles to relocate several of the pkg_files targets under engflow-services/bin.
- We use a different pkg_files target called data to collect all files and directories under engflow-services, except those under engflow-services/bin. This target strips the engflow-services directory prefix from these files, and sets all file permissions to 0444.
- The pkg_tar target now depends on both of the new targets: binfiles (a pkg_filegroup) and data (a pkg_files).
Stripping internal directory names with pkg_files¶
Let's pay special attention to what's happening with the strip_prefix
attribute of pkg_files:
- Like the previous example, the binaries target relies on the default strip_prefix behavior to strip the entire directory path from the linux-sandbox binary. We don't need the get_repo_path() macro to handle the canonical repository name anymore.
- The jars and libs targets define renames for their only files, so they are unaffected by strip_prefix.
- The data target then collects the rest of the engflow-services files, and calls strip_prefix.from_pkg("engflow-services"). This removes the engflow-services prefix while preserving the child directory structure.
strip_prefix.from_pkg("engflow-services") is also guaranteed to fail if any of
the srcs don't match. From the pkg_files documentation (emphasis
mine):
If prefix stripping fails on any file provided in
srcs, the build will fail.
pkg_files makes stripping internal directories safe, too.
Again, we have the comfort of knowing that stripping prefixes from
pkg_files inputs will break the build, as expected. No silent failures, as
was the case when stripping prefixes with pkg_tar.
Testing pkg_tar output¶
At this point, our pkg_tar targets now rely on http_archive, pkg_files,
and pkg_filegroup to strip directory prefixes, relocate files, and set file
permissions. We've replaced pkg_tar attributes with mechanisms providing
stronger correctness guarantees.
Even so, we still needed to validate that our updates actually preserved the original (or intended) output, or broke the build if not. Or, at the very least, we needed a process to ensure such violations are caught before or during code review.
Confirming the presence of specific files with verify_archive_test¶
It turns out that rules_pkg provides the verify_archive_test macro,
which instantiates a py_test rule to validate several archive properties.
It's imported via:
[Listing omitted: Importing verify_archive_test into a BUILD file]
Discovering this was quite the revelation; it doesn't (yet?) appear in the
rules_pkg reference documentation. Its docstring explains it pretty
well, though:
Now we've added verify_archive_test targets to validate most (still working on
all) of our pkg_tar targets. To make the structure clear by eliminating
unnecessary duplication, here's what our rules (sort of) look like, collapsed
into a macro.
Here's what the resulting BUILD file would look like.
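As a rough sketch of direct usage without the macro wrapper, a test for the pkg_macos archive from the earlier example might look like this. The test name is hypothetical, and the must_contain paths assume the engflow/bin layout described above; they need to match the member names exactly as they appear in the archive.

```starlark
load("@rules_pkg//pkg:verify_archive.bzl", "verify_archive_test")

verify_archive_test(
    name = "pkg_macos_test",
    target = ":pkg_macos",
    # A few key paths, not the entire archive contents.
    must_contain = [
        "engflow/bin/libfoo.dylib",
        "engflow/bin/process-wrapper",
    ],
)
```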
Automatically verifying the structure of the resulting archive
We haven't overspecified the entire contents of the archive, but have just enough to give us confidence that the packaging retained the desired directory structure. If any of these key elements aren't present, we'll know right away, instead of after deployment.
Validating file attributes manually (for now)¶
One archive property that verify_archive_test does not verify (yet?) is file
attributes. However, we've developed a somewhat low overhead way of validating
them manually (for now).
Manual testing as a process for developing automated tests
Though it would be ideal if verify_archive_test already had this
capability, working through the problem manually can help define what the automation should look like. Perhaps someone (maybe even me!) can then add this feature to verify_archive_test.
The pkg_tar manifest file¶
In addition to generating a .tar file, pkg_tar also generates a .manifest
file containing metadata on every file in the archive in JSON
format.
For example, assuming the full target label of the pkg_tar target from the
earlier example is //engflow/macos:pkg_macos, that rule will produce:
- bazel-bin/engflow/macos/pkg_macos.manifest
- bazel-bin/engflow/macos/pkg_macos.tar
The contents of pkg_macos.manifest look like the following, after
piping it through jq via
jq . <bazel-bin/engflow/macos/pkg_macos.manifest.
(In the src attributes, ARCH represents the architecture-specific component
of my local build directory.)
You can see how the pkg_files attributes map directly to the fields in this
manifest:
- The origin is the full target label of the pkg_files target. @@ represents the main repo for the build.
- The mode was set by attributes = pkg_attributes(mode = "0755").
- Each dest resides in engflow/bin as set by prefix = "engflow/bin".
- libfoo.dll was renamed libfoo.dylib per the renames attribute.
- The src path of process-wrapper contains a segment corresponding to the canonical name of @bazel, which is _main~_repo_rules~bazel (for now).
Though we're not using them in our example, you can see that file ownership attributes are represented in the manifest as well.
Manifest entries take precedence over pkg_tar default attributes.
Since all of the manifest information came from the pkg_files target, it
will override any attribute values from pkg_tar.
Inspecting actual .tar file attributes¶
Checking out the .manifest file can serve as the first manual line of defense
against potential build problems. However, the final source of truth comes from
the archive listing itself, via tar tvf.
[Listing omitted: Inspecting the contents of pkg_macos.tar]
We can see that the files did indeed land at the locations and with the
attributes specified in the .manifest file. We didn't specify user or group
ownership, so those attributes default to uid 0 and gid 0.
Use the manifest file to begin investigating unexpected outcomes.
If anything here surprises us, we can begin tracing backwards by looking for
entries in the .manifest file that match specific dest values.
Comparing manifests and actual archives before and after updating pkg_tar¶
Now we have the two pieces we need to verify that updates to a pkg_tar rule
have the expected effects. Again, verify_archive_test should be the first
choice for validating the destination paths of expected contents. What we want
to do here is validate whether file attributes are as we expect.
That said, this will alert us to changes in contents that aren't explicitly
specified using verify_archive_test. Such information can be fed back into verify_archive_test as well.
The first step is to capture the manifest and archive contents before updating the pkg_tar target:
[Listing omitted: Run this before the change]
The next step is to make the actual change. After that, capture the new manifest and archive contents:
[Listing omitted: Run this after the change]
Now that we have the manifests and archive contents from before and after the
change, we can use a diff program to inspect them. Being a command line guy, I
tend to prefer vimdiff, part of the Vim text editor suite.
[Listing omitted: Diffing the before and after manifests and archive contents]
Discovering more surprising pkg_tar behavior¶
Following the manual verification process described just above actually revealed
something surprising about our pkg_linux target, as well as the pkg_tar rule
itself.
This is a rabbit hole leading to a maze of twisty passages, all alike.
This section explains in detail why one should avoid the mode and modes
attributes of pkg_tar. The details aren't essential, but are
provided for the entertainment of those who like a good detective story.
Feel free to skip it, if you're willing to trust me on this.
Remember that before our change, the pkg_tar target specified these file
permissions:
[Listing omitted: The original pkg_tar file permissions attributes]
However, diffing the archive contents from before the change and after revealed
that pkg_tar never set these permissions correctly to begin with.
[Listing omitted: Diffing the before and after pkg_linux.tar contents]
Specifically, according to the previous pkg_tar attributes:
- libfoo.so seemingly should've defaulted to mode = "0444"
- linux-sandbox should've been set to 0555 per the modes attribute
Instead, both were set to 0755 before the update to use pkg_files.
Neither of these differences seems to have affected our deployments, but it is worth understanding why they exist. To begin, let's see a sample of the manifest:
Two things to notice:
- libfoo.dll and linux-sandbox are both set to mode 0755, which doesn't appear anywhere in the original pkg_tar target.
- The mode of engflow_service is empty; the default value of 0444 isn't recorded in the manifest file.
pkg_tar passes mode and modes to the build_tar.py helper script as
command line flags instead of recording them in the manifest file. This
explains why most of the files don't have a mode set in the manifest, but
still receive the correct file permissions.
In addition to that:
- _pkg_tar_impl doesn't pass a default_mode when calling create_mapping_context_from_ctx, and
- create_mapping_context_from_ctx checks for ctx.attr.default_mode instead of ctx.attr.mode.
As for why libfoo.dll and linux-sandbox were set to 0755, instead of being
empty in the manifest and being set by mode or modes:
- libfoo.dll and linux-sandbox are both defined as cc_binary targets.
- _pkg_tar_impl calls add_label_list with ctx.attr.srcs, which is a list of Target values. In our case, two cc_binary targets.
- add_label_list calls process_src on each element of srcs. Since these cc_binary values contain a DefaultInfo provider, but not PackageFilesInfo, etc. (like pkg_files and other targets from rules_pkg), process_src returns False.
- As a result, add_label_list then calls add_from_default_info.
- add_from_default_info checks whether its src argument is an executable. If it is, it then sets the mode to 0755.
Elementary, my dear Watson...
So pkg_tar forces the modes of these cc_binary targets to become 0755.
This goes into the manifest, taking precedence over pkg_tar's own mode
and modes attributes. Mystery solved.
Conclusion¶
The moral of the story is: Do not use the strip_prefix, remap_paths,
mode, and modes attributes of pkg_tar. remap_paths is deprecated, and
the others are unreliable. Use pkg_files, et al., to explicitly specify the
locations and attributes of your pkg_tar input artifacts.
Once someone noticed the problem, we set about on a journey of source studying,
trial, and error. Now our pkg_tar targets are Bzlmod compatible, reasonably
future proof, and protected by guarantees that will break the build when
violated. They're even tested now, too.
In retrospect, some of the above seems pretty obvious, but we collectively were unaware of the options at the time. Perhaps you're in the same boat we were; I hope this advice spares you the same level of effort to achieve similar results.
In this post and the previous, we've learned how "holding runfiles libraries and
rules_pkg right" helps us achieve Bzlmod compatibility and avoid canonical
repository names altogether. In the next post, we'll learn how to inject
canonical repository names into our BUILD rules when we really have no other
choice.
Credit where it's due¶
It was Corbin McNeely-Smith from whom I'd first heard the term "holding it
wrong," and who first noticed the pkg_tar breakage. He was also the first to
experiment with pkg_files and pkg_filegroup, which I then built upon to
update our other pkg_tar targets eventually.
Patrick Ziegler was the first to introduce both Corbin and me to
verify_archive_test. This rule proved invaluable when fixing and updating our
pkg_tar targets.
Isaac Truett noticed that my initial replacement of filegroup targets with
pkg_files turned off executable bits on files within pkg_tar archives. I
investigated, and learned that the pkg_files default mode of 0644 overrode
the mode and modes values from pkg_tar. This wasn't caught by
verify_archive_test, leading me to develop the manual verification process
described above.
Updates¶
2025-10-09¶
- Put the list of all posts in the series into the collapsible All posts in the "Migrating to Bazel modules" series info block.
- Added a suggestion to review the Module Extensions comparison table to the Prerequisites section.