The following explains how inode
truncation in classic cpio formats (odc and newc) interacts with
hardlinked files, and under what conditions you may encounter problems.
Unless you are still using cpio for full system backups, this is probably a
non‑issue. Nonetheless, to ensure correct hardlink preservation with GNU cpio:
- Use --renumber-inodes (or --reproducible, which also enables renumbering)
Classic cpio formats store inode numbers in fixed‑width fields:
- odc uses 18‑bit inode numbers (six octal digits)
- newc uses 32‑bit inode numbers (eight hexadecimal digits)
Modern filesystems (ext4, XFS, btrfs, ZFS, etc.) commonly use 64‑bit inode
numbers. When a filesystem inode does not fit into the field size of the old
cpio formats, GNU cpio truncates it to the available width. For example:
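- real inode: 123456789
- truncated: 23456789

(The digits are dropped in decimal here purely for illustration; the value stored in the archive is the low bits of the real inode.)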
If two different real inode numbers share the same low bits, they collapse to the same stored inode value in the archive.
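As a rough illustration of the collapse described above, the following sketch masks an inode down to the width of each format's field. It assumes field widths of six octal digits (18 bits) for odc and eight hexadecimal digits (32 bits) for newc, and that truncation simply keeps the low bits. The helper name `stored_inode` is made up for this example and is not part of GNU cpio.

```python
# Assumed field widths in bits: odc = 6 octal digits, newc = 8 hex digits.
INODE_BITS = {"odc": 18, "newc": 32}


def stored_inode(real_inode: int, fmt: str) -> int:
    """Keep only the low bits that fit into the format's inode field."""
    mask = (1 << INODE_BITS[fmt]) - 1
    return real_inode & mask


# Two different 64-bit inode numbers that share the same low 32 bits.
a = 0x1_0000_002A
b = 0x2_0000_002A

print(stored_inode(a, "newc"), stored_inode(b, "newc"))  # prints "42 42"
```

Both inodes end up with the same stored value even though they are distinct on the filesystem.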
cpio uses inode numbers only when handling hardlinks. It checks each file’s
link count (nlink > 1) and uses the inode number to match entries that refer
to the same underlying file. Later entries in a hardlink group store no file
contents and rely on the first match.
- No hardlinks = no problem
  If none of the files in the archive are hardlinks (i.e. no nlink > 1 entries), inode truncation cannot cause corruption because the inode number is effectively ignored.
- One hardlink group = no problem
  Even if the stored inode is truncated, nothing else can collide with it.
- Multiple hardlink groups = corruption is possible (but unlikely)
  Extraction becomes unsafe only if:
  - multiple real inode numbers truncate to the same stored inode value, and
  - some of those truncated values correspond to hardlink groups (nlink > 1), and
  - the archive order interleaves the groups in a way that causes overlap.
If all three conditions are met, GNU cpio might incorrectly merge distinct
hardlink groups into a single group during extraction, causing data loss.
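The three conditions can be illustrated with a toy model of hardlink matching during extraction. This is a simplified sketch, not GNU cpio's actual implementation: it ignores device numbers, and it assumes the extractor keeps a group "open" until it has seen as many entries as the link count announces. `Entry` and `extract_links` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    stored_ino: int  # inode as stored in the archive (possibly truncated)
    nlink: int


def extract_links(entries):
    """Return {path: path it gets linked to, or None} under the toy rule."""
    open_groups = {}  # stored_ino -> [first path, links still expected]
    result = {}
    for e in entries:
        if e.nlink <= 1:
            result[e.path] = None  # inode is ignored for ordinary files
            continue
        group = open_groups.get(e.stored_ino)
        if group is None:
            # First member of a (supposed) hardlink group.
            open_groups[e.stored_ino] = [e.path, e.nlink - 1]
            result[e.path] = None
        else:
            # Same stored inode: treated as a link to the first member.
            result[e.path] = group[0]
            group[1] -= 1
            if group[1] == 0:
                del open_groups[e.stored_ino]
    return result


# Groups A and B are distinct files whose inodes both truncate to 42.
sequential = [Entry("a1", 42, 2), Entry("a2", 42, 2),
              Entry("b1", 42, 2), Entry("b2", 42, 2)]
interleaved = [Entry("a1", 42, 2), Entry("b1", 42, 2),
               Entry("a2", 42, 2), Entry("b2", 42, 2)]

print(extract_links(sequential))
print(extract_links(interleaved))
```

In this model the sequential order keeps the groups apart, while the interleaved order links `b1` to `a1` and `b2` to `a2`, so the contents of group B are mixed with those of group A.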
Note: A harmful collision remains unlikely. Truncation can happen on modern filesystems, but actual corruption requires two different 64‑bit inode values to truncate to the same smaller number and for those files to appear as hardlink groups in an unfortunate order within the archive. That combination is rare. For newc archives with a couple thousand entries, inode truncation is effectively a non‑issue. Even with 5% of your archive as hardlink pairs on a host with billions of inodes, you are in the “one in ten million” territory for any harmful hardlink collisions. (odc, however, would start to get risky.)
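To get a feel for the orders of magnitude involved, the chance that any two hardlink groups share a truncated inode can be estimated with a birthday-style calculation. This is only a sketch under strong assumptions: the low bits of inode numbers are treated as uniformly random (real allocators are not), only collisions between hardlink groups are counted, and archive ordering is ignored, so the result is an upper bound on harmful collisions within this model rather than a reproduction of the figure quoted above. The function name `collision_probability` is made up for this example.

```python
import math


def collision_probability(groups: int, bits: int) -> float:
    """Estimate P(at least two of `groups` share a value in a `bits`-wide field)."""
    slots = 2 ** bits
    # Exact birthday product, evaluated in log space to stay accurate
    # when the probability is tiny.
    log_no_collision = sum(math.log1p(-i / slots) for i in range(groups))
    return -math.expm1(log_no_collision)


# Example: 2000 entries, 5% of them in hardlink pairs -> 50 groups.
hardlink_groups = 50
p_newc = collision_probability(hardlink_groups, 32)
p_odc = collision_probability(hardlink_groups, 18)
print(f"newc: {p_newc:.2e}  odc: {p_odc:.2e}")
```

The gap of several orders of magnitude between the two results reflects the much smaller inode field in odc.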
GNU cpio provides:
- --renumber-inodes
- --reproducible
Either option rewrites inode numbers during archive creation so that:
- All non‑hardlinked files use inode 0
- Each hardlink group is assigned a new sequential inode number (1, 2, 3, …)
- All files in the same hardlink group share the same renumbered inode
This avoids truncation collisions entirely for a 'normal' archive.
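The renumbering scheme described above can be sketched in a few lines. This is an illustration of the idea, not GNU cpio's code: the function name `renumber` and the tuple layout are assumptions, and a real implementation would key hardlink groups on device and inode together rather than on the inode alone.

```python
def renumber(entries):
    """Map each path to a renumbered inode.

    entries: iterable of (path, real_inode, nlink) tuples.
    """
    next_ino = 1
    assigned = {}  # real inode -> renumbered inode, hardlink groups only
    out = {}
    for path, real_inode, nlink in entries:
        if nlink <= 1:
            out[path] = 0  # ordinary files all share inode 0
            continue
        if real_inode not in assigned:
            assigned[real_inode] = next_ino
            next_ino += 1
        out[path] = assigned[real_inode]
    return out


files = [
    ("etc/motd", 0x1_0000_002A, 1),
    ("bin/a", 0x2_0000_002A, 2),
    ("bin/a-link", 0x2_0000_002A, 2),
    ("bin/b", 0x3_0000_002A, 2),
    ("bin/b-link", 0x3_0000_002A, 2),
]
print(renumber(files))
```

Although all three real inodes share the same low 32 bits, the two hardlink groups end up with the distinct small values 1 and 2, which fit comfortably into either format's inode field.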
Note: Renumbering does not make truncation collisions completely
impossible. If an archive contains enough hardlink groups to exhaust the
available inode space of the chosen format, renumbered inodes would still
truncate and collide. However, in GNU cpio’s renumbering modes, all
non‑hardlinked files use inode 0, and only hardlink groups are assigned
sequential inode numbers (1, 2, 3, …). That means you would need:
- over 262 thousand (2^18) hardlink groups for odc
- over 4.29 billion (2^32) hardlink groups for newc
before truncation of renumbered inodes becomes a concern.
In practice, --renumber-inodes makes the problem so unlikely that it is safe
to ignore for any realistic archive.
A previous version of this gist stated that BSD cpio (libarchive) renumbers inodes by default. Further testing has shown that this is not the case: BSD cpio writes the real inode numbers from the filesystem, just like GNU cpio in normal mode. It will therefore suffer from inode truncation on systems with very large inode numbers and can run into the same hardlink‑related issues described above.