@ruario
Last active January 18, 2026 19:05

GNU cpio and inode truncation considerations

The following explains how inode truncation in classic cpio formats (odc and newc) interacts with hardlinked files, and under what conditions you may encounter problems.

TL;DR

Unless you are still using cpio for full system backups, this is probably a non‑issue. Nonetheless, to ensure correct hardlink preservation with GNU cpio:

  • Use --renumber-inodes (or --reproducible, which also enables renumbering)

What inode truncation is

Classic cpio formats store inode numbers in fixed‑width fields:

  • odc stores the inode number as six octal digits (18 bits)
  • newc stores it as eight hexadecimal digits (32 bits)

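Modern systems report inode numbers as 64‑bit values, and filesystems such as XFS, btrfs and ZFS can assign numbers well beyond the 32‑bit range. Even ext4, whose inode numbers are 32‑bit, easily produces values too large for odc. When an inode number does not fit into the field size of these cpio formats, GNU cpio truncates it to the available width, keeping only the low‑order bits. For example, with newc:

  • real inode: 4418424085 (2^32 + 123456789)
  • stored (truncated): 123456789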

If two different real inode numbers share the same low bits, they collapse to the same stored inode value in the archive.
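A minimal Python sketch of the truncation, assuming (as described above) that only the low‑order bits survive, i.e. the stored value is the real inode modulo 2^bits. The widths correspond to odc's six octal digits (18 bits) and newc's eight hexadecimal digits (32 bits); `truncate_inode` is a hypothetical helper for illustration, not part of cpio.

```python
# Sketch: how a fixed-width cpio header field truncates an inode number.
# Assumption: only the low-order bits are kept (value modulo 2**bits).

ODC_BITS = 18   # six octal digits
NEWC_BITS = 32  # eight hexadecimal digits


def truncate_inode(ino: int, bits: int) -> int:
    """Return the part of `ino` that fits in a `bits`-wide field."""
    return ino & ((1 << bits) - 1)


real_a = 123456789
real_b = (1 << 32) + 123456789  # 4418424085, a distinct 64-bit inode

print(truncate_inode(real_a, NEWC_BITS))  # 123456789 (fits, unchanged)
print(truncate_inode(real_b, NEWC_BITS))  # 123456789 (collides with real_a)
print(truncate_inode(real_a, ODC_BITS))   # 249109 (too wide for odc)
```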

cpio uses inode numbers only when handling hardlinks. It checks each file's link count (nlink > 1) and uses the stored device and inode numbers to match entries that refer to the same underlying file. Matching entries are restored as hard links to a single extracted file rather than as separate copies, so only one set of file contents is kept for the whole group.
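The matching step can be modelled roughly as follows. This is a simplified, hypothetical model rather than GNU cpio's code: `group_hardlinks` and the `(name, real_ino, nlink)` tuples are made up for illustration, device numbers are ignored, and archive order is ignored, so it shows the worst case rather than what every extraction will do.

```python
from collections import defaultdict


def group_hardlinks(entries, bits):
    """Group (name, real_ino, nlink) entries by the inode value a header stores.

    Simplified model: entries with nlink == 1 are skipped entirely; device
    numbers and archive order are ignored.
    """
    mask = (1 << bits) - 1
    groups = defaultdict(list)
    for name, real_ino, nlink in entries:
        if nlink > 1:  # only hardlinked files take part in matching
            groups[real_ino & mask].append(name)
    return dict(groups)


entries = [
    ("a1", 1000, 2), ("a2", 1000, 2),                  # hardlink group A
    ("b1", 2**32 + 1000, 2), ("b2", 2**32 + 1000, 2),  # hardlink group B
    ("plain", 7, 1),                                   # not hardlinked
]

print(group_hardlinks(entries, 64))  # {1000: ['a1', 'a2'], 4294968296: ['b1', 'b2']}
print(group_hardlinks(entries, 32))  # {1000: ['a1', 'a2', 'b1', 'b2']}
```

With full‑width values the two groups stay separate; at newc's 32‑bit width they share one key and look like a single group.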

When truncation is problematic

  • No hardlinks = no problem
    If none of the files in the archive are hardlinks (i.e. no nlink > 1 entries), inode truncation cannot cause corruption because the inode number is effectively ignored.

  • One hardlink group = no problem
    Even if the stored inode is truncated, nothing else can collide with it.

  • Multiple hardlink groups = corruption is possible (but unlikely)
    Extraction becomes unsafe only if:

    1. multiple real inode numbers truncate to the same stored inode value, and
    2. at least two of the colliding values belong to hardlinked files (nlink > 1), and
    3. the archive order interleaves the groups in a way that causes overlap.

If all three conditions are met, GNU cpio might incorrectly merge distinct hardlink groups into a single group during extraction, causing data loss.
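To check whether a tree you plan to archive meets the first two conditions, a sketch like the following may help. It is an illustrative script (the function name and return format are made up for this example): it records each distinct hardlinked regular file once by `(st_dev, st_ino)`, truncates the inode to the chosen width, and reports truncated values shared by more than one real inode on the same device. It cannot evaluate the third condition, which depends on the order in which cpio writes entries.

```python
import os
import stat
from collections import defaultdict


def find_truncation_collisions(root, bits):
    """Return {(dev, truncated_ino): [real_ino, ...]} for hardlink groups
    under `root` whose inode numbers collide after truncation to `bits`.

    Only regular files with st_nlink > 1 are considered; symlinks are not
    followed. Each real (dev, ino) pair is counted once.
    """
    mask = (1 << bits) - 1
    seen = set()
    buckets = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
                continue
            if (st.st_dev, st.st_ino) in seen:
                continue  # another name for a group already recorded
            seen.add((st.st_dev, st.st_ino))
            buckets[(st.st_dev, st.st_ino & mask)].append(st.st_ino)
    return {key: inos for key, inos in buckets.items() if len(inos) > 1}
```

For example, `find_truncation_collisions("/srv/data", 18)` (a hypothetical path) would list the groups at risk in an odc archive, and passing 32 does the same for newc. An empty result means conditions 1 and 2 cannot both hold.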

Note: A harmful collision remains unlikely. Truncation can happen on modern filesystems, but actual corruption requires two different 64‑bit inode values to truncate to the same smaller number and for those files to appear as hardlink groups in an unfortunate order within the archive. That combination is rare. For newc archives with a couple thousand entries, inode truncation is effectively a non‑issue. Even with 5% of your archive as hardlink pairs on a host with billions of inodes, you are in the “one in ten million” territory for any harmful hardlink collisions. (odc, however, would start to get risky.)
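The rough odds above can be sanity‑checked with the birthday approximation. This sketch assumes truncated inode values behave like uniformly random numbers and only estimates conditions 1 and 2; the ordering condition makes real corruption rarer still, so treat the result as a pessimistic estimate rather than a precise figure. `collision_probability` is a hypothetical helper.

```python
import math


def collision_probability(groups: int, bits: int) -> float:
    """Birthday-bound estimate that at least two of `groups` hardlink groups
    share the same truncated inode in a `bits`-wide field.

    Assumes uniformly distributed truncated values; ignores archive order.
    """
    space = 2 ** bits
    pairs = groups * (groups - 1) / 2
    # P(no collision) ~= exp(-pairs / space); expm1 keeps tiny values accurate
    return -math.expm1(-pairs / space)


# Example: 2,000 entries, 5% of them in hardlink pairs -> 50 groups
for fmt, bits in (("odc", 18), ("newc", 32)):
    print(fmt, f"{collision_probability(50, bits):.1e}")
# odc 4.7e-03
# newc 2.9e-07
```

That is roughly 1 in 200 for odc versus roughly 1 in 3.5 million for newc, before the ordering condition is taken into account.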

Working around the issue (renumbering)

GNU cpio provides two options for this:

  • --renumber-inodes
  • --reproducible

Either option rewrites inode numbers during archive creation so that:

  • All non‑hardlinked files use inode 0
  • Each hardlink group is assigned a new sequential inode number (1, 2, 3, …)
  • All files in the same hardlink group share the same renumbered inode

For a normally sized archive this avoids truncation collisions entirely, because the renumbered values are small enough to fit in either format's inode field.
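The renumbering rules listed above can be sketched like this. It is an illustrative model, not GNU cpio's source: `renumber_inodes` and the `(name, dev, ino, nlink)` tuples are hypothetical, and numbering groups in order of first appearance is our assumption.

```python
def renumber_inodes(entries):
    """Map (name, dev, ino, nlink) entries to (name, new_ino).

    Files with nlink == 1 get inode 0; each distinct (dev, ino) hardlink
    group gets the next sequential number, starting at 1, the first time
    it is seen. Every member of a group shares that number.
    """
    assigned = {}
    result = []
    for name, dev, ino, nlink in entries:
        if nlink < 2:
            result.append((name, 0))
            continue
        key = (dev, ino)
        if key not in assigned:
            assigned[key] = len(assigned) + 1
        result.append((name, assigned[key]))
    return result


entries = [
    ("readme", 2049, 500, 1),
    ("a1", 2049, 2**40 + 7, 2),
    ("b1", 2049, 7, 2),
    ("a2", 2049, 2**40 + 7, 2),
    ("b2", 2049, 7, 2),
]
print(renumber_inodes(entries))
# [('readme', 0), ('a1', 1), ('b1', 2), ('a2', 1), ('b2', 2)]
```

Groups a and b would collide under plain 32‑bit truncation (both real inodes truncate to 7), but they receive distinct small numbers after renumbering.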

Note: Renumbering does not make truncation collisions completely impossible. If an archive contains enough hardlink groups to exhaust the available inode space of the chosen format, renumbered inodes would still truncate and collide. However, in GNU cpio’s renumbering modes, all non‑hardlinked files use inode 0, and only hardlink groups are assigned sequential inode numbers (1, 2, 3, …). That means you would need:

  • over 262 thousand hardlink groups for odc (its six‑octal‑digit field holds values up to 262,143)
  • over 4.3 billion hardlink groups for newc

before truncation of renumbered inodes becomes a concern.
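The thresholds follow from the field widths: odc's inode field is six octal digits (18 bits) and newc's is eight hexadecimal digits (32 bits). With inode 0 reserved for non‑hardlinked files, renumbering can label 2^bits - 1 groups before values need truncating. A quick check, using a hypothetical helper:

```python
def max_renumbered_groups(bits: int) -> int:
    """Hardlink groups that fit before renumbered inodes need truncation.

    Renumbered values run 1 .. 2**bits - 1, since 0 is used for files
    that are not hardlinked.
    """
    return (1 << bits) - 1


for fmt, digits, base in (("odc", 6, 8), ("newc", 8, 16)):
    bits = digits * (base.bit_length() - 1)  # bits per digit: 3 for octal, 4 for hex
    print(fmt, bits, max_renumbered_groups(bits))
# odc 18 262143
# newc 32 4294967295
```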

In practice, --renumber-inodes makes the problem so unlikely that it is safe to ignore for any realistic archive.


Correction note

A previous version of this gist stated that BSD cpio (libarchive) renumbers inodes by default. Further testing has shown that this is not the case: BSD cpio writes the real inode numbers from the filesystem, just like GNU cpio in normal mode. It will therefore suffer from inode truncation on systems with very large inode numbers and can run into the same hardlink-related issues described above.
