@samoht
Last active January 14, 2026 20:30
RFC: Unified Package Cache

Principle

One build flow, one cache, one secondary index.

All packages use the same build flow. The variations are:

  1. Cache key: what identifies a cached build
  2. Context: where the package builds

Package Types

Type        Source                 Context        Cache Key
Workspace   Local dune project     default        -- (not cached)
Vendored    duniverse/             default        -- (not cached)
Locked      dune.lock              default        <name>.<ver>-<bid8>
Toolchain   dune.lock (compiler)   default        <name>.<ver>-<bid8>
Dev tool    Auto-solved            tools-<name>   see Dev Tool Cache Keys

Here <bid8> is the 8-character build_id: a Merkle hash over the package's opam content and its dependencies' build_ids.
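As a rough illustration of how such a Merkle-style build_id could be derived, the sketch below hashes a package's opam content together with the build_ids of its dependencies. The choice of SHA-256, the sorting of dependencies, and taking the first 8 hex characters as <bid8> are all assumptions made for this example; the RFC does not specify them.

```python
import hashlib

def build_id(opam_content: bytes, dep_build_ids: list[str]) -> str:
    """Hash the opam content plus the deps' build_ids (assumed: SHA-256)."""
    h = hashlib.sha256()
    # Length prefix keeps the opam content separate from what follows.
    h.update(len(opam_content).to_bytes(8, "big"))
    h.update(opam_content)
    # Assumption: sort so the id does not depend on dependency order.
    for dep in sorted(dep_build_ids):
        h.update(dep.encode("ascii"))
    return h.hexdigest()

def cache_key(name: str, version: str, bid: str) -> str:
    """<name>.<ver>-<bid8>, with <bid8> assumed to be the first 8 hex chars."""
    return f"{name}.{version}-{bid[:8]}"

# A change in any dependency propagates to every package above it.
compiler_a = build_id(b'name: "ocaml"\nversion: "5.2.0"\n', [])
compiler_b = build_id(b'name: "ocaml"\nversion: "4.14.2"\n', [])
odoc_a = build_id(b'name: "odoc"\n', [compiler_a])
odoc_b = build_id(b'name: "odoc"\n', [compiler_b])
print(cache_key("odoc", "2.4.0", odoc_a))
print(cache_key("odoc", "2.4.0", odoc_b))
```

Because each build_id folds in the ids below it, two builds of the same package against different compilers get different keys without the compiler being named in the key explicitly.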

Vendored packages are never cached because users may edit them.

Dev Tool Cache Keys

Type                   Key                        Example
Compiler-independent   <name>.<ver>-<checksum8>   ocamlformat.0.26.2-a1b2c3d4
Compiler-dependent     <name>.<ver>-<bid8>        odoc.2.4.0-f7e8d9c0

Compiler-independent tools (ocamlformat, dune-release) produce identical binaries regardless of the project's OCaml version. Keying on the source checksum enables cross-project sharing.

Compiler-dependent tools (odoc, merlin, utop) interact with compiler internals. Their build_id already includes the compiler as a dependency, ensuring correct cache separation.
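The key selection for dev tools can be sketched as a single function. The function name, its parameters, and the reading of <checksum8> as the first 8 characters of the source checksum are assumptions for illustration. The branch for a missing checksum follows the fallback to build_id discussed under Open Questions.

```python
from typing import Optional

def dev_tool_key(
    name: str,
    version: str,
    *,
    compiler_dependent: bool,
    source_checksum: Optional[str],
    build_id: str,
) -> str:
    """Pick the suffix of a dev tool's cache key (hypothetical helper)."""
    if not compiler_dependent and source_checksum is not None:
        # Shareable across projects: depends only on the tool's sources.
        suffix = source_checksum[:8]
    else:
        # Compiler-dependent tool, or no checksum available (git pin,
        # local package): use the build_id, which includes the compiler.
        suffix = build_id[:8]
    return f"{name}.{version}-{suffix}"

print(dev_tool_key("ocamlformat", "0.26.2", compiler_dependent=False,
                   source_checksum="a1b2c3d4e5f6", build_id="0123456789ab"))
print(dev_tool_key("odoc", "2.4.0", compiler_dependent=True,
                   source_checksum="ffffffffffff", build_id="f7e8d9c0aaaa"))
```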

Directory Structure

~/.cache/dune/
├── db/files/v5/<hash>/    # Content-addressed cache (existing)
└── index/                 # Secondary index (new)
    ├── fmt.0.9.0-a1b2c3d4 → ../db/files/v5/deadbeef...
    ├── ocamlformat.0.26.2-b3c4d5e6 → ../db/files/v5/cafebabe...
    └── odoc.2.4.0-f7e8d9c0 → ../db/files/v5/12345678...

Why a Secondary Index?

The content-addressed cache is keyed on the rule hash, which covers all inputs of a build. This prevents sharing across projects:

Project A (OCaml 5.2) + ocamlformat 0.26.2 → rule hash X
Project B (OCaml 4.14) + ocamlformat 0.26.2 → rule hash Y

The rule hashes differ, but the binary is identical. The secondary index uses simpler keys that capture only what matters for correctness.
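The difference between the two kinds of key can be made concrete with a toy model. Here rule_hash is only a stand-in that digests every input it is given; it is not dune's actual rule hashing, and the input names are invented for the example.

```python
import hashlib

def rule_hash(inputs: dict[str, str]) -> str:
    """Stand-in for a rule hash: a digest over every input of the build."""
    h = hashlib.sha256()
    for name in sorted(inputs):
        h.update(f"{name}={inputs[name]}\n".encode())
    return h.hexdigest()

def index_key(name: str, version: str, source_checksum: str) -> str:
    """Secondary-index key: only what matters for this tool's correctness."""
    return f"{name}.{version}-{source_checksum[:8]}"

project_a = {"ocaml": "5.2", "tool": "ocamlformat.0.26.2", "src": "a1b2c3d4e5f6"}
project_b = {"ocaml": "4.14", "tool": "ocamlformat.0.26.2", "src": "a1b2c3d4e5f6"}

# Different project compilers give different rule hashes...
print(rule_hash(project_a) == rule_hash(project_b))
# ...but the index key ignores the project compiler, so both projects share it.
print(index_key("ocamlformat", "0.26.2", project_a["src"])
      == index_key("ocamlformat", "0.26.2", project_b["src"]))
```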

Why Symlinks?

  1. No duplication: one copy in content-addressed store
  2. Atomic: symlink() is atomic on POSIX
  3. Simple check: symlink exists = cached
  4. Self-validating: the link target is named by its content hash, so a restored entry can be checked against it
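A lookup against such an index might look like the sketch below. The lookup function and its arguments are hypothetical. Treating a dangling link (for example, one whose target was trimmed from the store) as a cache miss is an assumption the RFC does not state.

```python
import os
from typing import Optional

def lookup(index_dir: str, key: str) -> Optional[str]:
    """Return the store path for `key`, or None if it is not cached."""
    link = os.path.join(index_dir, key)
    if not os.path.islink(link):
        return None  # no symlink: not cached
    # Resolve the (possibly relative) link target to an absolute path.
    target = os.path.realpath(link)
    # Assumption: a dangling link counts as a miss rather than an error.
    return target if os.path.exists(target) else None
```

The check costs one lstat and one readlink, and needs no separate database or lock file.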

Build Flow

compute cache_key
        │
        ├── None (workspace/vendored)
        │         │
        │         ▼
        │   build normally ──────────────────┐
        │                                    │
        └── Some key                         │
                  │                          │
         ┌───────┴───────┐                   │
         ▼               ▼                   │
   symlink exists   symlink missing          │
         │               │                   │
         ▼               ▼                   │
   restore from    build normally            │
   cache           store + create symlink    │
         │               │                   │
         └───────┬───────┘                   │
                 ▼                           │
         install to target_dir ◄─────────────┘
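The diagram translates into a small piece of control flow. In this sketch the individual steps are passed in as callables, so only the branching is shown; all names (run_build_flow, is_cached, restore, store_and_link, install) are invented for illustration and do not correspond to existing dune functions.

```python
from typing import Any, Callable, Optional

def run_build_flow(
    cache_key: Optional[str],
    *,
    is_cached: Callable[[str], bool],
    restore: Callable[[str], Any],
    build: Callable[[], Any],
    store_and_link: Callable[[str, Any], None],
    install: Callable[[Any], None],
) -> str:
    """Follow the three paths of the diagram; return which one was taken."""
    if cache_key is None:
        # Workspace / vendored: never cached.
        artifacts = build()
        path = "uncached"
    elif is_cached(cache_key):
        # Symlink exists: restore from the cache.
        artifacts = restore(cache_key)
        path = "restored"
    else:
        # Symlink missing: build, then store and create the symlink.
        artifacts = build()
        store_and_link(cache_key, artifacts)
        path = "built-and-cached"
    # Every path ends with the same install step.
    install(artifacts)
    return path
```

All three paths converge on install, which is what allows workspace, locked, toolchain, and dev tool packages to share one flow.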

Concurrency

Multiple dune processes may cache the same package simultaneously. Each one follows the same steps:

  1. Build to a temporary directory
  2. Store in the content-addressed cache (atomic via rename)
  3. Create the symlink (atomic; an already-existing link counts as success)

If two processes race, both build: the work is duplicated, but the result is correct. The first to finish creates the symlink; the second sees that it exists and skips.
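Steps 2 and 3 can be sketched with plain POSIX primitives. The publish function, its parameters, and the cleanup of the loser's staging directory are assumptions for this example. The sketch also assumes the staging directory is on the same filesystem as the store, since rename is only atomic within one filesystem.

```python
import os
import shutil

def publish(staging_dir: str, store_dir: str, content_hash: str,
            index_dir: str, key: str) -> str:
    """Move a finished build into the store and index it under `key`."""
    entry = os.path.join(store_dir, content_hash)
    try:
        # Atomic on POSIX when staging_dir and store_dir share a filesystem.
        os.rename(staging_dir, entry)
    except OSError:
        # Renaming onto an existing non-empty directory fails: another
        # process stored the same content first. Drop our copy.
        if not os.path.isdir(entry):
            raise
        shutil.rmtree(staging_dir)
    link = os.path.join(index_dir, key)
    try:
        # symlink() either creates the link or fails with EEXIST.
        os.symlink(os.path.relpath(entry, index_dir), link)
        return "created"
    except FileExistsError:
        # Lost the race: the link is already there, nothing left to do.
        return "exists"
```

Neither step needs a lock: the loser of each race detects the winner's result and backs off.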

Cache Management

dune cache trim              # Remove least-recently-used entries
dune cache trim --size 10G   # Keep only 10GB
dune cache clear             # Remove everything

What This Simplifies

Before:

src/dune_rules/
├── pkg_toolchain.ml      # Separate toolchain cache
├── dev_tool_cache.ml     # Separate dev tool cache
└── pkg_rules.ml          # Special-case handling

~/.cache/dune/
├── toolchains/           # Full copies
├── tools/                # Separate index
└── db/files/             # Build cache

After:

src/dune_rules/
├── pkg_cache.ml          # Unified cache
└── pkg_rules.ml          # One flow

~/.cache/dune/
├── db/files/v5/          # Content-addressed (unchanged)
└── index/                # Single secondary index

Migration

Old cache directories (toolchains/, tools/) are not migrated. On first use:

  • Relocatable compilers rebuild and cache in new location
  • Dev tools rebuild and cache in new location

To reclaim space:

rm -rf ~/.cache/dune/toolchains ~/.cache/dune/tools

Open Questions

Source checksum unavailable

Dev tool sources without checksums (git pins, local packages) fall back to build_id:

ocamlformat.0.26.2-<bid8>  # Fallback

This reduces sharing but preserves correctness.

Platform-specific binaries

Cache keys do not include the platform. This assumes:

  • Compiler-independent tools produce identical binaries (pure OCaml)
  • Compiler-dependent tools include platform via deps' build_ids

If needed, a platform suffix can be added: ocamlformat.0.26.2-<checksum8>-macos-arm64
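Should a suffix turn out to be necessary, it could be derived from the host at key-computation time. The helper below is a sketch only: the with_platform name, the lowercase normalisation, and the mapping of Darwin to macos are assumptions chosen to reproduce the example key above.

```python
import platform
from typing import Optional

def with_platform(key: str, system: Optional[str] = None,
                  machine: Optional[str] = None) -> str:
    """Append an <os>-<arch> suffix to a cache key (hypothetical helper)."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    # Assumption: use "macos" rather than the kernel name "darwin".
    os_name = {"darwin": "macos"}.get(system, system)
    return f"{key}-{os_name}-{machine}"

print(with_platform("ocamlformat.0.26.2-a1b2c3d4", "Darwin", "arm64"))
```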

