@wolfeidau
Created January 17, 2026 08:43
A recent side project specification written with Claude Code

Content-Addressable Cache Implementation Plan

Overview

A Go-based unified package cache supporting NPM packages and Go modules, with content-addressable storage using BLAKE3, multiple storage backends, and combined TTL/LRU expiration.

Research Findings

pnpm's CAFS (model we'll follow for NPM)

  • Files stored by SHA-512 hash at ~/.pnpm-store/v3/files/{hash[0:2]}/{hash[2:]}
  • File-level deduplication across all packages
  • Hardlinks/reflinks to project's node_modules

Go Module Cache (GOPROXY protocol)

  • Structure: {module}/@v/{version}.{info|mod|zip|ziphash}
  • Protocol endpoints: /@v/list, /@v/{version}.info, /@v/{version}.mod, /@v/{version}.zip
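  • .ziphash contains h1:base64(sha256) for integrity; the SHA-256 is computed over a sorted manifest of per-file SHA-256 digests, not over the raw zip bytes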
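
For reference, an h1: value can be reproduced with the standard library alone. The sketch below follows the scheme implemented by golang.org/x/mod/sumdb/dirhash.Hash1 (a SHA-256 over sorted "digest, two spaces, file name" lines). The function name HashH1 and the sample file names are illustrative assumptions, not part of the plan.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// HashH1 computes an "h1:" digest over a set of named file contents.
// Keys are file names as they appear inside the module zip
// (for example "example.com/demo@v1.0.0/go.mod").
func HashH1(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // the manifest is ordered by file name

	manifest := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		// One manifest line per file: "<hex digest>  <file name>\n".
		fmt.Fprintf(manifest, "%x  %s\n", sum, name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(manifest.Sum(nil))
}

func main() {
	files := map[string][]byte{
		"example.com/demo@v1.0.0/go.mod":  []byte("module example.com/demo\n"),
		"example.com/demo@v1.0.0/demo.go": []byte("package demo\n"),
	}
	fmt.Println(HashH1(files))
}
```

Because the digest covers file names and contents rather than zip bytes, two zips with different compression settings but identical files produce the same h1: value.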

Key Insight

We can unify these: store all content in CAFS, then provide protocol adapters that expose the appropriate interface for each package type.

Architecture

content-cache/
├── cache.go              # Main Cache interface
├── hash.go               # BLAKE3 hashing
├── store/
│   ├── store.go          # Content-addressable store interface
│   └── cafs.go           # CAFS implementation
├── backend/
│   ├── backend.go        # Storage backend interface
│   ├── filesystem.go     # Local filesystem
│   └── s3.go             # S3/Minio
├── protocol/
│   ├── npm/
│   │   ├── registry.go   # NPM registry protocol handler
│   │   └── tarball.go    # Tarball extraction/storage
│   └── goproxy/
│       ├── proxy.go      # GOPROXY protocol handler
│       └── module.go     # Module metadata handling
├── expiry/
│   └── expiry.go         # TTL + LRU expiration
├── compress/
│   └── zstd.go           # Compression
└── server/
    └── http.go           # HTTP server exposing all protocols

Core Design

1. Content-Addressable File Store (CAFS)

All package content is ultimately stored as content-addressed blobs:

// Hash is a BLAKE3 256-bit digest
type Hash [32]byte

func (h Hash) String() string        // hex encoding
func (h Hash) Dir() string           // first 2 chars for sharding

// Store provides content-addressable storage
type Store interface {
    // Put stores content, returns its hash
    Put(ctx context.Context, r io.Reader) (Hash, error)

    // Get retrieves content by hash
    Get(ctx context.Context, h Hash) (io.ReadCloser, error)

    // Has checks if content exists
    Has(ctx context.Context, h Hash) (bool, error)

    // Delete removes content
    Delete(ctx context.Context, h Hash) error
}

Storage layout:

blobs/
├── ab/
│   └── abcd1234...  # full hash as filename
├── cd/
│   └── cdef5678...
└── ...
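
A minimal sketch of how a filesystem-backed store could produce this layout, assuming a write-to-temp-then-rename strategy for atomicity. SHA-256 from the standard library stands in for BLAKE3 so the example runs without third-party modules; FSStore and roundTrip are hypothetical names, and the context parameters from the Store interface are omitted for brevity.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// Hash is a 256-bit digest. The plan calls for BLAKE3; SHA-256 stands in here.
type Hash [32]byte

func (h Hash) String() string { return hex.EncodeToString(h[:]) }
func (h Hash) Dir() string    { return h.String()[:2] }

// FSStore is a hypothetical filesystem-backed store rooted at Root.
type FSStore struct{ Root string }

func (s FSStore) path(h Hash) string {
	return filepath.Join(s.Root, "blobs", h.Dir(), h.String())
}

// Put streams r into a temp file while hashing, then renames it into place,
// so readers never observe a partially written blob.
func (s FSStore) Put(r io.Reader) (Hash, error) {
	var h Hash
	if err := os.MkdirAll(s.Root, 0o755); err != nil {
		return h, err
	}
	tmp, err := os.CreateTemp(s.Root, "put-*")
	if err != nil {
		return h, err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename has succeeded

	hasher := sha256.New()
	if _, err := io.Copy(io.MultiWriter(tmp, hasher), r); err != nil {
		tmp.Close()
		return h, err
	}
	if err := tmp.Close(); err != nil {
		return h, err
	}
	copy(h[:], hasher.Sum(nil))

	dst := s.path(h)
	if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
		return h, err
	}
	return h, os.Rename(tmp.Name(), dst)
}

// Get opens the blob stored under h.
func (s FSStore) Get(h Hash) (io.ReadCloser, error) { return os.Open(s.path(h)) }

// roundTrip stores content in a temporary store and reads it back.
func roundTrip(content string) (Hash, string, error) {
	dir, err := os.MkdirTemp("", "cafs-sketch")
	if err != nil {
		return Hash{}, "", err
	}
	defer os.RemoveAll(dir)

	store := FSStore{Root: dir}
	h, err := store.Put(strings.NewReader(content))
	if err != nil {
		return h, "", err
	}
	rc, err := store.Get(h)
	if err != nil {
		return h, "", err
	}
	defer rc.Close()
	data, err := io.ReadAll(rc)
	return h, string(data), err
}

func main() {
	h, got, err := roundTrip("hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(h.Dir(), h, got)
}
```

Keeping the temp file under the same root as the blobs keeps the rename on one filesystem, which is what makes it atomic on POSIX systems.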

2. Protocol Adapters

Each package type gets a protocol adapter that translates to/from CAFS:

NPM Registry Protocol

type NPMRegistry interface {
    // GetPackageMetadata returns the registry metadata document for a package (all versions)
    GetPackageMetadata(ctx context.Context, name string) (*PackageMetadata, error)

    // GetTarball retrieves a package tarball
    GetTarball(ctx context.Context, name, version string) (io.ReadCloser, error)

    // PutTarball stores a package tarball (from upstream)
    PutTarball(ctx context.Context, name, version string, r io.Reader) error
}

NPM index storage (maps package@version → content hash):

npm/
├── lodash/
│   ├── metadata.json         # combined package metadata
│   └── versions/
│       ├── 4.17.21.json      # version-specific metadata + content hash
│       └── 4.17.20.json

GOPROXY Protocol

type GoProxy interface {
    // List returns available versions
    List(ctx context.Context, module string) ([]string, error)

    // Info returns version metadata
    Info(ctx context.Context, module, version string) (*VersionInfo, error)

    // Mod returns go.mod content
    Mod(ctx context.Context, module, version string) ([]byte, error)

    // Zip returns module zip
    Zip(ctx context.Context, module, version string) (io.ReadCloser, error)

    // Store caches a module from upstream
    Store(ctx context.Context, module, version string, info, mod, zip io.Reader) error
}

Go module index storage:

goproxy/
├── github.com/
│   └── pkg/
│       └── errors/
│           └── @v/
│               ├── list              # newline-separated versions
│               ├── v0.9.1.info       # {"Version":"...","Time":"...","ZipHash":"blake3:..."}
│               ├── v0.9.1.mod        # go.mod content (or hash reference)
│               └── v0.9.1.ziphash    # hash of zip in CAFS
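
One detail worth planning for: the GOPROXY protocol escapes uppercase letters in module paths and versions as "!" followed by the lowercase letter, so the index stays unambiguous on case-insensitive filesystems. The sketch below is a simplified stand-in for golang.org/x/mod/module.EscapePath (it skips validation); IndexPath is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// EscapePath replaces each uppercase ASCII letter with '!' plus its
// lowercase form, as the go command does in proxy URLs.
// Unlike the x/mod version, it does not validate the input.
func EscapePath(p string) string {
	var b strings.Builder
	for _, r := range p {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// IndexPath builds the index key for one file of a module version,
// for example goproxy/github.com/pkg/errors/@v/v0.9.1.info.
func IndexPath(module, version, ext string) string {
	return path.Join("goproxy", EscapePath(module), "@v", EscapePath(version)+"."+ext)
}

func main() {
	fmt.Println(EscapePath("github.com/Azure/azure-sdk-for-go"))
	fmt.Println(IndexPath("github.com/pkg/errors", "v0.9.1", "info"))
}
```

Requests from the go command arrive already escaped, so a handler can use the request path as the index key directly and only needs escaping when it starts from an unescaped module path.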

3. Unified HTTP Server

Single server exposing multiple protocols:

GET /npm/{package}                    → Package metadata
GET /npm/{package}/-/{tarball}        → Tarball download

GET /goproxy/{module}/@v/list         → Version list
GET /goproxy/{module}/@v/{ver}.info   → Version info
GET /goproxy/{module}/@v/{ver}.mod    → go.mod
GET /goproxy/{module}/@v/{ver}.zip    → Module zip

GET /health                           → Health check
GET /metrics                          → Prometheus metrics
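
Module paths contain slashes, so the GOPROXY routes are easier to match by splitting on the last "/@v/" than with a fixed-segment router. A possible parser for the routes listed above; GoProxyRequest and ParseGoProxyPath are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// GoProxyRequest is the decoded form of a /goproxy/ URL path.
type GoProxyRequest struct {
	Module  string // module path as it appears in the URL (still escaped)
	Version string // empty for the list endpoint
	Kind    string // "list", "info", "mod", or "zip"
}

// ParseGoProxyPath decodes paths such as
// /goproxy/github.com/pkg/errors/@v/v0.9.1.info.
func ParseGoProxyPath(p string) (GoProxyRequest, bool) {
	const prefix = "/goproxy/"
	if !strings.HasPrefix(p, prefix) {
		return GoProxyRequest{}, false
	}
	rest := strings.TrimPrefix(p, prefix)

	const sep = "/@v/"
	i := strings.LastIndex(rest, sep)
	if i <= 0 {
		return GoProxyRequest{}, false
	}
	module, file := rest[:i], rest[i+len(sep):]

	if file == "list" {
		return GoProxyRequest{Module: module, Kind: "list"}, true
	}
	for _, kind := range []string{"info", "mod", "zip"} {
		suffix := "." + kind
		if strings.HasSuffix(file, suffix) && len(file) > len(suffix) {
			version := strings.TrimSuffix(file, suffix)
			return GoProxyRequest{Module: module, Version: version, Kind: kind}, true
		}
	}
	return GoProxyRequest{}, false
}

func main() {
	req, ok := ParseGoProxyPath("/goproxy/github.com/pkg/errors/@v/v0.9.1.info")
	fmt.Printf("%+v %v\n", req, ok)
}
```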

4. Upstream Proxying

The server acts as a pull-through cache:

  1. Request comes in for package/module
  2. Check local cache (CAFS + index)
  3. If miss: fetch from upstream, store in CAFS, update index
  4. Return content

type Upstream interface {
    Fetch(ctx context.Context, req Request) (Response, error)
}

// NPM upstream: registry.npmjs.org
// Go upstream: proxy.golang.org (or direct VCS)

5. Expiration Strategy

Combined TTL + LRU with reference counting:

type ExpiryManager interface {
    // Touch updates access time for content
    Touch(ctx context.Context, h Hash) error

    // Run starts background expiration
    Run(ctx context.Context) error
}

  • TTL: Content expires N days after last access
  • LRU: When cache exceeds max size, evict least-recently-used
  • Reference counting: Index entries reference content hashes; content only deleted when unreferenced

Design Decisions

  • NPM storage: Store whole tarballs rather than pnpm-style individual files (simpler, faster serving; deduplication happens at tarball level only)
  • First protocol: GOPROXY (simpler protocol, validates CAFS before NPM complexity)

Implementation Phases

Phase 1: Core CAFS

  1. hash.go - BLAKE3 Hash type with streaming computation
  2. backend/backend.go - Backend interface
  3. backend/filesystem.go - Filesystem backend with atomic writes
  4. store/cafs.go - Content-addressable store on top of backend

Phase 2: GOPROXY Protocol + HTTP Server

  1. protocol/goproxy/proxy.go - GOPROXY HTTP handler
  2. protocol/goproxy/module.go - Module index management
  3. server/http.go - HTTP server with GOPROXY routes
  4. Upstream proxy to proxy.golang.org
  5. Integration test: GOPROXY=http://localhost:8080/goproxy go mod download

Phase 3: Expiration

  1. expiry/expiry.go - TTL + LRU manager
  2. Metadata storage for access times
  3. Background garbage collection

Phase 4: NPM Registry Protocol

  1. protocol/npm/registry.go - NPM registry HTTP handler
  2. protocol/npm/tarball.go - Tarball handling (store whole tarballs in CAFS)
  3. Upstream proxy to registry.npmjs.org
  4. Package metadata caching
  5. Integration test: npm config set registry http://localhost:8080/npm/ && npm install

Phase 5: S3 Backend

  1. backend/s3.go - S3/Minio backend
  2. Multipart upload support for large files

Phase 6: Compression

  1. compress/zstd.go - zstd compression
  2. Transparent compression in CAFS layer

Phase 7: Observability

  1. OpenTelemetry tracing
  2. Prometheus metrics
  3. Structured logging (slog)

Files to Create

File                          Purpose
cache.go                      Main entry point, configuration
hash.go                       BLAKE3 Hash type
store/store.go                Store interface
store/cafs.go                 CAFS implementation
backend/backend.go            Backend interface
backend/filesystem.go         Filesystem backend
backend/s3.go                 S3 backend
protocol/goproxy/proxy.go     GOPROXY handler
protocol/goproxy/module.go    Module index
protocol/npm/registry.go      NPM handler
protocol/npm/tarball.go       Tarball handling
server/http.go                HTTP server
expiry/expiry.go              Expiration manager
compress/zstd.go              Compression

Verification Plan

  1. Unit tests for CAFS operations (put/get/delete)
  2. Integration tests with go command against GOPROXY endpoints
  3. Integration tests with npm/pnpm against NPM endpoints
  4. Benchmark tests for throughput and latency
  5. Example: Set up as local proxy, run go mod download and npm install

Usage Example

# Start the cache server
$ content-cache serve --listen :8080 --storage ./cache

# Configure Go to use it
$ export GOPROXY=http://localhost:8080/goproxy,direct

# Configure npm to use it
$ npm config set registry http://localhost:8080/npm/

# Now installs are cached locally
$ go mod download
$ npm install