ZFS GCS Replication Benchmark Results - ClusterDuck

Date: January 6, 2025
Worker: clusterduck-worker-spot-stage-whz7 (c3-standard-22)
Region: us-west1-b
GCS Bucket: clusterduck-zfs-snapshots-stage (same region)
ZFS Pool: spaces (encrypted AES-256-GCM, compression=zstd-3)

Key Findings

1. Raw Send (-w) is Critical

ZFS send has two modes:

  • Standard send: Decrypts and decompresses blocks before sending; data is recompressed (and re-encrypted) on receive
  • Raw send (-w): Sends the compressed, encrypted blocks exactly as stored on disk

Always use the -w flag for encrypted/compressed pools!
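
A quick way to see the difference is to compare the two stream sizes directly; the dataset and snapshot names below are placeholders for a real space:

```bash
SNAP=spaces/space1@base

# Standard send: blocks are decrypted and decompressed on the way out,
# so the stream is roughly the logical size of the data.
zfs send "$SNAP" | wc -c

# Raw send: blocks leave the pool exactly as stored on disk,
# still zstd-compressed and AES-256-GCM encrypted.
zfs send -w "$SNAP" | wc -c
```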

2. zstd-3 vs lz4 Compression

| Logical Size | lz4 Transfer | zstd-3 Transfer | Improvement |
|---|---|---|---|
| 100 MB | 19 MB | 6.6 MB | 2.9x smaller |
| 500 MB | 95 MB | 33 MB | 2.9x smaller |
| 2 GB | 380 MB | 131 MB | 2.9x smaller |
| 10 GB | 1.9 GB | 657 MB | 2.9x smaller |

zstd-3 achieves ~15x compression on JSON data (vs ~5x for lz4).
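
The ratio ZFS itself reports can be checked per dataset (the dataset name is a placeholder):

```bash
# compressratio shows the on-disk savings; logicalused vs used is the
# same information in bytes.
zfs get -o name,property,value compressratio,logicalused,used spaces/space1
```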

Test Scenarios

| Scenario | Base Size | Incrementals | Compression | Actual Transfer |
|---|---|---|---|---|
| small | 100 MB | 5 × 1 MB appends | ~15x | 6.6 MB base, 70 KB/incr |
| medium | 500 MB | 5 × 5 MB appends | ~15x | 33 MB base, 340 KB/incr |
| large | 2 GB | 5 × 10 MB appends | ~15x | 131 MB base, 670 KB/incr |
| xlarge | 10 GB | 5 × 50 MB appends | ~15x | 657 MB base, 3.3 MB/incr |

Results Summary (zstd-3)

Base Snapshot Restore Times

| Logical Size | Compressed Size | Send Time | Recv Time | Send MB/s | Recv MB/s |
|---|---|---|---|---|---|
| 100 MB | 6.6 MB | 269 ms | 203 ms | 39 | 33 |
| 500 MB | 33 MB | 400 ms | 534 ms | 83 | 62 |
| 2 GB | 131 MB | 1.2 sec | 1.8 sec | 110 | 71 |
| 10 GB | 657 MB | 6.0 sec | 8.5 sec | 111 | 77 |
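
A timed round trip through GCS is enough to reproduce numbers like these; the snapshot, object path, and restore dataset below are illustrative:

```bash
BUCKET=gs://clusterduck-zfs-snapshots-stage

# Upload: raw send piped straight into a GCS object.
time zfs send -w spaces/space1@base | gsutil cp - "$BUCKET/space1/base.zfs"

# Restore: stream the object back into a fresh dataset.
time gsutil cp "$BUCKET/space1/base.zfs" - | zfs receive spaces/space1-restore
```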

Incremental Snapshot Performance

| Logical Append | Compressed Size | Send Time | Recv Time |
|---|---|---|---|
| 1 MB | 70 KB | 55 ms | 93 ms |
| 5 MB | 340 KB | 55 ms | 97 ms |
| 10 MB | 670 KB | 60 ms | 97 ms |
| 50 MB | 3.3 MB | 103 ms | 127 ms |

Estimated Full Restore Times

For a complete restore (base + 5 incrementals):

| Space Size (Logical) | Base Recv | 5 Incrementals | Total Restore |
|---|---|---|---|
| 100 MB | 0.2 sec | 0.5 sec | < 1 sec |
| 500 MB | 0.5 sec | 0.5 sec | ~1 sec |
| 2 GB | 1.8 sec | 0.5 sec | ~2.3 sec |
| 10 GB | 8.5 sec | 0.6 sec | ~9 sec |
| 50 GB (estimated) | ~43 sec | ~3 sec | ~46 sec |
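
A full restore of this shape is just the base receive followed by the incrementals in order; the names below are placeholders, and because the streams are raw, the data stays encrypted until the key is loaded at the end:

```bash
BUCKET=gs://clusterduck-zfs-snapshots-stage
TARGET=spaces/space1-restore

# Base snapshot first, then each incremental in sequence.
gsutil cp "$BUCKET/space1/base.zfs" - | zfs receive "$TARGET"
for i in 1 2 3 4 5; do
  gsutil cp "$BUCKET/space1/incr$i.zfs" - | zfs receive "$TARGET"
done

# Raw-received data is still encrypted; load the key, then mount.
zfs load-key "$TARGET"
zfs mount "$TARGET"
```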

Key Observations

1. Compression is Massive

With zstd-3, JSON data compresses ~15x:

  • 10GB logical = 657MB actual transfer
  • This is 2.9x better than lz4

2. Incrementals are Extremely Fast

With append-style writes (like SQLite WAL):

  • 1MB logical append = 70KB transfer = 93ms restore
  • 50MB logical append = 3.3MB transfer = 127ms restore

3. Latency Floor ~50-90ms

Even tiny incrementals take 50-90ms due to:

  • GCS API round-trip (~30-50ms)
  • ZFS snapshot overhead (~10-20ms)
  • Process spawn overhead
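
Each of these fixed costs can be timed separately to see where the floor comes from (the dataset and snapshot names are placeholders):

```bash
# ZFS snapshot overhead.
time zfs snapshot spaces/space1@probe

# Stream setup for a near-empty delta (no upload).
time zfs send -w -i @incr5 spaces/space1@probe > /dev/null

# One GCS API round trip from the worker.
time gsutil ls gs://clusterduck-zfs-snapshots-stage > /dev/null
```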

4. Throughput Characteristics

  • Send: 80-110 MB/s of compressed data
  • Recv: 60-77 MB/s of compressed data (disk write bound)
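
These figures describe the compressed stream; piping a send through pv (assuming pv is installed, and a placeholder snapshot) shows the send-side rate directly:

```bash
# Send-side throughput of the compressed, encrypted stream (no upload).
zfs send -w spaces/space1@base | pv > /dev/null
```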

Methodology

Test Data Generation

The benchmark generates realistic JSON data to simulate SQLite/document workloads:

{"id":1736198400001,"ts":1736198400001,"uid":"u1234","sid":"s56789","act":"click","pg":"/dashboard","dur":1500,"ok":true,"tg":["web","mobile"],"ref":"r123","v":"1.0"}

Each record contains:

  • Timestamps and sequential IDs (high entropy)
  • User/session identifiers (medium entropy, repeated patterns)
  • Action types and page paths (low entropy, from fixed sets)
  • Boolean flags and arrays (mixed)

This produces ~15x compression with zstd-3, which is better than typical production workloads (expect 5-10x for real SQLite data).
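
A generator along these lines (a sketch, not the benchmark's actual code) produces records with the same mix of entropy:

```bash
# Emit n JSON records shaped like the sample above: counters and random IDs
# for high/medium entropy, small fixed sets for actions and pages.
gen_records() {
  local n=$1 i ts
  local acts=(click view edit scroll submit)
  local pages=(/dashboard /settings /inbox /reports)
  for ((i = 1; i <= n; i++)); do
    ts=$((1736198400000 + i))
    printf '{"id":%d,"ts":%d,"uid":"u%d","sid":"s%d","act":"%s","pg":"%s","dur":%d,"ok":true,"tg":["web","mobile"],"ref":"r%d","v":"1.0"}\n' \
      "$ts" "$ts" $((RANDOM % 10000)) $((RANDOM % 100000)) \
      "${acts[RANDOM % ${#acts[@]}]}" "${pages[RANDOM % ${#pages[@]}]}" \
      $((RANDOM % 5000)) $((RANDOM % 1000))
  done
}

# Roughly 16 MB of logical JSON appended to a placeholder dataset path.
gen_records 100000 >> /spaces/space1/data.json
```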

Incremental Simulation

Incrementals append data to the file (like SQLite WAL writes) rather than overwriting. This means ZFS only needs to transfer the new blocks, resulting in small incremental snapshots proportional to the append size.
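
A sketch of one scenario's incremental loop, using the gen_records helper above and placeholder names; each pass appends roughly 1 MB, snapshots, and raw-sends only the delta:

```bash
DS=spaces/space1
BUCKET=gs://clusterduck-zfs-snapshots-stage
PREV=base    # assumes spaces/space1@base was already created and sent

for i in 1 2 3 4 5; do
  # Append new records, as SQLite WAL-style writes would.
  gen_records 6000 >> "/$DS/data.json"
  zfs snapshot "$DS@incr$i"
  # Incremental raw send: only blocks changed since $PREV are transferred.
  zfs send -w -i "@$PREV" "$DS@incr$i" | gsutil cp - "$BUCKET/space1/incr$i.zfs"
  PREV=incr$i
done
```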

Raw Send Mode

All benchmarks use zfs send -w (raw mode) which:

  • Preserves on-disk compression during transfer
  • Preserves encryption (no key needed on receiving side until mount)
  • Transfers actual block data, not logical data
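
The size of a raw stream can be checked without sending anything via a dry run (the snapshot name is a placeholder):

```bash
# -n performs a dry run; -v reports the estimated stream size.
zfs send -w -n -v spaces/space1@base
```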

Production Recommendations

  1. Use zstd-3 compression: ~3x better than lz4 for JSON/SQLite data
  2. Use raw sends (-w): Required for compressed/encrypted pools
  3. Frequent small incrementals: 1-10MB appends restore in <100ms
  4. Pre-warm for large spaces: 10GB+ takes ~10 seconds to restore
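
Recommendation 1 comes down to a single dataset property (the -w flag from recommendation 2 appears in the send examples above); a minimal sketch, assuming the pool layout described here:

```bash
# Compression is inherited by child datasets; only new writes use zstd-3,
# existing blocks keep their original compression until rewritten.
zfs set compression=zstd-3 spaces
zfs get -r -o name,value compression spaces
```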

Comparison: ZFS vs Litestream

We previously benchmarked Litestream GCS recovery on similar hardware. Here's how they compare:

Recovery Time Comparison

| Database Size | Litestream (c3-88, Hyperdisk) | ZFS (c3-22, Local NVMe) | ZFS Advantage |
|---|---|---|---|
| 100 MB | 1.4 sec | 0.2 sec | 7x faster |
| 500 MB | 5.6 sec | 0.5 sec | 11x faster |
| 1 GB | 11.0 sec | ~1.0 sec (est) | 11x faster |
| 2 GB | 21.6 sec | 1.8 sec | 12x faster |
| 5 GB | 51.6 sec | ~4.5 sec (est) | 11x faster |
| 10 GB | ~103 sec (est) | 8.5 sec | 12x faster |

Throughput Comparison

| Metric | Litestream | ZFS (zstd-3) |
|---|---|---|
| Peak throughput | 90-99 MB/s | 77-111 MB/s |
| Effective throughput (logical data) | 90-99 MB/s | ~1,150 MB/s* |

*ZFS transfers compressed data, so 77 MB/s of compressed data = ~1,150 MB/s of logical data at 15x compression.

Why ZFS is Faster

  1. Compression in transit: ZFS raw send (-w) transfers compressed blocks. With zstd-3, 10GB logical = 657MB actual transfer. Litestream transfers uncompressed SQLite data.

  2. Block-level deduplication: ZFS incrementals only transfer changed blocks. Litestream WAL segments contain logical operations that must be replayed.

  3. No replay overhead: ZFS receive writes blocks directly to disk. Litestream must decompress and replay WAL operations against SQLite.

  4. Native encryption: ZFS encrypted blocks transfer as-is. No decrypt/re-encrypt overhead.

Litestream Advantages

  1. Simpler infrastructure: No ZFS pool management, works with any filesystem
  2. Point-in-time recovery: Can restore to any WAL position, not just snapshots
  3. Smaller storage footprint: WAL compression can be more efficient for certain workloads
  4. Cross-platform: Works anywhere SQLite runs

Caveats

  • Different instance size: ZFS tested on c3-standard-22; Litestream best results on c3-standard-88
  • Same disk type: Both tested on pd-balanced (ZFS via NVMe interface)
  • Different test data: ZFS used synthetic JSON; Litestream used real SQLite databases
  • Hyperdisk potential: Litestream saw 35-40% improvement with Hyperdisk; ZFS would likely see similar gains

Recommendation

For ClusterDuck's use case (fast failover, encrypted data at rest):

  • ZFS wins decisively on restore speed due to compressed/encrypted block transfer
  • 10GB space restores in ~9 seconds vs ~100+ seconds with Litestream

Test Environment

  • Instance: c3-standard-22 (22 vCPUs, 88GB RAM, 23 Gbps network)
  • Storage: pd-balanced 500GB (via NVMe interface)
  • Encryption: AES-256-GCM (native ZFS encryption)
  • Compression: zstd-3
  • GCS: Standard storage, same region (us-west1)
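
One way a pool with these properties could have been created; the device path is a placeholder and the key handling (interactive passphrase) is only illustrative:

```bash
# Root-dataset properties set at creation are inherited by every space dataset.
zpool create \
  -O compression=zstd-3 \
  -O encryption=aes-256-gcm -O keyformat=passphrase \
  spaces /dev/nvme1n1

# Each space lives in its own child dataset, inheriting both properties.
zfs create spaces/space1
```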