ZFS GCS Replication Benchmark Results - ClusterDuck

Date: January 6, 2025
Worker: clusterduck-worker-spot-stage-whz7 (c3-standard-22)
Region: us-west1-b
GCS Bucket: clusterduck-zfs-snapshots-stage (same region)
ZFS Pool: spaces (encrypted AES-256-GCM, compression=zstd-3)

Key Findings

1. Raw Send (-w) is Critical

ZFS send has two modes:

  • Standard send: Decrypts and decompresses blocks before sending; data is recompressed (and re-encrypted) on receive
  • Raw send (-w): Sends the compressed, encrypted blocks exactly as stored on disk

Always use the -w flag for encrypted/compressed pools!
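
A quick way to see the difference is to compare the two stream sizes directly; the dataset and snapshot names below are placeholders for a real space:

```bash
SNAP=spaces/space1@base

# Standard send: blocks are decrypted and decompressed on the way out,
# so the stream is roughly the logical size of the data.
zfs send "$SNAP" | wc -c

# Raw send: blocks leave the pool exactly as stored on disk,
# still zstd-compressed and AES-256-GCM encrypted.
zfs send -w "$SNAP" | wc -c
```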

2. zstd-3 vs lz4 Compression

| Logical Size | lz4 Transfer | zstd-3 Transfer | Improvement |
|---|---|---|---|
| 100 MB | 19 MB | 6.6 MB | 2.9x smaller |
| 500 MB | 95 MB | 33 MB | 2.9x smaller |
| 2 GB | 380 MB | 131 MB | 2.9x smaller |
| 10 GB | 1.9 GB | 657 MB | 2.9x smaller |

zstd-3 achieves ~15x compression on JSON data (vs ~5x for lz4).
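
The ratio ZFS itself reports can be checked per dataset (the dataset name is a placeholder):

```bash
# compressratio shows the on-disk savings; logicalused vs used is the
# same information in bytes.
zfs get -o name,property,value compressratio,logicalused,used spaces/space1
```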

Test Scenarios

| Scenario | Base Size | Incrementals | Compression | Actual Transfer |
|---|---|---|---|---|
| small | 100 MB | 5 × 1 MB appends | ~15x | 6.6 MB base, 70 KB/incr |
| medium | 500 MB | 5 × 5 MB appends | ~15x | 33 MB base, 340 KB/incr |
| large | 2 GB | 5 × 10 MB appends | ~15x | 131 MB base, 670 KB/incr |
| xlarge | 10 GB | 5 × 50 MB appends | ~15x | 657 MB base, 3.3 MB/incr |

Results Summary (zstd-3)

Base Snapshot Restore Times

| Logical Size | Compressed Size | Send Time | Recv Time | Send MB/s | Recv MB/s |
|---|---|---|---|---|---|
| 100 MB | 6.6 MB | 269 ms | 203 ms | 39 | 33 |
| 500 MB | 33 MB | 400 ms | 534 ms | 83 | 62 |
| 2 GB | 131 MB | 1.2 sec | 1.8 sec | 110 | 71 |
| 10 GB | 657 MB | 6.0 sec | 8.5 sec | 111 | 77 |
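
A timed round trip through GCS is enough to reproduce numbers like these; the snapshot, object path, and restore dataset below are illustrative:

```bash
BUCKET=gs://clusterduck-zfs-snapshots-stage

# Upload: raw send piped straight into a GCS object.
time zfs send -w spaces/space1@base | gsutil cp - "$BUCKET/space1/base.zfs"

# Restore: stream the object back into a fresh dataset.
time gsutil cp "$BUCKET/space1/base.zfs" - | zfs receive spaces/space1-restore
```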

Incremental Snapshot Performance

| Logical Append | Compressed Size | Send Time | Recv Time |
|---|---|---|---|
| 1 MB | 70 KB | 55 ms | 93 ms |
| 5 MB | 340 KB | 55 ms | 97 ms |
| 10 MB | 670 KB | 60 ms | 97 ms |
| 50 MB | 3.3 MB | 103 ms | 127 ms |

Estimated Full Restore Times

For a complete restore (base + 5 incrementals):

| Space Size (Logical) | Base Recv | 5 Incrementals | Total Restore |
|---|---|---|---|
| 100 MB | 0.2 sec | 0.5 sec | < 1 sec |
| 500 MB | 0.5 sec | 0.5 sec | ~1 sec |
| 2 GB | 1.8 sec | 0.5 sec | ~2.3 sec |
| 10 GB | 8.5 sec | 0.6 sec | ~9 sec |
| 50 GB (estimated) | ~43 sec | ~3 sec | ~46 sec |
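
A full restore of this shape is just the base receive followed by the incrementals in order; the names below are placeholders, and because the streams are raw, the data stays encrypted until the key is loaded at the end:

```bash
BUCKET=gs://clusterduck-zfs-snapshots-stage
TARGET=spaces/space1-restore

# Base snapshot first, then each incremental in sequence.
gsutil cp "$BUCKET/space1/base.zfs" - | zfs receive "$TARGET"
for i in 1 2 3 4 5; do
  gsutil cp "$BUCKET/space1/incr$i.zfs" - | zfs receive "$TARGET"
done

# Raw-received data is still encrypted; load the key, then mount.
zfs load-key "$TARGET"
zfs mount "$TARGET"
```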

Key Observations

1. Compression is Massive

With zstd-3, JSON data compresses ~15x:

  • 10GB logical = 657MB actual transfer
  • This is 2.9x better than lz4

2. Incrementals are Extremely Fast

With append-style writes (like SQLite WAL):

  • 1MB logical append = 70KB transfer = 93ms restore
  • 50MB logical append = 3.3MB transfer = 127ms restore

3. Latency Floor ~50-90ms

Even tiny incrementals take 50-90ms due to:

  • GCS API round-trip (~30-50ms)
  • ZFS snapshot overhead (~10-20ms)
  • Process spawn overhead
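
Each of these fixed costs can be timed separately to see where the floor comes from (the dataset and snapshot names are placeholders):

```bash
# ZFS snapshot overhead.
time zfs snapshot spaces/space1@probe

# Stream setup for a near-empty delta (no upload).
time zfs send -w -i @incr5 spaces/space1@probe > /dev/null

# One GCS API round trip from the worker.
time gsutil ls gs://clusterduck-zfs-snapshots-stage > /dev/null
```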

4. Throughput Characteristics

  • Send: 80-110 MB/s of compressed data
  • Recv: 60-77 MB/s of compressed data (disk write bound)
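
These figures describe the compressed stream; piping a send through pv (assuming pv is installed, and a placeholder snapshot) shows the send-side rate directly:

```bash
# Send-side throughput of the compressed, encrypted stream (no upload).
zfs send -w spaces/space1@base | pv > /dev/null
```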

Methodology

Test Data Generation

The benchmark generates realistic JSON data to simulate SQLite/document workloads:

{"id":1736198400001,"ts":1736198400001,"uid":"u1234","sid":"s56789","act":"click","pg":"/dashboard","dur":1500,"ok":true,"tg":["web","mobile"],"ref":"r123","v":"1.0"}

Each record contains:

  • Timestamps and sequential IDs (high entropy)
  • User/session identifiers (medium entropy, repeated patterns)
  • Action types and page paths (low entropy, from fixed sets)
  • Boolean flags and arrays (mixed)

This produces ~15x compression with zstd-3, which is better than typical production workloads (expect 5-10x for real SQLite data).
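
A generator along these lines (a sketch, not the benchmark's actual code) produces records with the same mix of entropy:

```bash
# Emit n JSON records shaped like the sample above: counters and random IDs
# for high/medium entropy, small fixed sets for actions and pages.
gen_records() {
  local n=$1 i ts
  local acts=(click view edit scroll submit)
  local pages=(/dashboard /settings /inbox /reports)
  for ((i = 1; i <= n; i++)); do
    ts=$((1736198400000 + i))
    printf '{"id":%d,"ts":%d,"uid":"u%d","sid":"s%d","act":"%s","pg":"%s","dur":%d,"ok":true,"tg":["web","mobile"],"ref":"r%d","v":"1.0"}\n' \
      "$ts" "$ts" $((RANDOM % 10000)) $((RANDOM % 100000)) \
      "${acts[RANDOM % ${#acts[@]}]}" "${pages[RANDOM % ${#pages[@]}]}" \
      $((RANDOM % 5000)) $((RANDOM % 1000))
  done
}

# Roughly 16 MB of logical JSON appended to a placeholder dataset path.
gen_records 100000 >> /spaces/space1/data.json
```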

Incremental Simulation

Incrementals append data to the file (like SQLite WAL writes) rather than overwriting. This means ZFS only needs to transfer the new blocks, resulting in small incremental snapshots proportional to the append size.
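
A sketch of one scenario's incremental loop, using the gen_records helper above and placeholder names; each pass appends roughly 1 MB, snapshots, and raw-sends only the delta:

```bash
DS=spaces/space1
BUCKET=gs://clusterduck-zfs-snapshots-stage
PREV=base    # assumes spaces/space1@base was already created and sent

for i in 1 2 3 4 5; do
  # Append new records, as SQLite WAL-style writes would.
  gen_records 6000 >> "/$DS/data.json"
  zfs snapshot "$DS@incr$i"
  # Incremental raw send: only blocks changed since $PREV are transferred.
  zfs send -w -i "@$PREV" "$DS@incr$i" | gsutil cp - "$BUCKET/space1/incr$i.zfs"
  PREV=incr$i
done
```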

Raw Send Mode

All benchmarks use zfs send -w (raw mode) which:

  • Preserves on-disk compression during transfer
  • Preserves encryption (no key needed on receiving side until mount)
  • Transfers actual block data, not logical data
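
The size of a raw stream can be checked without sending anything via a dry run (the snapshot name is a placeholder):

```bash
# -n performs a dry run; -v reports the estimated stream size.
zfs send -w -n -v spaces/space1@base
```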

Production Recommendations

  1. Use zstd-3 compression: ~3x better than lz4 for JSON/SQLite data
  2. Use raw sends (-w): Required for compressed/encrypted pools
  3. Frequent small incrementals: 1-10MB appends restore in <100ms
  4. Pre-warm for large spaces: 10GB+ takes ~10 seconds to restore
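
Recommendation 1 comes down to a single dataset property (the -w flag from recommendation 2 appears in the send examples above); a minimal sketch, assuming the pool layout described here:

```bash
# Compression is inherited by child datasets; only new writes use zstd-3,
# existing blocks keep their original compression until rewritten.
zfs set compression=zstd-3 spaces
zfs get -r -o name,value compression spaces
```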

Comparison: ZFS vs Litestream

We previously benchmarked Litestream GCS recovery on similar hardware. Here's how they compare:

Recovery Time Comparison

| Database Size | Litestream (c3-88, Hyperdisk) | ZFS (c3-22, Local NVMe) | ZFS Advantage |
|---|---|---|---|
| 100 MB | 1.4 sec | 0.2 sec | 7x faster |
| 500 MB | 5.6 sec | 0.5 sec | 11x faster |
| 1 GB | 11.0 sec | ~1.0 sec (est) | 11x faster |
| 2 GB | 21.6 sec | 1.8 sec | 12x faster |
| 5 GB | 51.6 sec | ~4.5 sec (est) | 11x faster |
| 10 GB | ~103 sec (est) | 8.5 sec | 12x faster |

Throughput Comparison

| Metric | Litestream | ZFS (zstd-3) |
|---|---|---|
| Peak throughput | 90-99 MB/s | 77-111 MB/s |
| Effective throughput (logical data) | 90-99 MB/s | ~1,150 MB/s* |

*ZFS transfers compressed data, so 77 MB/s of compressed data = ~1,150 MB/s of logical data at 15x compression.

Why ZFS is Faster

  1. Compression in transit: ZFS raw send (-w) transfers compressed blocks. With zstd-3, 10GB logical = 657MB actual transfer. Litestream transfers uncompressed SQLite data.

  2. Block-level deduplication: ZFS incrementals only transfer changed blocks. Litestream WAL segments contain logical operations that must be replayed.

  3. No replay overhead: ZFS receive writes blocks directly to disk. Litestream must decompress and replay WAL operations against SQLite.

  4. Native encryption: ZFS encrypted blocks transfer as-is. No decrypt/re-encrypt overhead.

Litestream Advantages

  1. Simpler infrastructure: No ZFS pool management, works with any filesystem
  2. Point-in-time recovery: Can restore to any WAL position, not just snapshots
  3. Smaller storage footprint: WAL compression can be more efficient for certain workloads
  4. Cross-platform: Works anywhere SQLite runs

Caveats

  • Different instance size: ZFS tested on c3-standard-22; Litestream best results on c3-standard-88
  • Same disk type: Both tested on pd-balanced (ZFS via NVMe interface)
  • Different test data: ZFS used synthetic JSON; Litestream used real SQLite databases
  • Hyperdisk potential: Litestream saw 35-40% improvement with Hyperdisk; ZFS would likely see similar gains

Recommendation

For ClusterDuck's use case (fast failover, encrypted data at rest):

  • ZFS wins decisively on restore speed due to compressed/encrypted block transfer
  • 10GB space restores in ~9 seconds vs ~100+ seconds with Litestream

Test Environment

  • Instance: c3-standard-22 (22 vCPUs, 88GB RAM, 23 Gbps network)
  • Storage: pd-balanced 500GB (via NVMe interface)
  • Encryption: AES-256-GCM (native ZFS encryption)
  • Compression: zstd-3
  • GCS: Standard storage, same region (us-west1)
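
One way a pool with these properties could have been created; the device path is a placeholder and the key handling (interactive passphrase) is only illustrative:

```bash
# Root-dataset properties set at creation are inherited by every space dataset.
zpool create \
  -O compression=zstd-3 \
  -O encryption=aes-256-gcm -O keyformat=passphrase \
  spaces /dev/nvme1n1

# Each space lives in its own child dataset, inheriting both properties.
zfs create spaces/space1
```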