- Infrastructure: r5d.xlarge (4 vCPU, 32 GB RAM, 150 GB NVMe) in us-west-2
- OpenData: commit `f894c52` on `main`, writing to SlateDB → S3 with an NVMe read cache
- VictoriaMetrics: v1.102.1, writing to local NVMe
- Benchmark tool: vm-prometheus-benchmark (aion-eval branch), scrape interval 10s
- Durability: OpenData writes wait for `Durability::Applied` (in-memory delta); a background flush writes to the S3 WAL every 100 ms
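
To make the durability setting concrete, here is a toy Python model of "acknowledge on apply, flush in the background". Everything in it (`ToyWriter`, `write`, `wal_batches`, and the list standing in for the S3 WAL) is hypothetical and is not OpenData's or SlateDB's API. It only illustrates why a write acknowledged at `Durability::Applied` should not include an S3 round-trip in its latency.

```python
import threading

# Hypothetical toy model, NOT OpenData's implementation: writes are acknowledged
# once applied to an in-memory delta; a background thread drains the delta every
# flush interval into a list that stands in for the S3 WAL.
class ToyWriter:
    def __init__(self, flush_interval_s: float = 0.1) -> None:
        self._lock = threading.Lock()
        self._delta: list = []         # applied but not yet flushed samples
        self.wal_batches: list = []    # stand-in for WAL objects written to S3
        self._interval = flush_interval_s
        self._stop = threading.Event()
        self._flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self._flusher.start()

    def write(self, series: str, value: float) -> None:
        # Returns as soon as the sample is in the in-memory delta; the caller
        # never waits on the flush (analogous to Durability::Applied).
        with self._lock:
            self._delta.append((series, value))

    def _flush_once(self) -> None:
        # Swap the delta out under the lock, then "upload" outside it so
        # writers are not blocked by the (simulated) S3 call.
        with self._lock:
            batch, self._delta = self._delta, []
        if batch:
            self.wal_batches.append(batch)

    def _flush_loop(self) -> None:
        # Event.wait returns False on timeout, True once close() sets the event.
        while not self._stop.wait(self._interval):
            self._flush_once()

    def close(self) -> None:
        self._stop.set()
        self._flusher.join()
        self._flush_once()  # drain anything written since the last flush


w = ToyWriter(flush_interval_s=0.1)
for i in range(1_000):
    w.write("cpu_usage", float(i))
w.close()
print(sum(len(b) for b in w.wal_batches))  # 1000
```

In this model `write` only takes a lock and appends, so its cost is independent of the flush; the trade-off is that a sample can be lost if the process dies within one flush interval (100 ms here) of being acknowledged.
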

Write latency at each load level (OD = OpenData, VM = VictoriaMetrics):

| Targets | Samples/sec | Unique series | OD p50 | OD p99 | VM p50 | VM p99 | Drops |
|---|---|---|---|---|---|---|---|
| 100 | ~7k | ~64k | 13ms | 303ms | 2.9ms | 20ms | None |
| 500 | ~33k | ~322k | 90ms | 2,024ms | 11ms | 28ms | None |
| 1,000 | ~65k | ~643k | 402ms | 3,696ms | 11ms | 36ms | None |
| 2,000 | ~129k | ~1.3M | 909ms | 5,180ms | 10ms | 444ms | None |
| 4,000 | ~257k | ~2.6M | 957ms | 3,017ms | 11ms | 34ms | None |
| 8,000 | ~514k | ~5.1M | 1,309ms | 5,533ms | 15ms | 201ms | None |
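
The offered load in the table follows from the scrape setup: each series is scraped once per 10 s interval, so samples/sec ≈ unique series / 10. A minimal sketch of that arithmetic, assuming ~643 series per target (derived from the 1,000-target row, not a documented benchmark parameter); `expected_load` is a hypothetical helper:

```python
SCRAPE_INTERVAL_S = 10   # from the benchmark configuration
SERIES_PER_TARGET = 643  # assumption: ~643k series / 1,000 targets in the table


def expected_load(targets: int) -> tuple[int, float]:
    """Return (unique series, samples/sec) if every series is scraped once per interval."""
    series = targets * SERIES_PER_TARGET
    return series, series / SCRAPE_INTERVAL_S


for t in (100, 500, 1_000, 2_000, 4_000, 8_000):
    series, rate = expected_load(t)
    print(f"{t:>5} targets: {series:>9,} series, {rate:>9,.0f} samples/sec")
```

From 1,000 targets upward the model matches the table to within about 1% (e.g., 514,400 samples/sec at 8,000 targets vs the ~514k measured in the soak test); the 100- and 500-target rows (~7k, ~33k) sit a few percent to ~9% above it.
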

OpenData 1-hour soak test at ~514k samples/sec:

| Elapsed | Ingestion rate (samples/sec) | p50 | p99 | Dropped | Errors |
|---|---|---|---|---|---|
| 5 min | 514,415 | 986ms | 4,273ms | 0 | 0 |
| 10 min | 514,456 | 992ms | 5,301ms | 0 | 0 |
| 15 min | 514,447 | 981ms | 6,757ms | 0 | 0 |
| 20 min | 514,511 | 992ms | 5,327ms | 0 | 0 |
| 25 min | 514,514 | 949ms | 6,640ms | 0 | 0 |
| 30 min | 514,507 | 1,130ms | 9,864ms | 0 | 0 |
| 35 min | 514,476 | 1,171ms | 6,400ms | 0 | 0 |
| 40 min | 514,471 | 959ms | 3,576ms | 0 | 0 |
| 45 min | 514,496 | 1,015ms | 3,284ms | 0 | 0 |
| 50 min | 514,443 | 1,010ms | 3,924ms | 0 | 0 |
| 55 min | 514,492 | 977ms | 4,741ms | 0 | 0 |
| 60 min | 514,427 | 996ms | 7,916ms | 0 | 0 |
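
The soak observations can be checked directly against the table. This sketch (plain Python; `SOAK`, `slope`, and `summarize` are illustrative names) computes the ingestion-rate spread, the p50/p99 ranges, and a least-squares slope of p50 over elapsed time as a simple latency-creep indicator:

```python
# (elapsed_min, ingestion_rate, p50_ms, p99_ms), copied from the soak table
SOAK = [
    (5, 514_415, 986, 4_273), (10, 514_456, 992, 5_301),
    (15, 514_447, 981, 6_757), (20, 514_511, 992, 5_327),
    (25, 514_514, 949, 6_640), (30, 514_507, 1_130, 9_864),
    (35, 514_476, 1_171, 6_400), (40, 514_471, 959, 3_576),
    (45, 514_496, 1_015, 3_284), (50, 514_443, 1_010, 3_924),
    (55, 514_492, 977, 4_741), (60, 514_427, 996, 7_916),
]


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def summarize(rows):
    minutes = [r[0] for r in rows]
    rates = [r[1] for r in rows]
    p50s = [r[2] for r in rows]
    p99s = [r[3] for r in rows]
    return {
        "rate_spread_pct": 100 * (max(rates) - min(rates)) / min(rates),
        "p50_range_ms": (min(p50s), max(p50s)),
        "p99_range_ms": (min(p99s), max(p99s)),
        # A positive slope would indicate p50 creeping up over the run.
        "p50_slope_ms_per_min": slope(minutes, p50s),
    }


print(summarize(SOAK))
```

On these rows the ingestion rate varies by about 0.02% (99 samples/sec), p50 spans 949-1,171 ms and p99 3,284-9,864 ms, and the p50 slope is about 0.25 ms per minute (roughly 15 ms over the hour), small next to a ~1,000 ms median. Twelve 5-minute points are a thin basis for a trend, so treat the slope as indicative.
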

- Throughput: Both OpenData and VictoriaMetrics sustain up to ~514k samples/sec on a 4-vCPU instance with zero drops at every load level.
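- Latency gap: OpenData p50 write latency is roughly 35-90x higher than VictoriaMetrics from 1,000 targets upward (e.g., 909 ms vs 10 ms at 2,000 targets), even though OpenData writes do not block on S3 durability. Because writes are acknowledged at `Durability::Applied`, the bottleneck appears to be in the write coordinator / in-memory path rather than in S3 round-trips; this still needs investigation.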
- Soak stability: Over 1 hour at ~514k samples/sec, OpenData shows no throughput degradation and no latency creep. p50 stays between ~950 ms and ~1,170 ms; p99 is spiky (3.3-9.9 s), likely correlating with SlateDB compaction cycles.
- CPU profile: The process appears I/O-bound rather than CPU-bound: the flamegraph is dominated by futex waits and tokio runtime idle time.
- Storage: OpenData wrote ~134 GB to S3 across all test runs. VictoriaMetrics wrote to local NVMe only.
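
The size of the latency gap depends on which percentile and load level are compared. A small sketch computing OpenData/VictoriaMetrics latency ratios from the scaling table (`RUNS` and `latency_ratios` are illustrative names):

```python
# (targets, od_p50, od_p99, vm_p50, vm_p99) in ms, copied from the scaling table
RUNS = [
    (100, 13, 303, 2.9, 20),
    (500, 90, 2_024, 11, 28),
    (1_000, 402, 3_696, 11, 36),
    (2_000, 909, 5_180, 10, 444),
    (4_000, 957, 3_017, 11, 34),
    (8_000, 1_309, 5_533, 15, 201),
]


def latency_ratios(runs):
    """Return {targets: (p50 ratio, p99 ratio)} of OpenData over VictoriaMetrics."""
    return {t: (od50 / vm50, od99 / vm99) for t, od50, od99, vm50, vm99 in runs}


for t, (r50, r99) in latency_ratios(RUNS).items():
    print(f"{t:>5} targets: p50 {r50:5.1f}x, p99 {r99:5.1f}x")
```

The p50 ratio grows from about 4.5x at 100 targets to 36.5x at 1,000 and 87-91x from 2,000 targets upward. The p99 ratio ranges from about 12x to 103x, largely because VictoriaMetrics' own p99 jumps to 444 ms and 201 ms at 2,000 and 8,000 targets.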