local development used to be human-paced: one developer, one editor, occasional builds and tests. consumer ssd endurance assumptions were built around that pattern.
agent-driven development changes the load profile. when you run 4-32 local agents in parallel, each doing build, test, validation, and coding loops, write pressure scales horizontally just like cpu and memory demand.
graph LR;
a["agent count"] --> e["daily host writes"];
b["cycles per agent per day"] --> e;
c["logical writes per cycle"] --> e;
d["host overhead factor"] --> e;
e --> f["annual host writes"];
f --> g["TBW budget check"];
g -->|over| h["move hot paths to ramdisk"];
g -->|within| i["safe headroom to scale"];
use a simple model:
daily_host_writes_GB = agents * cycles_per_agent_per_day * logical_writes_per_cycle_GB * host_overhead_factor
optional (for nand-wear reasoning, not TBW comparison):
daily_nand_writes_GB = daily_host_writes_GB * device_wa_factor
for continuous operation, define:
cycles_per_agent_per_day = cycles_per_hour * 24
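the same model as a small python sketch (the function names and the default overhead value are mine, purely illustrative; plug in your own estimates):

```python
def daily_host_writes_gb(agents: int,
                         cycles_per_agent_per_day: float,
                         logical_writes_per_cycle_gb: float,
                         host_overhead_factor: float = 1.3) -> float:
    """estimated host writes per day, in GB (decimal units)."""
    return agents * cycles_per_agent_per_day * logical_writes_per_cycle_gb * host_overhead_factor


def daily_nand_writes_gb(daily_host_gb: float, device_wa_factor: float) -> float:
    """optional: nand-side writes for wear reasoning, not for the TBW comparison."""
    return daily_host_gb * device_wa_factor


def continuous_cycles_per_day(cycles_per_hour: float) -> float:
    """continuous (24/7) operation: cycles per agent per day."""
    return cycles_per_hour * 24
```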
typical write ranges per cycle in real dev loops:
| task | write range per cycle |
|---|---|
| build artifacts and object files | 1-6 GB |
| test temp files and coverage/log output | 0.2-2 GB |
| ml numerical correctness validation (intermediate tensors, traces, eval output) | 1-10 GB |
| coding overhead (indexing, logs, git/object churn) | 0.05-0.3 GB |
moderate orchestrated setup:
agents = 8
cycles_per_agent_per_day = 20
logical_writes_per_cycle_GB = 5
host_overhead_factor = 1.3  (filesystem metadata, journaling, copy-on-write overhead)
result:
daily_host_writes_GB = 8 * 20 * 5 * 1.3 = 1040 GB/day
annual_host_writes_TB ~= 380 TB/year
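plugging these values into the python sketch above reproduces the same numbers:

```python
daily_gb = daily_host_writes_gb(agents=8,
                                cycles_per_agent_per_day=20,
                                logical_writes_per_cycle_gb=5,
                                host_overhead_factor=1.3)
annual_tb = daily_gb * 365 / 1000   # decimal units: 1 TB = 1000 GB
print(f"{daily_gb:.0f} GB/day, {annual_tb:.0f} TB/year")   # 1040 GB/day, 380 TB/year
```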
compare that with endurance. if a 1 TB drive is rated at 600 TBW, a 10-year budget is:
600 TB / 3650 days ~= 0.164 TB/day ~= 164 GB/day
TBW means terabytes written: the total cumulative host writes the drive is rated to absorb over its warranted endurance life.
TBW is a warranty/rating figure, not a hard failure cliff. real endurance varies with workload and operating conditions, but TBW is still a useful planning budget.
consumer warranties are often much shorter than 10 years; the 10-year view is a planning horizon, not a warranty promise.
to make the continuous effect concrete, assume:
- each agent runs `2 cycles/hour` continuously
- each cycle writes `2.5 GB`
- `host_overhead_factor = 1.3`
- drive endurance is `600 TBW` (common consumer class)
then:
cycles_per_agent_per_day = 2 * 24 = 48
daily_host_writes_per_agent_GB = 48 * 2.5 * 1.3 = 156 GB/day
daily_host_writes_GB = agents * daily_host_writes_per_agent_GB
ssd_lifetime_years = (drive_TBW * 1000) / (daily_host_writes_GB * 365)
note: calculations here use decimal storage units (the same convention used by drive vendors): 1 TB = 1000 GB.
when comparing to TBW, use host writes. do not multiply by device_wa_factor for the TBW check.
| agents (24/7) | daily writes | annual writes | lifetime for 600 TBW ssd |
|---|---|---|---|
| 1 | 156 GB/day | 56.9 TB/year | 10.5 years |
| 4 | 624 GB/day | 227.8 TB/year | 2.6 years |
| 8 | 1248 GB/day | 455.5 TB/year | 1.3 years |
| 16 | 2496 GB/day | 911.0 TB/year | 0.66 years (~8 months) |
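the table follows directly from the formulas above; a short python sketch for trying other agent counts under the same assumed workload (2 cycles/hour, 2.5 GB/cycle, overhead 1.3, 600 TBW), as a planning calculator rather than a prediction:

```python
def ssd_lifetime_years(agents: int,
                       cycles_per_hour: float = 2,
                       writes_per_cycle_gb: float = 2.5,
                       host_overhead_factor: float = 1.3,
                       drive_tbw: float = 600) -> float:
    """planning estimate: years until the drive's TBW budget is consumed."""
    per_agent_daily_gb = cycles_per_hour * 24 * writes_per_cycle_gb * host_overhead_factor
    daily_gb = agents * per_agent_daily_gb
    return (drive_tbw * 1000) / (daily_gb * 365)

for n in (1, 4, 8, 16):
    print(f"{n:>2} agents: {ssd_lifetime_years(n):.2f} years")   # 10.54, 2.63, 1.32, 0.66
```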
the key point: under a fixed per-agent workload, lifetime scales inversely with agent count (if agents double, expected lifetime roughly halves). contention, cache behavior, and io throttling can bend real-world results above or below this baseline.
ramdisk (tmpfs) shifts high-churn ephemeral writes into dram:
- near-zero ssd wear for throwaway artifacts
- lower latency for compile/test loops
- less io queue contention under concurrent agents
this is no longer a micro-optimization. it is a durability and throughput control. tmpfs is volatile and memory-backed, so size it conservatively to avoid swap pressure that can reintroduce ssd writes.
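one way to think about sizing (a rough heuristic, assuming you can estimate per-agent memory working sets; the reserve value is illustrative, not a recommendation):

```python
def tmpfs_size_gb(total_ram_gb: float, agents: int,
                  per_agent_working_set_gb: float, os_reserve_gb: float = 8) -> float:
    """leave room for agent processes and an os reserve so a full tmpfs
    does not push the system into swap and reintroduce ssd writes."""
    return max(0.0, total_ram_gb - agents * per_agent_working_set_gb - os_reserve_gb)

print(tmpfs_size_gb(total_ram_gb=64, agents=8, per_agent_working_set_gb=3))   # 32.0
```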
both ubuntu and macos already use an implicit in-memory file cache (page cache / unified buffer cache). this helps performance, but it is not the same as an explicit ramdisk mount.
implicit (os-managed cache):
- reads and writes are often served from ram first
- dirty pages are often flushed later (writeback, journal checkpoints, `fsync`), especially for longer-lived files
- eviction is global and workload-agnostic under memory pressure
- files deleted before writeback may avoid full data flush, but metadata/journal traffic still tends to persist
- sustained high-churn paths can still produce substantial long-term TBW consumption
explicit (tmpfs or mounted ramdisk):
- writes to that mount are memory-backed by design
- per-path control is deterministic (`TMPDIR`, `CARGO_TARGET_DIR`, `SCCACHE_DIR`)
- easy per-agent isolation (`/mnt/ramdisk/agent-<id>`)
- hard quotas via per-agent mounts (`tmpfs -o size=...`) or cgroup/systemd memory limits
- predictable cleanup by unmount/delete
why explicit ramdisk with sccache:
- page cache can mask latency, but it does not guarantee write avoidance on persistent filesystems
- the biggest write sink is often build output materialization (`CARGO_TARGET_DIR`), even on cache hits
- with many continuous agents, local `sccache` metadata/object churn can still create steady writeback pressure
- setting `SCCACHE_DIR` to an explicit ramdisk makes local cache-write avoidance deterministic for hot entries
practical guidance:
- if reboot persistence matters most, keep `SCCACHE_DIR` on ssd (or a remote backend), and put build/test scratch on ramdisk
- if minimizing local ssd wear matters most, place `SCCACHE_DIR` on ramdisk with a bounded size (`SCCACHE_CACHE_SIZE`) and accept cache loss on reboot
- best of both: keep a hot local ramdisk tier and use a remote `sccache` backend for durability/sharing
graph TB;
subgraph p[persistent ssd tier]
s1["source repos"];
s2["dependency cache kept across reboots"];
s3["final artifacts"];
end
subgraph r[ramdisk tmpfs tier]
r1["build scratch"];
r2["test temp and coverage scratch"];
r3["ml intermediates and throwaway checkpoints"];
r4["agent temp and logs"];
end
a1["agent 1"] --> r1;
a2["agent 2"] --> r2;
r1 --> s3;
r2 --> s3;
keep persistent/reproducible state on ssd:
- source repos
- dependency caches you want to keep
- final artifacts
move high-churn ephemeral paths to ramdisk:
- build scratch (`target`, `dist`, temp object dirs)
- test temp + coverage scratch
- ml validation intermediates
- agent-local temp/log workdirs
minimal setup pattern:
AGENT_ID=${AGENT_ID:-0}
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=24G,mode=1777 tmpfs /mnt/ramdisk
mkdir -p /mnt/ramdisk/agent-$AGENT_ID/{tmp,target,sccache}
export TMPDIR=/mnt/ramdisk/agent-$AGENT_ID/tmp
export CARGO_TARGET_DIR=/mnt/ramdisk/agent-$AGENT_ID/target
# optional:
export SCCACHE_DIR=/mnt/ramdisk/agent-$AGENT_ID/sccache
# export SCCACHE_CACHE_SIZE=10G

for multiple agents, set a unique AGENT_ID per worker to prevent cross-agent contention.
note: some systems already mount /tmp as tmpfs. a dedicated mount is still useful for deterministic sizing and per-agent isolation.
validate the model with host-write counters during a normal agent run:
- nvme: compare `data_units_written` before/after (`nvme smart-log`)
- smart: compare host-write attributes before/after (`smartctl -A`)
using counter deltas helps calibrate host_overhead_factor for your actual workload.
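for the nvme counter, a small conversion helper; this assumes the nvme-spec convention that one data unit is 1000 * 512 bytes = 512,000 bytes of host writes, and the readings below are hypothetical:

```python
def host_gb_from_nvme_delta(units_before: int, units_after: int) -> float:
    """convert a data_units_written delta (from nvme smart-log) to host GB.
    per the nvme spec, one data unit = 1000 * 512 bytes = 512,000 bytes."""
    return (units_after - units_before) * 512_000 / 1e9

# hypothetical counter readings taken before and after one representative agent run
gb_written = host_gb_from_nvme_delta(1_000_000, 1_003_000)
print(f"{gb_written:.1f} GB host writes during the run")   # 1.5 GB
```

dividing the measured host GB by the logical writes you expected for that run gives a calibrated host_overhead_factor.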
before increasing local agent count, compute projected writes and compare with a 10-year daily budget:
daily_budget_GB ~= (drive_TBW * 1000) / 3650
if projected writes are already a large fraction of that budget, add ramdisk first and then scale agents.
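as a sketch of that check (the 50% headroom threshold here is an arbitrary comfort margin, not a rule):

```python
def tbw_budget_check(projected_daily_gb: float, drive_tbw: float = 600,
                     horizon_days: int = 3650, headroom: float = 0.5) -> str:
    """compare projected daily host writes against a 10-year TBW budget."""
    daily_budget_gb = (drive_tbw * 1000) / horizon_days
    ratio = projected_daily_gb / daily_budget_gb
    if ratio <= headroom:
        return f"within budget: {ratio:.0%} of {daily_budget_gb:.0f} GB/day"
    return f"over budget: {ratio:.0%} of {daily_budget_gb:.0f} GB/day, move hot paths to ramdisk first"

print(tbw_budget_check(1040))   # over budget: 633% of 164 GB/day, ...
```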
in agent-native development, horizontal scaling without storage tiering is a hidden reliability bug.