This benchmark reproduces the scenario discussed in chat:
- 4 Puma workers
- open-model traffic at 12 req/s (k6 constant-arrival-rate)
- synthetic Rack app with ~250ms service time
- service alternates CPU and IO slices randomly every ~10ms
- compare threads=5 vs threads=25
- during traffic: kill one worker, wait for its restart, let it serve ~5s, then kill a different worker (round-robin)
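The synthetic app described above can be sketched roughly like this: ~250ms of total service time, performed in ~10ms slices that are randomly either CPU-bound (a busy loop that holds the GVL) or IO-bound (`sleep`, which releases it). This is an illustrative sketch, not the exact app from the chat; constant names and the 50/50 slice split are assumptions.

```ruby
SLICE_S = 0.010 # ~10ms per slice (assumed)
TOTAL_S = 0.250 # ~250ms total service time (assumed)

# CPU-bound slice: busy-loop until the deadline, holding the GVL the whole time.
def cpu_slice(duration)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + duration
  x = 0
  x += 1 while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
  x
end

# Rack app: alternate CPU and IO slices randomly until ~250ms have been spent.
APP = lambda do |_env|
  (TOTAL_S / SLICE_S).round.times do
    rand < 0.5 ? cpu_slice(SLICE_S) : sleep(SLICE_S) # sleep releases the GVL
  end
  [200, { "content-type" => "text/plain" }, ["ok\n"]]
end
```

In a `config.ru` this would just be `run APP`, served by Puma in cluster mode with 4 workers.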
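The kill/restart step could be driven by a loop like the following. This is a hypothetical driver, assuming Puma runs in cluster mode so workers are direct children of the master process; discovering worker PIDs via `pgrep -P` is an assumption, not something specified in the chat.

```ruby
# List the direct children of the Puma master (assumed to be the workers).
def worker_pids(master_pid)
  `pgrep -P #{master_pid}`.split.map(&:to_i).sort
end

# Round-robin chaos loop: SIGKILL one worker, wait until the master has
# respawned it, let the new worker serve for ~5s, then move to the next.
def chaos_loop(master_pid, serve_seconds: 5)
  idx = 0
  loop do
    pids = worker_pids(master_pid)
    Process.kill(:KILL, pids[idx % pids.size])            # kill one worker
    sleep 0.5 until worker_pids(master_pid).size >= pids.size # wait for respawn
    sleep serve_seconds                                   # let it serve ~5s
    idx += 1                                              # next worker next round
  end
end
```

Usage would be something like `chaos_loop(puma_master_pid)` started once the k6 arrival-rate load is running.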