We need a way to give each Discord channel its own persistent "computer" running Claude Code CLI, where:
- The computer survives across sessions (disk persists)
- It wakes up fast when a user sends a message (< 10s, ideally < 5s)
- It costs near-zero when idle
- It scales to 50+ independent instances without orchestration overhead
- A developer can manage instances via CLI (create, SSH, resize, destroy)
| Requirement | Detail |
|---|---|
| GA-quality persistent disk | Files must survive across compute restarts. Storage must be Generally Available (not beta/preview) with documented data durability guarantees (no known data loss bugs, snapshot/backup capability). |
| SSH-compatible remote access | Real SSH or equivalent supporting: interactive shell, file transfer (SCP/SFTP), port forwarding, and key-based authentication. Exec-only shells and proprietary terminals without file transfer do not qualify. |
| Cold start < 10s | Resume from idle/hibernation must complete within 10 seconds. |
| Full internet | Instances need unrestricted outbound access (git, npm, pip, APIs). |
| Zero idle compute cost | Compute billing must stop completely when instances are stopped/suspended. Storage-only charges are acceptable. |
| No hard boot timeout < 30s | Platform must not impose a hard boot timeout under 30 seconds that would silently reject slow-starting containers with errors (e.g., 502s). |
- Long-running process support -- Instances must support indefinite processes (e.g., a Discord bot running inside the instance), not just request-response or time-limited sandboxes
- No platform-imposed lifetime limit -- Platform must not impose a maximum sandbox/container lifetime that forces instance recreation (e.g., 24hr sandbox expiry, 30-day max lifetime)
- Cost (Phase 1) -- < $50/month for 5 workers at ~2-4 hrs active/day each (+ bot)
- Cost (Phase 2-3) -- < $100/month for 5 workers at ~8 hrs active/day each (+ bot)
- Per-second billing -- Pay only for active compute time
- Cold start speed -- Faster is better; < 5s is ideal
- Developer experience -- Python/TS SDK, CLI tools, minimal boilerplate
- Scale-to-zero with safety net -- Platform must support zero compute cost when idle via programmatic stop API. Platform-native auto-stop as a failsafe (preventing orphaned running instances if bot orchestration fails) is preferred but not required
- Keep-warm option -- Platform must support keeping instances warm (e.g., min_containers, scaledown_window) for latency-sensitive phases where cold start must be near-zero
- Configurable idle timeout (grace period before hibernate)
- Snapshot/restore of full VM state (memory + disk)
- Central config template with per-instance overrides
- Inter-instance communication (future)
- Growable disk per instance
Based on research of Claude Code's system requirements:
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 2 vCPU | 4 vCPU |
| RAM | 8 GB | 8 GB |
| Disk (base install) | ~120 MB | ~500 MB (with tools) |
| Disk (working directory) | Variable | 5 GB |
| Runtime | None (native binary bundles Bun) | git, bash, curl, jq |
| Network | api.z.ai (z.ai GLM API endpoint) | Full internet |
| Auth | ANTHROPIC_AUTH_TOKEN + ANTHROPIC_BASE_URL env vars (z.ai) | Same |
Why 8 GB RAM: Claude Code's Node.js process idles at 400-700 MB but
grows to 1-4 GB during active prompts. Critically, all subprocess stdout/stderr
is retained permanently in the JavaScript heap (GitHub issue #11155, closed
NOT_PLANNED). Running git, npm install, pytest, or other subprocesses
causes linear memory growth. Single sessions have been observed reaching 3.3 GB.
With subprocesses (Python, TypeScript, Bash) needing additional memory on top,
4 GB causes frequent OOMs under real workloads. 8 GB provides a safety margin.
Recommended env vars for memory management:
# Cap Node.js old-generation heap (Anthropic's own devcontainer uses 4096)
export NODE_OPTIONS="--max-old-space-size=4096"
# Reduce glibc arena count to limit virtual memory fragmentation
export MALLOC_ARENA_MAX=2

Minimal container setup (Dockerfile):

FROM ubuntu:22.04
# System dependencies
RUN apt-get update && apt-get install -y git bash curl jq && \
rm -rf /var/lib/apt/lists/*
# --- Auto-Update Defense-in-Depth ---
#
# DISABLE_AUTOUPDATER has documented reliability issues:
# - Issue #2898: config key "clobbered under some circumstances"
# - Issue #9327: auto-update deleted the binary mid-session
# - Issue #14985 (OPEN, Dec 2025): env var in settings.json not honored
# We use 4 layers of defense:
# Tier 1: Real shell env vars (most reliable -- not settings.json)
ENV DISABLE_AUTOUPDATER=1
ENV CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
# Tier 2: Pin via stable release channel at build time
# Note: `-s` is documented for channel names (e.g., `stable`), not arbitrary
# version strings. The stable channel lags ~1 week behind latest.
RUN curl -fsSL https://claude.ai/install.sh | bash -s stable
# Tier 3: Make binary read-only (update cannot overwrite)
RUN chmod -R 555 ~/.local/share/claude/versions/ && \
chmod 555 ~/.local/bin/claude
# Tier 4: Ephemeral rootfs (natural Docker defense -- changes lost on restart)
# --- Memory Management ---
ENV NODE_OPTIONS="--max-old-space-size=4096"
ENV MALLOC_ARENA_MAX=2
# --- z.ai GLM Model Provider ---
ENV ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
ENV ANTHROPIC_API_KEY=""
ENV ANTHROPIC_MODEL="glm-4.7"
ENV ANTHROPIC_DEFAULT_OPUS_MODEL="glm-4.7"
ENV ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7"
ENV ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air"
ENV API_TIMEOUT_MS="3000000"
# ANTHROPIC_AUTH_TOKEN set via fly secrets (not baked into image)

# Headless mode with streaming
claude -p "your prompt" \
--output-format stream-json \
--allowed-tools "Bash,Read,Edit,Write" \
--dangerously-skip-permissions

Note on CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1: This umbrella
variable also disables telemetry (Statsig), error reporting (Sentry), and
the /bug command. With z.ai as provider, Anthropic telemetry has no value.
This reduces outbound traffic, startup latency, and potential failure modes.
Sprites come pre-configured with Ubuntu (docs say 24.04 LTS; community evidence suggests actual image may be 25.04 — verify during prototyping), Claude Code, Node.js, Python, Go, and common dev tools. No Dockerfile is needed. Environment customization is done via exec commands and persists across sleep/wake cycles.
# One-time setup (run inside sprite via exec or console)
# Configure z.ai GLM models
cat >> ~/.bashrc << 'EOF'
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air"
export API_TIMEOUT_MS="3000000"
export NODE_OPTIONS="--max-old-space-size=4096"
export MALLOC_ARENA_MAX=2
export DISABLE_AUTOUPDATER=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
EOF
# Install additional tools if needed
apt-get update && apt-get install -y jq
# Create checkpoint of configured environment
# (via API: POST /v1/sprites/{name}/checkpoint)

Key differences from Dockerfile approach:
- No 4-layer auto-update defense needed -- Claude Code is pre-installed, and the sprite's persistent filesystem means manual version pinning is simpler (just don't run the updater).
- DISABLE_AUTOUPDATER=1 in .bashrc is the primary defense. The filesystem is persistent (not an ephemeral rootfs), so read-only binary permissions (chmod 555) are still useful as a backup.
- Environment variables must be written to .bashrc or .profile to persist across sleep/wake cycles (env vars set via the exec env param are per-session).
- Checkpoints replace container image versioning -- create a checkpoint after configuring the environment, restore to it if needed (see the sketch below).
- ANTHROPIC_AUTH_TOKEN should be set via the exec env parameter at dispatch time (not baked into .bashrc) for security.
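To make the checkpoint step concrete, here is a minimal sketch of the post-setup API call. The base URL and endpoint path are the ones cited in this document; the SPRITES_API_TOKEN env var, bearer-token auth scheme, and JSON response body are assumptions to verify against the Sprites API docs.

```python
import os
import httpx

SPRITES_API = "https://api.sprites.dev/v1"       # REST base URL cited in the Sprites DX notes
TOKEN = os.environ["SPRITES_API_TOKEN"]          # assumed: bearer-token auth

def checkpoint_sprite(name: str) -> dict:
    """POST /v1/sprites/{name}/checkpoint -- copy-on-write snapshot of the configured sprite."""
    resp = httpx.post(
        f"{SPRITES_API}/sprites/{name}/checkpoint",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30.0,
    )
    resp.raise_for_status()
    return resp.json()  # assumed: API returns checkpoint metadata as JSON

if __name__ == "__main__":
    print(checkpoint_sprite("project-alpha"))
```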
Claude Code supports --output-format stream-json which emits
newline-delimited JSON events. This enables real-time streaming to Discord
by filtering for text_delta events and forwarding them as message edits.
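As an illustration of that filtering step, here is a minimal asyncio sketch that runs the headless command shown earlier and forwards text deltas to a caller-supplied coroutine (e.g., a Discord message-edit function). It invokes the CLI directly for brevity; in the actual architecture the command runs inside the worker over SSH/exec. The exact stream-json event shape is an assumption -- inspect real output and adjust the field access.

```python
import asyncio
import json

async def stream_claude(prompt: str, on_text) -> None:
    """Run `claude -p` headlessly and forward text_delta chunks to on_text()."""
    proc = await asyncio.create_subprocess_exec(
        "claude", "-p", prompt,
        "--output-format", "stream-json",
        "--allowed-tools", "Bash,Read,Edit,Write",
        "--dangerously-skip-permissions",
        stdout=asyncio.subprocess.PIPE,
    )
    async for line in proc.stdout:        # newline-delimited JSON events
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue                       # ignore partial or non-JSON lines
        # Assumed event shape: a delta object carrying {"type": "text_delta", "text": ...}
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            await on_text(delta.get("text", ""))
    await proc.wait()
```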
Instances use z.ai GLM models via the Anthropic-compatible API endpoint
(api.z.ai/api/anthropic). Claude Code's interface displays Claude model
names, but GLM models execute behind the scenes. This is transparent to
end users -- Discord responses show no model branding.
| Claude Code Slot | GLM Model | Role |
|---|---|---|
| Opus | glm-4.7 | Complex reasoning, planning |
| Sonnet | glm-4.7 | Main model (balanced) |
| Haiku | glm-4.5-air | Background tasks (summaries, quick searches) |
z.ai dependency: z.ai is the sole model provider. If the z.ai API is
unavailable, instances cannot process prompts. This is an accepted risk --
no fallback provider is configured. The bot-to-instance interface (SSH + CLI)
remains platform-agnostic, so switching providers requires only changing 4
env vars (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_MODEL,
ANTHROPIC_DEFAULT_*_MODEL).
GLM-4.7-Flash evaluation (glm-4.7-flash): Evaluated and rejected for
the Haiku slot. Despite being free with 200K context (30B-A3B MoE), research
found three categories of unresolved tool calling bugs: (1) XML tag corruption
with duplicated/garbled <tool_call> tags (SGLang #15721), (2) XML syntax
mismatches where closing tags don't match opening tags causing context loss
(Ollama #13820), (3) object parameter serialization failures where the model
outputs JSON strings instead of objects (HuggingFace #18). No Z.AI-side fixes
published as of February 2026. Z.AI's own Claude Code model mapping still
recommends GLM-4.5-Air for haiku. GLM-4.5-Air has 12B active parameters (4x
Flash's 3B), a 0.940 Tool Selection Quality score on Galileo's Agent
Leaderboard, and months of proven Claude Code stability. Re-evaluate if Z.AI
publishes tool calling fixes or updates the model mapping.
Claude Code binary versions are pinned in the Dockerfile using the
CLAUDE_CODE_VERSION build arg (see Dockerfile above). Auto-updates are
disabled via a 4-layer defense-in-depth strategy (see Minimal Container Setup).
Update strategy (rolling):
- Build new container image with updated CLAUDE_CODE_VERSION arg
- Deploy to one instance first, monitor for 24 hours
- If stable, deploy to remaining instances via fly deploy
- At 20+ instances, formalize into canary deployment with automated rollback
The stable release channel (bash -s stable) lags ~1 week behind latest
and is a safer choice for production. Check current version:
claude -v.
Auto-updater reliability note: The DISABLE_AUTOUPDATER=1 env var has a
documented history of being bypassed (see Dockerfile comments for issue
references). The 4-layer defense ensures that even if the env var is ignored,
the read-only binary + ephemeral rootfs prevent any update from persisting.
The version check HTTP request may still fire on startup (adding ~1s latency),
but the actual binary replacement is blocked by filesystem permissions.
Note: This section describes the Fly.io Machines architecture. The Sprites alternative simplifies significantly -- see the "Sprites vs Fly.io Machines: Architecture Simplification" comparison table in Section 5 and the Sprites prototype track in Section 9.
The bot and workers run as separate Fly apps because they have fundamentally different lifecycle needs:
- Bot: Must be always-on (listening to Discord 24/7). Uses 256 MB RAM. Never goes idle. Cost: ~$2/month fixed.
- Workers: Should only run when a user requests Claude Code. Uses 8 GB RAM. Should stop when idle. Cost: $0/month when stopped.
If both ran on the same Machine, the Machine could never stop (bot is always active), paying for 8 GB RAM 24/7 regardless of Claude Code usage.
┌─────────────────────────────┐ ┌──────────────────────────────────┐
│ discord-bot-app │ │ worker-app │
│ (Always-on) │ │ (Scale-to-zero) │
│ │ │ │
│ shared-cpu-1x / 256 MB │────>│ performance-1x / 8 GB │
│ ~$2/month fixed │ │ $0/month when stopped │
│ │ │ ~$5.85/hr when running │
│ Receives Discord messages │ │ Runs Claude Code sessions │
│ Routes to workers │ │ Has persistent volume (5 GB) │
│ No volume needed │ │ Has HTTP health endpoint │
│ No [[services]] section │ │ fly.toml [[services]] for proxy │
└─────────────────────────────┘ └──────────────────────────────────┘
│ ▲
│ Fly internal DNS │
└────> worker-app.internal:8080 ───────────┘
Communication: Fly provides .internal DNS between apps in the same
organization. The bot reaches workers at worker-app.internal:8080 without
exposing anything to the public internet.
Discord message arrives
-> Bot looks up Machine ID for channel
-> If Machine is stopped:
-> Wake via Fly Machines API (POST /start) -- ~2-6s
-> Or: HTTP request to worker-app.internal triggers auto-start
-> Pipe message to `claude -p` with stream-json
-> Stream response back to Discord
-> Bot resets idle timer for this Machine
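A sketch of this flow on the bot side, under stated assumptions: the Machines API path follows the public api.machines.dev scheme, the prompt is run via fly ssh console -C (flag spelling to be verified against current flyctl), and the channel-to-machine lookup plus response streaming are elided.

```python
import asyncio
import os
import shlex
import httpx

MACHINES_API = "https://api.machines.dev/v1"   # assumed public Machines API base URL

async def wake_machine(app: str, machine_id: str) -> None:
    """POST /start on the target Machine (assumed safe to call if it is already running)."""
    headers = {"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"}
    async with httpx.AsyncClient(headers=headers) as client:
        resp = await client.post(f"{MACHINES_API}/apps/{app}/machines/{machine_id}/start")
        resp.raise_for_status()

async def dispatch(app: str, machine_id: str, prompt: str) -> bytes:
    """Wake the channel's machine, then run the headless Claude Code command over fly ssh."""
    await wake_machine(app, machine_id)
    cmd = f"claude -p {shlex.quote(prompt)} --output-format stream-json"
    proc = await asyncio.create_subprocess_exec(
        "fly", "ssh", "console", "-a", app, "-C", cmd,
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout
```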
kill_signal = "SIGTERM"
kill_timeout = 30

Rationale: Fly.io sends SIGINT by default; the grace period is configurable via kill_timeout (max 300s). Claude Code has no robust graceful shutdown handler -- it simply dies on signal. The 30-second kill_timeout provides margin for ext4 journal flush (~1-5s if the shutdown was unclean), child process cleanup (Python/Node/git ~1-2s), and filesystem sync, without waiting out the full 300s maximum. The primary
defense against data loss is the idle detection process check -- if Claude
Code isn't actively running, there's nothing to corrupt. ext4 ordered
journaling guarantees either old content or new content, never a mix.
User notification: When a session is stopped (hard cap, idle timeout,
or deploy), the bot sends a Discord message: "Your Claude Code session
was stopped [reason]. Your work is saved on disk. Use /start to resume."
Idle detection uses a three-layer approach to handle long-running prompts (z.ai GLM API timeout is 50 minutes) without prematurely stopping instances or leaving orphaned instances running indefinitely.
Layer 1: Process-based (primary)
Bot tracks "last prompt dispatched" per Machine.
When no active prompt for 5 min (configurable):
-> HTTP GET worker-app.internal:8080/health
-> If 200 (active): extend timer 5 min, retry
-> If 503 (idle): fly machine stop
-> If unreachable: assume stuck, fly machine stop
Layer 2: Hard cap (safety)
4 hours from session start -> force stop regardless (configurable
per-instance, default 4 hr). Comparable to GitHub Codespaces max (4 hr).
Higher than 50-min API timeout to avoid killing legitimate work.
Layer 3: Fly.io auto-stop (failsafe)
worker-app: auto_stop_machines = "stop" (proxy sees health endpoint)
discord-bot-app: auto_stop_machines = "off" (no proxy-routed services)
Catches cases where the Discord bot itself crashes or loses
track of a Machine. The proxy checks every ~5 min for zero connections.
Health check endpoint (10-line HTTP server inside worker container):
Returns 200 OK when Claude Code process is active, 503 Service Unavailable when idle. This serves double duty: (1) Layer 1 uses it for
process detection instead of SSH/pgrep, (2) Layer 3 uses it for Fly.io
proxy-based auto-stop detection. No additional API tokens needed inside
the worker.
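A sketch of that worker-side server, assuming the Claude Code process is detectable with pgrep -f claude (adjust the pattern to however the worker actually launches the CLI):

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

def claude_active() -> bool:
    # pgrep exits 0 when at least one matching process exists
    return subprocess.run(["pgrep", "-f", "claude"], capture_output=True).returncode == 0

class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        # 200 = active prompt in flight, 503 = idle (proxy may stop the machine)
        self.send_response(200 if claude_active() else 503)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep proxy health polling out of the logs

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Health).serve_forever()
```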
Why the bot manages idle detection (not a container sidecar):
- The bot already knows when prompts are dispatched and completed
- Centralized timers avoid managing 5-50 sidecar processes
- The bot controls fly machine start/stop -- co-locating the timer eliminates race conditions between detection and lifecycle actions
- Fly.io auto-stop provides defense-in-depth without extra code
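A sketch of the bot-side Layer 1 check, assuming the bot records a last-dispatch timestamp per machine and shells out to flyctl for the stop (calling the Machines API directly is the alternative):

```python
import asyncio
import time
import httpx

IDLE_AFTER_S = 5 * 60   # 5 min with no active prompt (configurable)

async def idle_watch(app: str, machine_id: str, health_url: str, last_dispatch: dict) -> None:
    """Stop the machine once no prompt has run for IDLE_AFTER_S and the health check reports idle."""
    while True:
        await asyncio.sleep(60)
        if time.time() - last_dispatch.get(machine_id, 0) < IDLE_AFTER_S:
            continue                              # a prompt ran recently; keep waiting
        try:
            async with httpx.AsyncClient() as client:
                resp = await client.get(health_url, timeout=5.0)
            if resp.status_code == 200:
                continue                          # Claude Code still active; extend the timer
        except httpx.HTTPError:
            pass                                  # unreachable: assume stuck and stop it
        proc = await asyncio.create_subprocess_exec(
            "fly", "machine", "stop", machine_id, "-a", app)
        await proc.wait()
        return                                    # machine stopped; watcher restarts on next wake
```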
Key architectural insight: For the Discord bot maintaining a persistent
WebSocket (not routed through Fly Proxy), Fly.io's auto-stop is effectively
blind to the bot's activity state. The bot app MUST use
auto_stop_machines = "off". For workers, the HTTP health endpoint makes
activity visible to the proxy.
- Base template: Shared across all instances (CLAUDE.md, z.ai model config env vars, base tools, ANTHROPIC_AUTH_TOKEN)
- Per-instance overrides: Channel-specific config (git repos, additional tools, custom CLAUDE.md sections, hard session cap override) -- see the merge sketch below
- Config stored in: Git repo or config file, applied at instance creation/boot
- Secrets: ANTHROPIC_AUTH_TOKEN (z.ai API key) injected via Fly.io secrets (fly secrets set), not baked into the container image
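To make the template/override split concrete, a minimal sketch with hypothetical field names (the real schema is decided at implementation time):

```python
# Base template shared by every instance; per-channel overrides are merged on top.
BASE_TEMPLATE = {
    "env": {
        "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
        "ANTHROPIC_MODEL": "glm-4.7",
        "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
    },
    "tools": ["git", "jq"],
    "hard_session_cap_hours": 4,
}

def instance_config(overrides: dict) -> dict:
    """Shallow-merge a per-channel override onto the base template (env merged separately)."""
    cfg = {**BASE_TEMPLATE, **overrides}
    cfg["env"] = {**BASE_TEMPLATE["env"], **overrides.get("env", {})}
    return cfg

# Example per-channel override: longer hard cap, extra repo tooling
print(instance_config({"hard_session_cap_hours": 8, "tools": ["git", "jq", "ripgrep"]}))
```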
Claude Code stores session history as JSONL files under
~/.claude/projects/{project-id}/{session-id}.jsonl. Key behaviors:
- Session ID mapping: Use --resume "$session_id" with a deterministic UUID5 per Discord channel. This maps each channel to a consistent conversation history. --continue resumes the last session; --resume picks a specific one. Note: there is no --session-id flag -- the session ID is obtained from JSON output (the session_id field) and passed to --resume (see the sketch after this list).
- Disk growth: Sessions grow unbounded -- single sessions can reach 3+ GB. Auto-compaction manages the context window only, NOT disk files. Largest offenders: agent progress messages (83-171 MB per line), tool outputs, thinking blocks.
- Persistence: Session files live on the Fly volume and survive stop/start cycles. The volume is the source of truth for conversation history.
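A sketch of the channel-to-session mapping implied above: the bot stores the session_id reported in Claude Code's JSON output per Discord channel and passes it back via --resume on the next prompt. A flat JSON file stands in for whatever store the bot actually uses.

```python
import json
import pathlib
import shlex

STORE = pathlib.Path("sessions.json")   # illustration only; any persistent store works

def load() -> dict:
    return json.loads(STORE.read_text()) if STORE.exists() else {}

def remember(channel_id: str, session_id: str) -> None:
    """Record the session_id reported by Claude Code's JSON output for this channel."""
    data = load()
    data[channel_id] = session_id
    STORE.write_text(json.dumps(data))

def claude_command(channel_id: str, prompt: str) -> str:
    """Build the headless command, resuming the channel's prior session if one is known."""
    session_id = load().get(channel_id)
    resume = f"--resume {shlex.quote(session_id)} " if session_id else ""
    return f"claude -p {shlex.quote(prompt)} {resume}--output-format stream-json"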
Two categories of files on the volume:
- Session files (~/.claude/projects/*/*.jsonl): Conversation history. Can grow to 3+ GB each. Prunable with backup.
- Project files (/workspace/ or similar): Code files Claude creates/edits. Typically small (<100 MB total). Never auto-pruned.
Pruning strategy (cron, every 15 minutes):
| Volume Usage | Action |
|---|---|
| < 50% | No action |
| >= 50% | Compress + upload old sessions (> 3 days) to Tigris, then delete |
| >= 70% | Discord warning: "Workspace using 70% storage. Consider cleanup." |
| >= 85% | Aggressive pruning: backup + delete all sessions except latest |
| >= 95% | Critical Discord alert: "Storage nearly full. Claude may fail." |
Session backup: Old sessions are compressed (gzip) and uploaded to Tigris (Fly.io's S3-compatible object storage, $0.02/GB/month) before deletion. Sessions are recoverable from Tigris if needed.
Active session protection: The pruning cron never deletes the most
recent session per instance (by mtime). It also checks for open file
handles (lsof) before pruning any session -- if Claude Code has the
JSONL file open, that session is skipped entirely.
Project files: Never auto-pruned. If the volume hits 95% even after session pruning, the bot alerts the user via Discord rather than deleting project files. Users manage project file cleanup via Discord commands.
Volume sizing: 5 GB per worker. A single active session rarely exceeds 3.3 GB, and with pruning keeping only the latest session + files under 3 days old, this provides adequate headroom.
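A sketch of one pruning pass from the cron described above, assuming boto3 against Tigris's S3-compatible endpoint and a hypothetical claude-session-backups bucket; the lsof open-handle check and the 70/85/95% Discord alerts are omitted for brevity.

```python
import gzip
import os
import shutil
import time

import boto3

SESSIONS = os.path.expanduser("~/.claude/projects")
PRUNE_THRESHOLD = 0.50      # start pruning at 50% volume usage
CUTOFF_AGE_S = 3 * 86400    # back up and delete sessions older than 3 days

s3 = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")  # assumed Tigris endpoint

def volume_usage(path: str = "/") -> float:
    total, used, _free = shutil.disk_usage(path)
    return used / total

def prune_once(bucket: str = "claude-session-backups") -> None:
    """One pass of the 15-minute cron: compress, upload, then delete old session JSONL files."""
    if volume_usage() < PRUNE_THRESHOLD:
        return
    cutoff = time.time() - CUTOFF_AGE_S
    for project in os.scandir(SESSIONS):
        if not project.is_dir():
            continue
        sessions = sorted(
            (f for f in os.scandir(project.path) if f.name.endswith(".jsonl")),
            key=lambda f: f.stat().st_mtime,
        )
        for f in sessions[:-1]:              # never touch the most recent session per project
            if f.stat().st_mtime > cutoff:
                continue                     # younger than 3 days: keep
            with open(f.path, "rb") as src, gzip.open(f.path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)  # compress before upload
            s3.upload_file(f.path + ".gz", bucket, f"{project.name}/{f.name}.gz")
            os.remove(f.path)
            os.remove(f.path + ".gz")
```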
| Platform | DX | Cold Start | Cost (5 inst, 2hr/day) | Persistence | SSH | Verdict |
|---|---|---|---|---|---|---|
| Fly.io Machines | Good | ~2-6s stop/start | ~$33/mo (workers + bot) | Volumes ($0.15/GB/mo) | Built-in | CO-RECOMMENDED |
| Fly.io Sprites | Excellent | ~100-500ms warm / 1-2s cold | ~$20/mo (tier) + usage | 100 GB persistent (Tigris-backed, $0.02/GB/mo) | WebSocket exec + FS API | CO-RECOMMENDED |
| E2B | Excellent | ~1s resume | ~$10-15/mo + $150 Pro | Pause/resume (beta) | e2b sandbox connect | Eliminated (beta storage, no real SSH) |
| Modal | Excellent | ~1-3s | ~$0-23/mo (w/ $30 credits) | Volumes (included) | modal shell | Eliminated (no real SSH) |
| Beam Cloud | Good | ~1-3s (est.) | ~$138/mo | Volumes (included), snapshots | beam shell | Eliminated (no real SSH) |
| Hetzner | Poor (DIY) | 5-15+ min (snapshot restore) | ~$22/mo (always-on) | Native disk | SSH native | Eliminated (cold start + idle billing) |
| GCP Cloud Run | Moderate | ~1-5s | ~$15-25/mo | GCS/NFS mounts (newer), no SSH | No | Eliminated (no remote shell) |
| AWS Fargate | Poor | ~15-45s (~50% less w/SOCI) | ~$25-40/mo | EFS ($0.30/GB/mo) | ECS Exec | Eliminated (cold start > 10s) |
| RunPod | Moderate | ~3-5s (FlashBoot) | ~$86/mo (CPU, unverified) | Volumes ($0.07-0.20/GB/mo) | Yes | Overpriced for CPU |
| Fal.ai | Good (for AI) | N/A | Unknown (CPU) | /data vol + KV Store | No SSH | Eliminated (no remote shell) |
| Koyeb | Good | ~200ms Light Sleep | ~$46/mo (Pro req.) | Volumes (preview, standard instances only) | koyeb instances exec | Eliminated (preview storage, no real SSH) |
| Northflank | Good | ~5-15s pause/resume (est.) | ~$24/mo | Disks ($0.15/GB/mo) | northflank exec | Borderline (cold start ~5-15s est.) |
| Railway | Excellent | 2-10s App Sleep | ~$37/mo (Pro req.) | Volumes ($0.15/GB/mo) | railway ssh | Eliminated (10s boot timeout) |
| Render | Excellent | ~15-60s suspend/resume (est.) | ~$42/mo | Disks ($0.25/GB/mo) | SSH native | Eliminated (cold start > 10s) |
| Azure Container Apps | Moderate | 15-37s+ scale-to-zero | ~$53/mo | Azure Files (~$0.03/GiB/mo Hot) | az containerapp exec | Eliminated (cold start + idle cost) |
Evaluates each platform against the actual CLI requirements from Section 3 and tightened P0/P1 requirements from Section 2 (8 GB RAM minimum, ~120 MB binary, GA-quality persistent disk, SSH-compatible remote access with SCP/port forwarding, full internet, <10s cold start, zero idle compute cost, no hard boot timeout < 30s, no platform-imposed lifetime limit, long-running process support). Many platforms advertise fast cold starts or low prices that do not hold under Claude Code's constraints.
| Platform | 8 GB RAM | Persistent Disk | SSH | Cold Start (w/ 8 GB) | Boot Timeout Risk | Compatibility |
|---|---|---|---|---|---|---|
| Fly.io Machines | Yes (performance-1x, --vm-memory 8192) | Volumes | Built-in | ~2-6s (stop/start only; suspend limited to ≤2 GB RAM) | None | Compatible |
| Fly.io Sprites | Yes (elastic max 8 vCPU / 8-16 GB RAM per sprite) | 100 GB persistent (Tigris-backed) | WebSocket exec + FS REST API (no native SSH) | ~100-500ms warm / 1-2s cold | None | Compatible (SSH P0 satisfied by FS API intent) |
| E2B | Pro only ($150/mo base) | Pause/resume (beta, bug #884) | e2b sandbox connect (no SCP) | ~1s resume | None | Incompatible -- beta storage (GA P0), no real SSH (SSH P0), 30-day lifetime (P1), 8 GB requires Pro |
| Modal | Yes (configurable) | Volumes (included) | modal shell (no SCP) | ~1-3s | 24hr sandbox lifetime | Incompatible -- no real SSH (SSH P0), 24hr lifetime (P1) |
| Beam Cloud | Yes (configurable) | Volumes (included) | beam shell (no SCP) | ~1-3s (est.) | None | Incompatible -- no real SSH (SSH P0) |
| Hetzner | Yes (CX23 native) | Native disk | SSH native | 5-15+ min (snapshot restore) | N/A | Incompatible -- cold start > 10s, idle billing never stops |
| GCP Cloud Run | Yes | GCS/NFS volume mounts (newer feature) | No SSH | ~1-5s | N/A | Incompatible -- no remote shell (P0 requirement) |
| AWS Fargate | Yes | EFS ($0.30/GB/mo) | ECS Exec | ~15-45s (~8-23s w/SOCI) | None | Incompatible -- cold start > 10s even with SOCI |
| RunPod | Yes | Volumes | Yes | ~3-5s (FlashBoot) | None | Compatible -- but ~$86/mo (unverified), GPU-focused pricing |
| Fal.ai | Unknown | /data distributed vol + KV Store | No SSH | N/A | N/A | Incompatible -- no remote shell, AI-inference focused |
| Koyeb | Eco (2vCPU/4GB) cannot attach volumes; Standard instances required | Volumes (preview, standard instances only) | koyeb instances exec (no SCP) | ~200ms Light Sleep | Eco+volume incompatible | Incompatible -- preview storage (GA P0), no real SSH (SSH P0) |
| Northflank | Yes (configurable) | Disks ($0.15/GB/mo) | northflank exec | ~5-15s estimated (pause/resume) | None | Borderline -- upper estimate exceeds 10s P0, no auto-sleep (P1) |
| Railway | Yes (configurable) | Volumes ($0.15/GB/mo) | railway ssh | 2-10s | Fixed 10s boot timeout -- violates no-boot-timeout-<30s P0 | Incompatible -- boot timeout kills 8 GB containers |
| Render | Yes (Pro Plus: 4 vCPU / 8 GB) | Disks ($0.25/GB/mo) | SSH native | ~15-60s estimated (API suspend/resume) | None | Incompatible -- cold start > 10s P0 |
| Azure Container Apps | Yes (configurable) | Azure Files (~$0.03/GiB/mo Hot) | az containerapp exec | 15-37s+ (105MB image); likely 30-60s+ for Claude Code (~500MB+) | No hard timeout, but exceeds 10s P0 | Incompatible -- cold start > 10s + idle compute charges |
Key insight: Fly.io Machines' ~300ms suspend headline does not apply to Claude Code. Suspend is limited to machines with ≤2 GB RAM. With the required 8 GB, you must use stop/start (~2-6s cold start). Fly.io Sprites, however, achieve ~100-500ms warm resume at 8 GB because they use a fundamentally different architecture (Firecracker microVM checkpointing, not Docker suspend).
Fully compatible (no caveats): Fly.io Machines (stop/start), Fly.io Sprites
Compatible with trade-offs: RunPod (real SSH, GA volumes, but ~$86/mo unverified CPU pricing, GPU-focused)
Borderline: Northflank (cold start ~5-15s est., upper end exceeds 10s P0;
northflank exec + northflank forward provide shell and port forwarding but
no SCP/SFTP -- borderline SSH P0 compliance)
Incompatible (eliminated by tightened P0s): E2B (beta storage + no real SSH), Modal (no real SSH + 24hr lifetime), Beam Cloud (no real SSH), Koyeb (preview storage + no real SSH), Railway (10s boot timeout), Render (cold start ~15-60s > 10s), Azure Container Apps (cold start 15-37s+ > 10s + idle compute charges), AWS Fargate (cold start ~15-45s > 10s), Hetzner (snapshot restore 5-15+ min + idle billing never stops), GCP Cloud Run (no remote shell), Fal.ai (no remote shell)
The intended usage model evolves over time:
- Phase 1 (CLI Dispatch): Discord bot sends prompts to Claude Code CLI via claude -p over remote shell. Instance is a passive compute environment.
- Phase 2 (Agent SDK): Codebase migrates to Claude Agent SDK or equivalent. Instance runs a long-lived process (e.g., webhook server or direct Discord bot). Requires the platform to expose instance ports externally.
- Phase 3 (Standalone Framework): Instance runs a mature, self-contained application (web service, Discord bot, or API server) with minimal reliance on the outer Discord bot dispatcher.
This evolution requires: (a) long-running process support (no hard sandbox lifetime limits), (b) ability to expose instance ports to the internet, and (c) in-place code evolution via remote shell without redeployment.
| Platform | Long-Running Processes | External Port Exposure | In-Place Code Evolution | Phase Support | Status |
|---|---|---|---|---|---|
| Fly.io Machines | Indefinite (no lifetime limit) | Automatic Anycast (<app>.fly.dev HTTPS) + fly proxy for dev | Full SSH + SCP + port forwarding | Phase 1-3 | CO-RECOMMENDED |
| Fly.io Sprites | Indefinite (no lifetime limit) | Public URL (<name>.sprites.app HTTPS) + sprite proxy for dev | WebSocket exec + FS REST API (no SCP) | Phase 1-3 | CO-RECOMMENDED |
| RunPod | Indefinite (pods persist) | HTTP proxy via RunPod URL | SSH native + SCP | Phase 1-3 | Viable (overpriced) |
| Northflank | Indefinite (services persist) | Automatic public URLs + custom domains | northflank exec + port forwarding (no SCP) | Phase 1-3 | Borderline (SSH P0) |
| ~~E2B~~ | 30-day max sandbox lifetime | No native port exposure | e2b sandbox connect (no SCP) | Phase 1 only | Eliminated (GA P0 + SSH P0) |
| ~~Modal~~ | 24hr sandbox / Functions lose state | modal serve (dev tunnels) | modal shell (no SCP) | Phase 1 only | Eliminated (SSH P0) |
| ~~Beam Cloud~~ | Indefinite (persistent containers) | Endpoint URLs for deployed functions | beam shell (no SCP) | Phase 1-2 | Eliminated (SSH P0) |
| ~~Koyeb~~ | Indefinite (services persist) | Automatic <app>.koyeb.app HTTPS | koyeb instances exec (no SCP) | Phase 1-3 | Eliminated (GA P0 + SSH P0) |
Key insight: After tightening P0s (GA storage + real SSH), Fly.io Machines, Fly.io Sprites, and RunPod support all three evolutionary phases. Fly.io Machines excels with automatic Anycast public HTTPS endpoints, full SSH, and cheapest pricing. Sprites excels with fastest cold start (~100-500ms warm), zero-config idle management, and 100 GB persistent storage. Both Fly.io options have indefinite process lifetime. Northflank is borderline -- it has port forwarding but lacks SCP/SFTP for file transfer.
Migration path options (to be decided during prototyping):
- Webhook dispatch: Discord bot forwards messages via HTTP to a webhook server running inside the instance. Requires the instance to expose an HTTP endpoint (Fly.io: automatic via Anycast; others: platform-specific URL).
- Direct bot: Instance runs its own Discord gateway connection with a dedicated bot token. No HTTP endpoint needed, but requires managing Discord reconnection and token lifecycle per instance.
Both approaches are viable on Fly.io. The choice depends on operational preferences discovered during prototyping.
Why: Best balance of DX, cold start, cost, and persistence for this use case.
Persistent Storage:
- Fly Volumes: local NVMe attached to Machines, $0.15/GB/month
- Persist across stop/start cycles
- Volume snapshots: $0.08/GB/month (first 10 GB free, auto-daily)
- NVMe IOPS: shared-cpu-4x gets 8,000 IOPS / 32 MiB/s
- Volume size does NOT affect start time (NVMe mount is constant-time)
Cold Start:
- Stop/start: ~2-6s (infrastructure 1-3s + application startup 1-3s)
- Unclean shutdown may add 1-5s for ext4 journal replay
- Suspend limitation: machines must have ≤ 2 GB RAM (not applicable for 8 GB)
- Image size: ~500 MB moderate, containerd snapshot 100-200ms
SSH: Built-in via fly ssh console (WireGuard tunnel, no public IP needed)
Pricing (per-second, no minimum):
| Config | Hourly | Monthly (24/7) | Monthly (2hr/day) |
|---|---|---|---|
| shared-cpu-1x / 256 MB (bot) | $0.0028 | $2.04 | $0.17 |
| shared-cpu-4x / 4 GB | $0.032 | $23.36 | $1.92 |
| performance-1x / 8 GB (workers) | $0.0880 | $64.24 | $5.28 |
Idle cost: $0.15/GB/month for rootfs + volumes only. Stopped machines: $0 compute.
5 workers, 2hr/day, performance-1x/8 GB + 5 GB volume each + bot: Compute: 5 x $5.28 = $26.40/mo. Storage: 5 x 5 GB x $0.15 = $3.75/mo. Bot: $2.04/mo. Tigris backup: ~$0.50/mo. Total: ~$33/month
DX:
- CLI: fly machine create, start, stop, suspend, destroy
- REST API: full Machines API with OpenAPI spec
- Config: fly.toml + API for programmatic control
- Auto-stop/auto-start: proxy-based wake on incoming traffic
Risks:
- Suspend not applicable (8 GB exceeds 2 GB limit; must use stop/start)
- Community Python SDK abandoned (4 stars, last commit July 2023). Official fly-openapi-sdk also abandoned (10 commits, Sept 2023). Use httpx async client for REST API + flyctl subprocess for SSH/secrets.
- REST API exec endpoint has 60s hard timeout
- REST API rate limits: 1 req/s sustained, burst to 3 req/s per action
- Secrets API unreliable for app secrets -- use the fly secrets set CLI instead
Why: Native auto-sleep/wake, 100 GB persistent storage, sub-second warm resume, zero-config idle management. Purpose-built for AI agent sandboxes.
What Sprites Are: Firecracker microVMs (not Docker containers) with persistent filesystems that hibernate when idle and wake automatically on demand. Launched January 2026 by Fly.io. Each sprite gets root access to Ubuntu with common tools pre-installed (Node.js, Python, Go, Ruby, Rust, Elixir, Java, Bun, Deno). Note: docs say Ubuntu 24.04 LTS but community evidence suggests actual image is 25.04 (verify during prototyping). Resource maximums per sprite: 8 vCPU, 8-16 GB RAM (docs inconsistent: homepage says 16 GB, billing example says 8 GB), 100 GB storage. Resources are elastic — you burst up to maximums but pay only for actual cgroup-metered consumption.
Persistent Storage:
- 100 GB persistent root filesystem per sprite (Tigris S3-compatible object storage as durable layer + local NVMe read-through cache)
- Storage persists indefinitely across sleep/wake cycles
- Checkpoints: copy-on-write filesystem snapshots (milliseconds, no process interruption). API: POST /v1/sprites/{name}/checkpoint
- Installed packages, file changes, and config modifications survive sleep/wake automatically
- Running processes and in-memory state do NOT survive sleep/wake (must restart on wake)
Cold Start:
- Warm resume: ~100-500ms (sprite was recently active, NVMe cache warm)
- Cold start: 1-2s (first access or long-idle sprite)
- New creation: 0.8-2s (verified empirically: 0.8s in user testing)
- Auto-sleep: 30 seconds of inactivity (non-configurable). "Inactivity" =
no active exec sessions, no open TCP connections, no running TTY sessions.
During a sprite exec running Claude Code, the sprite stays awake.
SSH Equivalent: No native SSH. Instead provides:
- WebSocket exec (WSS /v1/sprites/{name}/exec): interactive shell with TTY, stdin/stdout/stderr multiplexing, session persistence across disconnects. Supports max_run_after_disconnect for background processes.
- Filesystem REST API: file read/write/list operations via HTTP
- sprite proxy: port forwarding from localhost to sprite ports
- sprite console: CLI-based interactive shell (used in user testing)
- Public URLs: https://<name>.sprites.app for HTTP services
- This combination satisfies the SSH P0 intent (interactive shell + file transfer + port forwarding) without the literal SSH protocol (see the dispatch sketch below).
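For the Sprites track, Phase 1 dispatch can lean on the sprite CLI rather than the raw WebSocket protocol. A sketch, assuming sprite exec accepts a sprite name followed by the command to run (verify the exact argument order against sprite exec --help), and that ANTHROPIC_AUTH_TOKEN is injected per-session from a hypothetical ZAI_API_KEY env var on the bot host, as recommended earlier:

```python
import asyncio
import os
import shlex

async def dispatch_to_sprite(name: str, prompt: str) -> bytes:
    """Run a headless Claude Code prompt inside a sprite via the sprite CLI."""
    inner = (
        f"ANTHROPIC_AUTH_TOKEN={shlex.quote(os.environ['ZAI_API_KEY'])} "   # per-session secret
        f"claude -p {shlex.quote(prompt)} --output-format stream-json"
    )
    # Assumed invocation shape: `sprite exec <name> -- <command...>`
    proc = await asyncio.create_subprocess_exec(
        "sprite", "exec", name, "--", "bash", "-lc", inner,
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout
```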
Pricing (usage-based + subscription tiers):
| Resource | Rate | Monthly (730h) |
|---|---|---|
| CPU | $0.07/CPU-hour | $51.10/CPU-month |
| RAM | $0.04375/GB-hour | $31.94/GB-month |
| Hot Storage (NVMe) | $0.000683/GB-hour | ~$0.50/GB-month |
| Base Storage (Tigris) | $0.000027/GB-hour | ~$0.02/GB-month |
Sleeping sprites: $0 compute. Only base storage ($0.02/GB-month) accrues. Minimum billing: 6.25% CPU/second, 250 MB RAM/second when active.
Subscription tiers (prepaid credits toward usage):
| Tier | Monthly | Max Active Sprites | CPU Hours | RAM Hours | Storage (GB-months) |
|---|---|---|---|---|---|
| Recruit (free) | $0 | Unknown | Trial $30 credit | -- | -- |
| Adventurer | $20 | 20 | 450 | 1,800 | 50 |
| Veteran | $50 | 50 | 800 | 3,200 | 100 |
| Hero | $100 | 100 | 1,200 | 4,800 | 150 |
| Champion | $200 | 200 | 1,800 | 7,200 | 225 |
5 sprites, 2hr/day, Claude Code workload — two estimates:
Conservative estimate (avg 2 CPUs, 1.5 GB RAM active — spec assumption):
CPU: 5 x 2hr x 2 CPU x 30 = 600 CPU-hrs x $0.07 = $42/mo.
RAM: 5 x 2hr x 1.5 GB x 30 = 450 GB-hrs x $0.04375 = $19.69/mo.
Storage: 5 x 5 GB used x $0.02 = $0.50/mo (base only; storage is billed
on used blocks, not allocated 100 GB. Hot NVMe cache is $0.50/GB-mo but
only applies while sprite is active).
Subtotal: **$62/mo** usage.
sprites.dev documented example (4-hour Claude Code session = $0.44,
implying avg ~0.6 CPU, ~1.5 GB RAM from cgroup metering):
CPU: 5 x 2hr x 0.6 CPU x 30 = 180 CPU-hrs x $0.07 = $12.60/mo.
RAM: 5 x 2hr x 1.5 GB x 30 = 450 GB-hrs x $0.04375 = $19.69/mo.
Storage: $0.50/mo.
Subtotal: **$33/mo** usage.
Important: Sprites pricing is usage-metered from cgroup counters
(cpu.stat usage_usec for CPU, memory.current sampling for RAM), not
fixed-rate per hour. The conservative estimate may be 2x actual costs.
Prototype validation should measure real cgroup consumption to refine.
The Adventurer tier ($20/mo) acts as prepaid credits — overage is billed
at the same per-unit rates.
DX:
- CLI: sprite create, sprite console, sprite exec, sprite proxy, sprite checkpoint
- REST API: full lifecycle at https://api.sprites.dev/v1/
- SDKs: Python (sprites-py, imported as from sprites import SpritesClient), TypeScript (@fly/sprites), Go (github.com/superfly/sprites-go), Elixir (github: superfly/sprites-ex)
- Pre-installed: Ubuntu (docs say 24.04; see version caveat above), Claude Code, Node.js, Python, Go, git
- No Dockerfile needed (environment customization via exec commands)
Risks:
- No region selection: Proximity-routed only. Cannot specify region at creation time. Fly.io staff confirmed "not currently planned."
- No SLA: No uptime guarantees documented. Storage stack acknowledged as "in flux" by Fly.io blog.
- No custom base images: Must configure environment post-creation via exec commands (but changes persist across sleep/wake).
- 30s non-configurable auto-sleep: Cannot extend the idle timeout. Active exec sessions keep sprites awake, but between Discord messages the sprite will sleep after 30s. Warm resume (~100-500ms) mitigates this for chat-based workflows.
- Maturity: Launched January 2026. Early-stage product. Fly.io's own blog recommends containerizing for production scale-out.
- SDK gaps: JS SDK missing some helper methods (e.g., URL settings update). Python/Go SDKs more complete.
- No internal DNS: Sprites are isolated. No .internal DNS between sprites. Communication via Sprites REST API or public URLs.
- Non-configurable resource maximums: 8 vCPU, 8-16 GB RAM, 100 GB storage. Cannot configure smaller (or larger) allocations. You pay for actual cgroup-metered use, not the maximums -- but you cannot right-size for lighter workloads that need less than 8 vCPU.
Sprites eliminate several complex subsystems required by raw Fly.io Machines:
| Capability | Fly.io Machines (DIY) | Sprites (Native) |
|---|---|---|
| Idle detection | 3-layer hybrid system (process check + hard cap + auto-stop) | Built-in: 30s auto-sleep based on exec session activity |
| Wake on demand | Bot calls Machines API POST /start (~2-6s) | Built-in: any API call wakes sprite (~100-500ms) |
| Disk management | 5 GB volume + pruning cron + Tigris backup + threshold alerts | Built-in: 100 GB persistent filesystem, no pruning needed |
| Container setup | Dockerfile with 4-layer auto-update defense-in-depth | Built-in: pre-installed Ubuntu + Claude Code |
| Separate apps | Bot app + worker app (different lifecycle needs) | Bot can be a sprite too (or separate Fly app) |
| Health endpoint | Custom 10-line HTTP server for activity detection | Not needed: exec session tracking is native |
| Session backup | Tigris upload cron for old session files | Built-in: all data in Tigris by default |
Net effect: The Sprites prototype track requires ~60-70% less custom infrastructure code compared to the Machines track.
Why: Prebuilt Claude Code template, excellent auto-pause/resume
cycle, purpose-built for AI agent sandboxes.
Eliminated: Beta-only persistence with known data loss bug (#884) violates
GA storage P0. e2b sandbox connect is a proprietary terminal without
SCP/SFTP/port forwarding, violating SSH P0.
Persistent Storage:
- No traditional volumes -- persistence via pause/resume of entire VM state
- Paused sandboxes preserve filesystem + memory + running processes
- Max paused duration: 30 days from creation
- Known bug: file changes may not persist after multiple pause/resume cycles (GitHub #884)
Cold Start:
- New sandbox: ~150ms
- Resume from pause: ~1 second
- Pause time: ~4s per GB RAM
SSH: CLI e2b sandbox connect (interactive terminal, not traditional SSH)
Pricing (per-second):
| Resource | Rate | Hourly |
|---|---|---|
| 1 vCPU | $0.0000140/s | $0.050/hr |
| 2 vCPU | $0.0000280/s | $0.101/hr |
| RAM (per GiB) | $0.0000045/s | $0.016/hr |
Paused time: free during beta
Pro tier required for 24hr sessions: $150/month base
5 instances, 2hr/day, 2vCPU + 2GB each: Compute: 5 x 2hr x $0.133/hr x 30 = $39.90/mo. Plus Pro: $150/mo. Total: ~$190/month (Pro tier makes this expensive for small scale)
Hobby tier (free): limited to 1hr sessions, 20 concurrent, 2 vCPU / 512MB fixed (too little RAM for Claude Code).
DX:
- anthropic-claude-code prebuilt template (ready to use, no custom build needed)
- Python + TypeScript SDKs
- Auto-pause/resume cycle built in
- Sandbox.connect(id) auto-resumes paused sandboxes
Risks:
- $150/month Pro tier minimum for usable sessions
- 30-day maximum sandbox lifetime (must recreate periodically) -- prevents running indefinite processes like a Discord bot (P1 long-running concern)
- Persistence bug in beta (#884)
- No standalone persistent volumes (if sandbox dies, data is lost)
- Phase 1 only: cannot evolve to Agent SDK or standalone framework in-place
Why: Included storage, excellent Python SDK, memory snapshots (CPU GA).
Eliminated: modal shell is a proprietary terminal without SCP/SFTP/port
forwarding, violating SSH P0. Additionally, 24hr sandbox lifetime violates P1
(no platform-imposed lifetime limit).
Persistent Storage:
- Modal Volumes: included (storage pricing not prominently listed; may incur charges at scale)
- Distributed filesystem, optimized for write-once/read-many
- v2 in beta (unlimited files, hundreds of concurrent writers)
Cold Start:
- Standard: ~1-3 seconds
- Memory snapshots: sub-second (CPU GA, GPU still experimental)
- Keep-warm: min_containers=N or scaledown_window (up to 20 min)
SSH: modal shell (interactive bash), modal container exec
Pricing (per-second):
| Resource | Rate | Hourly |
|---|---|---|
| 1 CPU core (= 2 vCPU) | $0.0000131/core/s | $0.047/hr |
| RAM (per GiB) | $0.00000222/GiB/s | $0.008/hr |
Note: the 3x multiplier applies to nonpreemptible=True (an opt-in flag
for non-preemptible capacity), not to sandboxes specifically. Sandboxes use
standard per-second rates unless nonpreemptible is opted into.
Free tier: $30/month credits
5 instances, 2hr/day, 1 core + 4GB: Compute: 5 x 2hr x ($0.047 + 4x$0.008) x 30 = $23.10/mo minus $30 credit. Total: ~$0/month (covered by free credits for light usage)
DX:
- Everything defined in Python code (no Dockerfiles, no YAML)
- modal run, modal deploy, modal shell
- Images built programmatically: modal.Image.debian_slim().pip_install(...)
Risks:
- Memory snapshots: CPU GA but GPU still experimental; no documented expiry policy
- 24-hour max sandbox lifetime -- prevents running indefinite processes like a Discord bot (P1 long-running concern)
- v2 Volumes still in beta (no zero-data-loss guarantee)
- Phase 1 only: cannot evolve to Agent SDK or standalone framework in-place
Why: Snapshots, free volumes, Python-native, open-source core.
Eliminated: beam shell is an exec-based shell without SCP/SFTP/port
forwarding, violating SSH P0. Also overpriced at ~$138/mo for 5 instances.
Persistent Storage:
- Distributed Volumes: free (included in compute pricing)
- Snapshots: capture filesystem + memory state, restore later
Cold Start: Typically under 1s container start (custom runc runtime; ~1-3s estimated with application initialization)
SSH: beam shell (interactive root shell), beam container exec
Pricing (per-second):
| Resource | Rate | Hourly |
|---|---|---|
| 1 CPU core | $0.0000528/core/s | $0.190/hr |
| RAM (per GB) | $0.0000056/GB/s | $0.020/hr |
5 instances, 2hr/day, 2 cores + 4GB: Compute: 5 x 2hr x ($0.380 + $0.080) x 30 = $138/mo. Total: ~$138/month (more expensive than alternatives)
DX: Python SDK, CLI, hot-reload dev mode, open-source core (beta9)
Risks:
- Higher CPU pricing than Modal/Fly.io
- Volume propagation delay up to 60s
- Platform maturity (newer, occasional performance issues reported)
Why: Cheapest per-month if instances run frequently. No platform abstraction. Eliminated: snapshot restore takes 5-15+ min (exceeds 10s P0), stopping does not stop billing (violates zero idle cost P0).
CX23 Server (2 vCPU, 4GB RAM, 40GB NVMe): ~$4.35/month
(Note: CX22 was renamed to CX23 in Hetzner's current lineup.)
Key limitations:
- Billed monthly regardless of state (no per-second billing)
- Stop/start via API, but stopping does NOT stop billing -- only server deletion stops charges (important: no hibernate-to-save-money option)
- Snapshot restore: 5-15+ minutes (dealbreaker for cold start requirement)
- Full DIY: must build hibernation, orchestration, monitoring yourself
- Volumes: €0.044/GB/month
5 instances always-on: 5 x $4.35 + 5 x 5GB x $0.05 = ~$23/month
Not suitable due to slow snapshot restore and DIY overhead, but establishes the cost floor.
Why: Industry-leading cold starts via Light Sleep (~200ms), per-second billing,
native scale-to-zero. Limited by volumes still in preview and Pro plan requirement.
Eliminated: Volumes in public preview (no GA date) violates GA storage P0.
koyeb instances exec is an exec-based shell without SCP/SFTP, violating SSH P0.
Persistent Storage:
- NVMe SSD volumes: 1-10GB max, standard instances on any paid plan (not eco/free instances). Pro plan: $29/mo base.
- Available in Washington D.C. and Frankfurt only
- Currently in public preview (free during preview, GA pricing TBD)
- Volumes require standard instances (eco instances not supported)
- Single-instance only when using volumes (no horizontal scaling)
Cold Start:
- Light Sleep: ~200ms (memory snapshot/eBPF MicroVM restore)
- Deep Sleep: 1-5s (full container restart)
- Instances transition: idle → Light Sleep → Deep Sleep after configurable timeout
- Light Sleep costs 15% of instance rate at GA (free during preview)
SSH: koyeb instances exec / koyeb services exec (interactive shell, not
traditional SSH). Also koyeb instances cp for file transfer.
Pricing (per-second):
| Config | Hourly | Monthly (24/7) | Monthly (2hr/day) |
|---|---|---|---|
| eco-large / 2vCPU / 4GB | $0.029 | $20.74 | $1.73 |
| standard large / 4vCPU / 4GB | $0.058 | $41.47 | $3.46 |
Pro plan required for volumes: $29/month base.
5 instances, 2hr/day, standard large + 5GB volume each (volumes require standard instances): Compute: 5 x $3.46 = $17.28/mo. Storage: $0 (preview). Pro: $29/mo. Total: ~$46/month (preview) / ~$75/month (GA with Light Sleep charges)
DX:
- CLI: koyeb services create, pause, resume, exec
- REST API: full lifecycle management
- Python + JavaScript SDKs (Sandboxes feature)
- Terraform and Pulumi providers
- Scale-to-zero is HTTP-triggered (requires HTTP endpoint for auto-wake)
Risks:
- Volumes in public preview (no GA date, limited to 2 regions, 10GB max)
- Eco instances (best price/spec match) cannot use volumes
- Pro plan ($29/mo) required just for volume access
- Scale-to-zero triggered by HTTP only (Discord bot needs HTTP proxy layer)
- HTTP/2 connections cannot wake sleeping services (HTTP/1.1 only for wake trigger)
- Light Sleep pricing at GA adds ~$28/mo for 5 idle instances
Why: Per-second billing with pause/resume API, comprehensive shell access,
cheapest viable option at ~$24/mo. Borderline due to: (1) estimated ~5-15s cold
start (upper end exceeds 10s P0), (2) northflank exec + northflank forward
provide shell and port forwarding but lack SCP/SFTP for file transfer (borderline
SSH P0 compliance). Requires building your own idle detection (P1 concern).
Persistent Storage:
- SSD persistent disks: $0.15/GB/month
- Persist across container restarts and redeployments
- Single-instance only when using persistent disks
Cold Start:
- Pause/resume: ~5-15s estimated (full container restart from cached image; no official timing documented)
- No official SLA; "seconds" per Northflank documentation
- First image pull can be 30-60s+ (subsequent pulls cached)
SSH: northflank exec CLI + API endpoint (/v1/api/execute-command).
Web UI shell also available. Port forwarding via northflank forward.
Note: No SCP/SFTP equivalent -- file transfer requires workarounds (e.g.,
tar over stdin/stdout, or git push/pull). Borderline SSH P0 compliance.
Pricing (per-second):
| Resource | Hourly | Monthly (24/7) |
|---|---|---|
| 1 vCPU | $0.017 | $12.00 |
| 1 GB RAM | $0.008 | $6.00 |
| 1 GB SSD | -- | $0.15 |
No platform fee. Free Developer Sandbox for testing.
5 instances, 2hr/day, 2vCPU + 4GB + 5GB disk each: Compute: 5 x 60hr x $0.067/hr = $20.00/mo. Storage: 5 x 5GB x $0.15 = $3.75/mo. Total: ~$24/month
DX:
- CLI: northflank exec, northflank forward, full CRUD operations
- REST API: pause, resume, restart, scale, execute commands
- JavaScript client library
- GitOps and Infrastructure as Code templates
- BYOC option (run on your own AWS/GCP/Azure infrastructure)
Risks:
- No SCP/SFTP file transfer (borderline SSH P0 -- must use workarounds)
- No native scale-to-zero (must build idle detection + pause API calls yourself)
- Cold start not SLA-backed ("seconds" is vague)
- Less well-known platform (smaller community, less ecosystem)
Why: Excellent DX with native SSH, App Sleep for scale-to-zero, familiar Heroku-like workflow. Eliminated: fixed 10-second sleep-wake timeout (distinct from the configurable 300s healthcheck timeout) violates the no-boot-timeout-<30s P0. Services waking from App Sleep that don't start in 10s receive 502 errors.
Persistent Storage:
- Persistent volumes: $0.15/GB/month
- Survive deploys and restarts
- Single-instance only when using volumes
Cold Start:
- App Sleep: 2-10s (full container stop/restart)
- First request during wake returns 502 error (must handle in client)
- Fixed 10-minute inactivity timeout before sleep (not configurable)
- 10-second sleep-wake timeout -- if service doesn't start within 10s of waking from App Sleep, returns 502 (separate from configurable healthcheck timeout which defaults to 300s)
- No memory snapshot -- full cold boot on wake
SSH: Native via railway ssh (direct shell into running service)
Pricing (per-second):
| Resource | Per Minute | Hourly | Monthly (24/7) |
|---|---|---|---|
| 1 vCPU | $0.000463 | $0.028 | $20.28 |
| 1 GB RAM | $0.000231 | $0.014 | $10.12 |
Pro plan: $20/month per workspace (includes $20 credit toward usage, unlimited seats).
5 instances, 2hr/day, 2vCPU + 4GB + 5GB volume each: Compute: 5 x 60hr x $0.111/hr = $33.30/mo. Storage: 5 x 5GB x $0.15 = $3.75/mo. Pro: $20/mo (incl. $20 credit). Total: ~$37/month
DX:
- CLI: railway up, railway ssh, railway run
- Dashboard with real-time logs and metrics
- Git-based deploys or Docker images
- Environment variables and secrets management
Risks:
- 502 errors on first request during App Sleep wake (client must retry)
- 10-second boot timeout: services that take >10s to start receive 502 errors (Claude Code container must be optimized for fast startup)
- 2-10s cold start with fixed 10-minute sleep timeout (not configurable)
- No memory snapshot (all in-memory state lost on sleep)
- Volume + service coupling (single instance only)
Why: Excellent DX with native SSH, persistent disks, comprehensive API. Eliminated: resume from suspend takes ~15-60s estimated, exceeding the 10s P0.
Persistent Storage:
- Persistent disks: $0.25/GB/month, encrypted SSD with daily snapshots
- Disk size can be increased but not decreased
- Single-instance only when using persistent disks
- Disks prevent zero-downtime deploys
Cold Start:
- Suspend/resume: ~15-60s estimated (full container restart, not memory snapshot; timing not officially documented for paid API suspend/resume)
- Suspend via API (POST /v1/services/{id}/suspend) stops billing
- Resume via API (POST /v1/services/{id}/resume) restarts container
- No native scale-to-zero; must build orchestration via API
SSH: Native SSH via ssh SERVICE@ssh.REGION.render.com. Dashboard shell also
available. SSH keys managed per-account.
Pricing (per-second, prorated from monthly rate):
| Instance | Spec | Monthly (24/7) | Hourly |
|---|---|---|---|
| Standard | 1 vCPU / 2GB | $25 | $0.034 |
| Pro | 2 vCPU / 4GB | $85 | $0.118 |
| Pro Plus | 4 vCPU / 8GB | $175 | $0.243 |
No platform fee on Hobby plan. Professional workspace: $19/user/month.
5 instances, 2hr/day, Pro (2vCPU/4GB) + 5GB disk each: Compute: 5 x 60hr x $0.118/hr = $35.42/mo. Storage: 5 x 5GB x $0.25 = $6.25/mo. Total: ~$42/month
DX:
- CLI: render command (Homebrew installable)
- REST API: full lifecycle including suspend/resume
- Git-based auto-deploys or Docker images
- Render MCP server for Claude Code/Cursor integration
Risks:
- 15-60s resume exceeds the <10s P0 requirement
- No native scale-to-zero (must build suspend/resume orchestration)
- Resume is a cold restart (no memory snapshot)
- Persistent disks block horizontal scaling and zero-downtime deploys
Why: True serverless scale-to-zero with Azure Files for persistence. Eliminated: cold starts of 15-37s+ (measured for tiny images) exceed 10s P0, and idle compute charges (~$0.011/vCPU-hr) violate zero idle cost P0.
Persistent Storage:
- Azure Files SMB mount: ~$0.03/GiB/month (Standard Hot LRS; Transaction Optimized tier is ~$0.06/GiB)
- Persists across container restarts and replica replacements
- ReadWrite or ReadOnly mount modes
Cold Start:
- Scale-to-zero resume: 15-37s+ (measured for 105MB image)
- Official Hello World image: 22s cold start (GitHub issue #997, 50 thumbs-up)
- No official SLA for scale-up latency
- Larger images (500MB-1GB for Claude Code) will likely be slower
SSH: az containerapp exec (interactive shell into running container).
Azure Portal console also available. Cannot exec into scaled-to-zero containers.
Pricing (per-second, Consumption plan):
| Resource | Per Second | Hourly |
|---|---|---|
| 1 vCPU (active) | $0.000024 | $0.086 |
| 1 GiB RAM (active) | $0.000003 | $0.011 |
| 1 vCPU (idle) | $0.000003 | $0.011 |
Free tier: 180,000 vCPU-sec + 360,000 GiB-sec/month (~$13 value).
5 instances, 2hr/day, 2vCPU + 4GiB each: Compute: 5 x $12.96 = $64.80/mo. Free tier: -$12.96. Storage: 5 x 5GB x $0.03 = $0.75/mo. Total: ~$53/month
DX:
- CLI: az containerapp create, exec, update
- ARM/Bicep templates, Terraform provider
- KEDA-based autoscaling rules (HTTP, custom metrics)
- Integrated with Azure Container Registry, Log Analytics
Risks:
- Cold start is the critical concern: 15-37s for tiny images, likely 30-60s+ for a real Claude Code container. Exceeds 10s P0 requirement.
- KEDA scaling rules require HTTP or event source (Discord messages need proxy)
- Cannot exec into scaled-to-zero containers (must wait for scale-up)
- Azure ecosystem complexity (VNet, resource groups, subscriptions)
| Platform | Reason | P0 Violated |
|---|---|---|
| E2B | Beta-only persistence with known data loss bug (#884). e2b sandbox connect is a proprietary terminal without SCP/SFTP/port forwarding. Also has 30-day sandbox lifetime (P1 concern). Excellent DX and prebuilt Claude Code template, but fails two P0s. | GA-quality persistent disk, SSH-compatible remote access |
| Modal | modal shell is a proprietary terminal without SCP/SFTP/port forwarding. 24hr sandbox lifetime limits long-running processes (P1 concern). Excellent Python SDK DX and included v1 GA storage. | SSH-compatible remote access |
| Beam Cloud | beam shell is an exec-based shell without SCP/SFTP/port forwarding. Also overpriced (~$138/mo for 5 instances). Free volumes and snapshot support, but no real SSH. | SSH-compatible remote access |
| Koyeb | Volumes in public preview (no GA date, limited to 2 regions, 10GB max). koyeb instances exec lacks SCP/SFTP. Excellent ~200ms cold start and native scale-to-zero, but fails two P0s. | GA-quality persistent disk, SSH-compatible remote access |
| Railway | Fixed 10-second sleep-wake timeout returns 502 if container doesn't start in time (separate from configurable 300s healthcheck timeout). An 8 GB Claude Code container with ~500 MB image is risky in 10s. Otherwise strong DX with native SSH and App Sleep. | No hard boot timeout < 30s |
| Render | API suspend/resume takes ~15-60s estimated (not officially documented for paid tier). Borderline under old 60s P0, clearly fails under tightened 10s. Good DX and native SSH. | Cold start < 10s |
| Azure Container Apps | 15-37s+ cold start for 105MB images (likely 30-60s+ for Claude Code). Also has idle compute pricing (~$0.011/vCPU-hr) that violates zero idle cost. | Cold start < 10s, Zero idle cost |
| AWS Fargate | 15-45s cold starts (~50% reduction with SOCI lazy loading = ~8-23s, still > 10s). Complex networking, EFS expensive ($0.30/GB/mo). | Cold start < 10s |
| Hetzner | Snapshot restore takes 5-15+ minutes. Stopping servers does NOT stop billing (only deletion stops charges). No per-second billing, no auto-sleep. | Cold start < 10s, Zero idle cost |
| GCP Cloud Run | No interactive shell access (kubectl exec equivalent absent). Now has GCS and NFS volume mounts (newer feature), but no remote shell makes debugging impossible. | SSH-compatible remote access |
| Fal.ai | No interactive shell access. Has /data distributed volume and KV Store for persistence, but AI-inference focused pricing and no remote shell make it unsuitable. | SSH-compatible remote access |
| Heroku | No persistent disk (ephemeral filesystem, 24hr dyno cycling). 8 GB RAM requires $500/mo Performance-L dyno ($2,500/mo for 5 instances). | GA-quality persistent disk |
| DigitalOcean App Platform | No persistent disk, no remote shell access. 8 GB RAM tier costs $75/mo per app ($375/mo for 5 instances). | GA-quality persistent disk, SSH-compatible remote access |
| AWS App Runner | No persistent disk (no EFS support despite claims), no remote shell access at all. Purpose-built for stateless HTTP services only. | GA-quality persistent disk, SSH-compatible remote access |
Fly.io Machines costs calculated with performance-1x / 8 GB RAM workers ($0.0880/hr) + shared-cpu-1x / 256 MB bot ($0.0028/hr) + 5 GB volumes ($0.15/GB/mo) + Tigris session backup (~$0.02/GB/mo, estimated ~$0.10/instance/mo).
Fly.io Sprites costs shown as two estimates: conservative (avg 2 CPU, 1.5 GB RAM — spec assumption) and documented (based on sprites.dev's $0.44/4hr Claude Code example, implying avg ~0.6 CPU, ~1.5 GB RAM). Usage-based metering (CPU $0.07/CPU-hr, RAM $0.04375/GB-hr) + base storage ($0.02/GB-mo) + Adventurer tier ($20/mo prepaid credit). Prototype validation should measure actual cgroup consumption to refine estimates.
Platforms marked strikethrough are eliminated by tightened P0 requirements.
| Platform | Compute/mo | Storage/mo | Base Fee | Other | Total/mo |
|---|---|---|---|---|---|
| Fly.io Machines | $26.40 | $3.75 | $0 | Bot $2.04 + Tigris ~$0.50 | ~$33 |
| Fly.io Sprites (conservative) | ~$62 (usage) | ~$0.50 (base) | $20 (Adventurer) | -- | ~$62 (usage exceeds tier) |
| Fly.io Sprites (documented example) | ~$33 (usage) | ~$0.50 (base) | $20 (Adventurer) | -- | ~$33 (within tier) |
| RunPod (CPU, unverified) | ~$86 | ~$5 | $0 | -- | ~$91 |
| Northflank (borderline) | $40.00 | $3.75 | $0 | -- | ~$44 |
| ~~E2B~~ | $39.90 | $0 (beta) | $150 | -- | |
| | ~$23 | $0 | $0 | -- | |
| ~~Beam Cloud~~ | $138 | $0 | $0 | -- | |
| ~~Koyeb~~ | $34.56 | $0 (preview) | $29 | -- | |
| | $43.50 | $1.25 | $0 | -- | |
| | $66.60 | $3.75 | $20 (incl. $20 credit) | -- | |
| | $73.13 | $6.25 | $0 | -- | |
| | $103.68 | $0.75 | $0 | -- | |
| Platform | Total/mo |
|---|---|
| Fly.io Machines | ~$123 |
| Fly.io Sprites (conservative) | ~$248 (Veteran $50 tier) |
| Fly.io Sprites (documented example) | ~$133 (Veteran $50 tier) |
| RunPod (CPU, unverified) | ~$364 |
| Northflank (borderline) | ~$175 |
| Platform | Total/mo |
|---|---|
| Fly.io Machines | ~$304 |
| Fly.io Sprites (conservative) | ~$620 (Hero $100 tier) |
| Fly.io Sprites (documented example) | ~$333 (Hero $100 tier) |
| RunPod (CPU, unverified) | ~$910 |
| Northflank (borderline) | ~$437 |
This scenario models Phase 2-3 usage, where workers run persistent services (webhook servers, Agent SDK processes) for extended hours (~8 hr/day). P1 threshold: < $100/month.
| Platform | Compute/mo | Storage/mo | Base Fee | Other | Total/mo | Meets P1? |
|---|---|---|---|---|---|---|
| Fly.io Machines | $105.60 | $3.75 | $0 | Bot $2.04 + Tigris ~$0.50 | ~$112 | No |
| Fly.io Sprites (conservative) | ~$247 (usage) | ~$0.50 | $50 (Veteran) | -- | ~$247 | No |
| Fly.io Sprites (documented example) | ~$133 (usage) | ~$0.50 | $50 (Veteran) | -- | ~$133 | No |
| RunPod (CPU, unverified) | ~$348 | ~$5 | $0 | -- | ~$353 | No |
| Northflank (borderline) | $160.00 | $3.75 | $0 | -- | ~$164 | No |
Key insight: At extended usage (8hr/day), no platform meets the $100/mo P1 threshold with the corrected performance-1x pricing ($0.0880/hr). Fly.io Machines at ~$112/mo is closest but exceeds by ~12%. Sprites' usage-based pricing ranges from ~$133/mo (documented example) to ~$247/mo (conservative) depending on actual cgroup consumption. If the $100/mo P1 is firm, options include: (a) downgrading to shared-cpu-4x (lower per-hour rate but shared vCPUs), (b) accepting the overage as Phase 2-3 cost of business, or (c) reducing active hours. RunPod's GPU-focused pricing remains uneconomical for CPU workloads.
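The Phase 2-3 figures above follow directly from the listed per-hour rates. A quick sanity check in Python (assuming 30 active days per month for workers and 730 hours per month for the always-on bot):

```python
# Recomputing the ~$112/mo Fly.io Machines Phase 2-3 estimate from the rates listed above.
# Assumptions: 30 active days/month for workers, 730 hours/month for the always-on bot.
WORKER_RATE = 0.0880      # $/hr, performance-1x 8 GB RAM
BOT_RATE = 0.0028         # $/hr, shared-cpu-1x 256 MB
VOLUME_RATE = 0.15        # $/GB-month

workers, hours_per_day, days = 5, 8, 30
compute = workers * hours_per_day * days * WORKER_RATE   # 1,200 hr -> $105.60
storage = workers * 5 * VOLUME_RATE                      # 25 GB of volumes -> $3.75
bot = BOT_RATE * 730                                     # ~$2.04
tigris = 0.50                                            # estimated session-backup storage
total = compute + storage + bot + tigris                 # ~$111.9 -> "~$112/mo"
print(f"Phase 2-3 Fly.io Machines estimate: ~${total:.0f}/mo")
```

The same arithmetic at 2 hr/day active time yields the ~$33/mo Phase 1 figure.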
```bash
# Create a new instance linked to a Discord channel
cbot create --channel <discord-channel-id> --name "project-alpha"
# -> Creates compute instance + persistent volume
# -> Installs Claude Code from base template
# -> Returns instance ID

# List all instances
cbot list
# ID            CHANNEL          STATUS      DISK   CREATED
# inst-abc123   #project-alpha   running     5GB    2024-01-15
# inst-def456   #bug-fixes       suspended   8GB    2024-01-12
# inst-ghi789   #research        stopped     5GB    2024-01-10

# SSH into an instance
cbot ssh inst-abc123
# -> Opens interactive shell in the running instance

# Wake a hibernated instance
cbot wake inst-def456

# Stop/hibernate an instance
cbot stop inst-abc123

# Resize disk
cbot resize-disk inst-abc123 --size 10GB

# Destroy an instance (requires confirmation)
cbot destroy inst-ghi789
# WARNING: This will permanently delete instance inst-ghi789
# and all associated data. Type 'yes' to confirm:

# View instance logs
cbot logs inst-abc123 --tail 50
```

```bash
# Apply base template to all instances
cbot config apply --template base.yml

# Override config for a specific instance
cbot config set inst-abc123 --env EXTRA_TOOL=enabled

# View instance config
cbot config show inst-abc123
```

Dual recommendation: prototype both Fly.io Sprites and Fly.io Machines in parallel. Let prototype results determine which becomes the primary platform.
Best for: Fastest time-to-prototype, minimal custom infrastructure, best cold start performance, highest DX.
- Sub-second warm resume: ~100-500ms (vs ~2-6s for Machines). Closest to the 5s P1 ideal by an order of magnitude.
- Zero-config idle management: Native 30s auto-sleep replaces the entire 3-layer idle detection system. No health endpoint, no process monitor, no hard cap timer needed.
- 100 GB persistent storage: Eliminates disk pruning cron, Tigris backup, and volume threshold alerts. Session files and project files have ample room without management.
- No Dockerfile: Pre-installed Ubuntu + Claude Code + tools (see version caveat in Section 3). No 4-layer auto-update defense needed.
- API exec dispatch: Bot opens `WSS /v1/sprites/{name}/exec` to run Claude Code. Native wake-on-demand (sleeping sprites wake automatically on API call). No separate start/stop orchestration.
- SDKs: Python (`sprites-py`), TypeScript (`@fly/sprites`), Go, Elixir. More complete than Machines' abandoned SDKs.
- Checkpoints: Instant filesystem snapshots for rollback before risky operations. Copy-on-write, milliseconds, no process interruption.
- Empirically validated: User testing confirmed 0.8s sprite creation, successful Claude Code v2.1.29 installation, root console access.
Tradeoffs:
- Higher cost at scale (conservative estimate): ~$62/mo for 5 sprites (Phase 1) vs ~$33/mo Machines. Usage-based pricing becomes expensive at sustained utilization (Phase 2-3: ~$247/mo vs ~$112/mo). Both exceed $100/mo P1 threshold at 8hr/day. Note: sprites.dev documented example implies ~$33/mo (Phase 1) / ~$133/mo (Phase 2-3) if actual cgroup-metered usage matches their benchmarks.
- No region selection: Proximity-routed only. Cannot co-locate with specific infrastructure.
- No SLA: Early-stage product (Jan 2026). Storage stack "in flux."
- 30s non-configurable auto-sleep: Cannot extend idle timeout. Between messages, sprites sleep after 30s. Warm resume mitigates this.
- No native SSH: WebSocket exec + FS API satisfies the P0 intent but is not literal SSH protocol.
- No custom images: Must configure via exec after creation (changes persist, so one-time setup).
Best for: Lower cost at scale, full SSH, production maturity, Phase 2-3 extended usage.
- Cheapest at scale: ~$33/month for 5 workers + bot, ~$112/mo Phase 2-3 (exceeds $100 P1 threshold). ~$304/month for 50 workers.
- Full SSH access: `fly ssh console` with complete SSH protocol (SCP/SFTP file transfer, port forwarding, key-based auth).
- Machines REST API: Full API at `api.machines.dev/v1` for programmatic lifecycle control. Production-grade (used by Supabase).
- GA-quality persistent volumes: NVMe-backed, $0.15/GB/month with automatic daily snapshots. 8,000 IOPS on shared-cpu-4x.
- Separate apps architecture: Bot app (~$2/mo always-on) + worker app (scale-to-zero). Internal `.internal` DNS for communication.
- Evolutionary development: Supports all three phases (CLI dispatch -> Agent SDK -> standalone). Anycast public HTTPS (`<app>.fly.dev`).
- Mature platform: Years of production use. Documented APIs. Known failure modes with workarounds.
Tradeoffs:
- Slower cold start: ~2-6s stop/start. 4-12x slower than Sprites warm resume. Still within 10s P0.
- Requires custom infrastructure: 3-layer idle detection, Dockerfile with 4-layer defense-in-depth, disk pruning cron, Tigris backup, health endpoint, session management -- ~60-70% more custom code.
- No official Python SDK: Abandoned community and official SDKs. Must use httpx + flyctl subprocess.
- REST API constraints: Exec has 60s hard timeout. Rate limits at 1 req/s sustained (burst to 3).
| Factor | Sprites | Machines | Winner |
|---|---|---|---|
| Cold start | ~100-500ms | ~2-6s | Sprites |
| Cost (5 inst, 2hr/day) | ~$33-62/mo | ~$33/mo | Comparable |
| Cost (5 inst, 8hr/day) | ~$133-247/mo | ~$112/mo | Machines |
| DX / time-to-prototype | No Dockerfile, native idle | Dockerfile + 3-layer idle | Sprites |
| Storage | 100 GB included | 5 GB volume ($0.15/GB) | Sprites |
| SSH | WebSocket exec + FS API | Native SSH + SCP | Machines |
| Maturity | Jan 2026, no SLA | Years of production use | Machines |
| Region control | Proximity-routed only | Explicit region selection | Machines |
| Idle management | Native 30s auto-sleep | DIY 3-layer system | Sprites |
| Phase 2-3 cost | Exceeds $100/mo P1 | Exceeds $100/mo P1 | Neither |
Recommendation: Start with Sprites for Phase 1 prototyping (faster iteration, less code, comparable Phase 1 cost). Validate cold start, pricing, and reliability. Neither platform meets the $100/mo P1 threshold at Phase 2-3 (8hr/day) with corrected pricing — Machines ~$112/mo, Sprites ~$133-247/mo — so the P1 cost target should be revisited or Phase 2-3 usage assumptions reduced. If Sprites proves stable, continue with it. If reliability concerns emerge, migrate to Machines (the bot-to-instance interface is platform-agnostic).
Fallback order (meet tightened P0s or borderline):
- Northflank (borderline) -- Per-second billing, ~$44/mo (Phase 1) / ~$164/mo (Phase 2-3 -- exceeds $100 threshold), indefinite process lifetime, GA disks, port forwarding. Borderline on SSH P0: `northflank exec` + `northflank forward` provide shell and port forwarding but lack SCP/SFTP. Cold start ~5-15s estimated (upper end exceeds 10s P0). No native auto-sleep (must build idle detection). Supports Phase 1-3.
- RunPod -- Real SSH, GA volumes, indefinite lifetime, 3-5s cold start. But GPU-focused pricing makes CPU workloads expensive ($91/mo Phase 1, ~$353/mo Phase 2-3). Fails P1 Phase 2-3 cost threshold. CPU pricing unverified (only GPU pricing publicly listed).
Vendor concentration risk: Both recommended options are Fly.io products. Mitigations: (a) the bot-to-instance interface (exec API or SSH + CLI) is abstracted behind the bot's orchestration layer, (b) no Fly.io-specific APIs are used in the instance runtime itself, (c) Claude Code runs identically on any Linux environment, (d) Sprites' exec API is REST-based and could be replicated on any platform with WebSocket exec support.
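To make that abstraction concrete, the bot can code against a small instance interface and keep each platform behind its own adapter. A minimal sketch; the Protocol and method names here are illustrative assumptions, not an existing API:

```python
# Illustrative sketch of the platform-agnostic bot-to-instance boundary.
# WorkerInstance, ensure_running, run_claude, and stop are names invented for this sketch.
from typing import AsyncIterator, Protocol

class WorkerInstance(Protocol):
    """Everything the Discord bot needs from a worker, regardless of platform."""

    async def ensure_running(self) -> None:
        """Wake or start the instance if it is sleeping/stopped."""
        ...

    def run_claude(self, prompt: str, session_id: str) -> AsyncIterator[str]:
        """Run `claude -p` inside the instance and yield stream-json lines."""
        ...

    async def stop(self) -> None:
        """Hibernate/stop compute; persistent storage survives."""
        ...

# A SpritesWorker adapter would implement run_claude over the WebSocket exec API;
# a MachinesWorker adapter over the Machines REST API plus SSH exec.
# Swapping platforms then means swapping the adapter, not the bot logic.
```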
Two parallel tracks. Run both simultaneously; compare results after Day 2.
```bash
# Install Sprites CLI
curl -fsSL https://sprites.dev/install.sh | bash
sprite login

# Create first sprite
sprite create -o <org> claude-worker-1

# Connect and verify environment
sprite console -s claude-worker-1

# Inside sprite: verify pre-installed tools
node --version     # Node.js
python3 --version  # Python
claude -v          # Claude Code (pre-installed)

# Configure z.ai GLM environment (persists across sleep/wake)
cat >> ~/.bashrc << 'EOF'
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air"
export API_TIMEOUT_MS="3000000"
export NODE_OPTIONS="--max-old-space-size=4096"
export MALLOC_ARENA_MAX=2
export DISABLE_AUTOUPDATER=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
EOF
# ANTHROPIC_AUTH_TOKEN set separately (not in bashrc)

# Test Claude Code execution via API exec dispatch
sprite exec -s claude-worker-1 -- claude -p "Hello, what tools are available?" \
  --output-format stream-json \
  --dangerously-skip-permissions

# Create checkpoint of configured environment
sprite checkpoint -s claude-worker-1 --comment "base-config-v1"
```

Validate:
- Sprite creation time (target: < 2s)
- Claude Code runs with z.ai GLM-4.7
- `--output-format stream-json` produces parseable NDJSON
- Environment survives sleep/wake cycle (wait 30s, then exec again)
- Checkpoint creation and restore works
- Measure warm resume time (target: < 500ms)
- Measure cold start time (target: < 2s)
```python
# pip install sprites-py

# Bot dispatches work via Sprites Python SDK
import os

from sprites import SpritesClient

client = SpritesClient(token=os.environ["SPRITE_TOKEN"])
sprite = client.sprite("claude-worker-1")

prompt = "Hello, what tools are available?"  # in the bot, this comes from the Discord message

# Execute Claude Code via API (auto-wakes sleeping sprite)
# Note: Sprites SDK uses run()/command(), not exec().
# Exact method signature may differ from below — verify against
# sprites-py docs at the time of implementation.
result = sprite.run(
    command=["claude", "-p", prompt,
             "--output-format", "stream-json",
             "--dangerously-skip-permissions"],
    env={"ANTHROPIC_AUTH_TOKEN": os.environ["ZAI_API_KEY"]},
)

# Stream response back to Discord
for line in result.stdout:
    # Parse stream-json, extract text_delta events
    # (see the parsing sketch after the validation checklist below)
    pass
```

Validate:
- API exec dispatch wakes sleeping sprites automatically
- Streaming output reaches bot in real-time
- Session persistence: exec session stays alive during Claude Code execution
- Auto-sleep: sprite sleeps 30s after exec session ends
- Multiple sprites can be created and managed via SDK
- Cost tracking: monitor actual usage vs Adventurer tier allowance
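The "streaming output reaches bot in real-time" check above requires the bot to turn stream-json lines into Discord-postable text. A hedged parsing sketch; the exact event schema (field names like `result` and the shape of assistant messages) should be verified against the Claude Code version in use:

```python
import json

def extract_text(ndjson_lines):
    """Collect assistant text from `claude -p --output-format stream-json` output.

    Assumption: each line is one JSON object with a "type" field, and the final
    "result" event carries the full response text. Verify the schema against the
    deployed Claude Code version before relying on this.
    """
    chunks = []
    for raw in ndjson_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partial or garbled lines instead of failing the whole reply
        if event.get("type") == "result" and "result" in event:
            return event["result"]                      # authoritative final text
        if event.get("type") == "assistant":
            # Assumed shape: message.content is a list of blocks with "text" fields
            for block in event.get("message", {}).get("content", []):
                if isinstance(block, dict) and block.get("type") == "text":
                    chunks.append(block["text"])
    return "".join(chunks)
```

Accumulating assistant text as a fallback keeps partial output usable even if the final result event is cut off by a timeout.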
- Run identical workloads on both Sprites and Machines
- Compare: cold start latency, cost, reliability, DX
- Document findings, decide on primary platform
```bash
# Install Fly CLI
brew install flyctl
fly auth login

# Create bot app (always-on, cheap)
fly launch --name discord-bot --region iad --no-deploy
# No volumes, no [[services]] -- pure process

# Create worker app (scale-to-zero)
fly launch --name claude-workers --region iad --no-deploy

# Create a persistent volume for first worker
fly volumes create workspace --size 5 --region iad -a claude-workers

# Deploy worker with 8 GB RAM config + defense-in-depth Dockerfile
fly deploy -a claude-workers

# Set z.ai API key as a secret (not in the image)
fly secrets set ANTHROPIC_AUTH_TOKEN=<zai-api-key> -a claude-workers

# Test SSH
fly ssh console -a claude-workers

# Test stop/start (not suspend -- 8 GB exceeds 2 GB limit)
fly machine stop <machine-id> -a claude-workers
fly machine start <machine-id> -a claude-workers

# Measure cold start time (target: 2-6s)
```

Validate:
- Separate apps communicate via `.internal` DNS
- Worker health endpoint (port 8080) returns 200/503
- `fly.toml` config: `kill_signal = "SIGTERM"`, `kill_timeout = 30`
- Worker auto-stop: `auto_stop_machines = "stop"` works with health endpoint
- Bot auto-stop: `auto_stop_machines = "off"` keeps bot alive
- Verify Claude Code runs in 8 GB RAM container with z.ai GLM-4.7
- Test `claude -p` with `--output-format stream-json` (confirmed working locally, validate in container)
- Verify persistent disk survives stop/start cycles
- Measure actual cold start times with z.ai API connection
- Test long prompt handling (verify 50-min API timeout works in container)
- Verify `NODE_OPTIONS="--max-old-space-size=4096"` prevents OOM during heavy subprocess workloads (run `git clone`, `npm install`, `pytest`)
- Verify defense-in-depth auto-updater: confirm binary is read-only, `claude doctor` shows auto-updates disabled, no update HTTP requests on startup (check network traffic)
- Confirm GLM-4.5-Air works reliably in haiku slot (background tasks, subagent operations)
- Deploy bot to `discord-bot` app (shared-cpu-1x/256 MB, always-on)
- Bot maps channels to worker Machine IDs
- On message: wake worker via Machines API, pipe prompt, stream response
- Implement 3-layer idle detection:
- Layer 1: HTTP health check every 5 min (process-based)
- Layer 2: 4-hour hard cap (configurable per-instance)
- Layer 3: Fly.io auto-stop as failsafe
- Test Discord notifications on session stop
- Implement session ID mapping (UUID5 per Discord channel)
- Deploy disk pruning cron (50/70/85/95% thresholds; see the sketch after this list)
- Set up Tigris bucket for session backups
- Build `cbot` CLI for instance management
- Test with 2-3 concurrent workers
- Validate rolling update workflow (rebuild image with new `CLAUDE_CODE_VERSION`, deploy to one machine, verify, roll to rest)
- Test volume threshold notifications via Discord
- Document findings, decide on next steps
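For the disk pruning cron referenced above, a hedged sketch of the core pass. Paths follow the session layout described in the resolved questions below; the Tigris upload step and the exact per-project retention policy are assumptions to refine during the prototype:

```python
# Sketch of the pruning pass: compress-and-archive old session files once disk usage
# crosses the 50% threshold, never touching active sessions or project files.
import gzip
import shutil
import subprocess
from pathlib import Path

SESSIONS_DIR = Path.home() / ".claude" / "projects"
PRUNE_THRESHOLD = 50                   # % used at which pruning starts
NOTIFY_THRESHOLDS = (50, 70, 85, 95)   # % used -> Discord volume notifications

def disk_used_percent(path: str = "/") -> float:
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100

def is_in_use(path: Path) -> bool:
    """Treat a session file as active if any process holds it open (lsof exits 0)."""
    return subprocess.run(["lsof", "-t", str(path)],
                          capture_output=True).returncode == 0

def prune_sessions() -> None:
    if disk_used_percent() < PRUNE_THRESHOLD:
        return
    # Oldest first; always keep the most recent session file (by mtime) untouched.
    session_files = sorted(SESSIONS_DIR.glob("*/*.jsonl"),
                           key=lambda p: p.stat().st_mtime)
    for path in session_files[:-1]:
        if is_in_use(path):
            continue                     # never prune a session Claude Code has open
        archive = path.with_name(path.name + ".gz")
        with path.open("rb") as src, gzip.open(archive, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # TODO: upload `archive` to the Tigris backup bucket (S3-compatible) before unlinking
        path.unlink()
```

Only `*.jsonl` session files under `~/.claude/projects` are candidates, which keeps project files out of the pruning path by construction.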
All questions from previous iterations have been resolved through research (Exa web search, Context7, GitHub issue analysis) and documented below.
Answer: 8 GB fixed (performance-1x).
Research found Claude Code idles at 400-700 MB but grows to 1-4 GB during
active prompts. Critically, all subprocess stdout/stderr is retained
permanently in the JavaScript heap (GitHub #11155, closed NOT_PLANNED).
Running git, npm install, pytest causes linear memory growth. Single
sessions observed at 3.3 GB. With subprocesses needing additional memory,
4 GB causes frequent OOMs. 8 GB provides safety margin. z.ai GLM does not
affect memory (purely network-level redirect). Anthropic's own devcontainer
uses NODE_OPTIONS=--max-old-space-size=4096.
Answer: 2-6 seconds total. Well within 10s P0.
Infrastructure start: 1-3s (VM boot + volume mount + rootfs reset). Application startup: 1-3s additional. Volume size does NOT affect start time (NVMe mount is constant-time). Unclean shutdown may add 1-5s for ext4 journal replay. Image size (~500 MB) adds 100-200ms for containerd snapshot. Suspend has had severe reliability issues in 2025 (23-50s start times, stuck machines) -- stop/start is the confirmed reliable path.
Answer: Separate Fly apps (bot-app + worker-app).
Remove [[services]] section entirely for the bot app (pure process, not
HTTP). Bot runs on shared-cpu-1x/256 MB at ~$2/month ($0.0028/hr × 730hr). Workers in a separate
app with scale-to-zero. Internal .internal DNS for communication. Separate
deploy cycles. This pattern is recommended by the Fly.io community for
Discord bots. Trial accounts have a 5-minute limit (need credit card).
Answer: Hybrid -- Claude Code manages sessions on disk with deterministic UUIDs. Bot handles pruning and backup.
Sessions stored as JSONL under ~/.claude/projects/{project-id}/{session-id}.jsonl.
Use --resume with a deterministic session ID per Discord channel for
consistent mapping.
Disk grows unbounded -- pruning cron at 50% threshold. Old sessions compressed
and backed up to Tigris ($0.02/GB/mo) before deletion. Active sessions
protected (check lsof + always preserve latest by mtime). Project files
never auto-pruned. Volume threshold notifications at 50/70/85/95%.
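A minimal sketch of the deterministic channel-to-session mapping; the namespace and name format are arbitrary project choices, not something Claude Code prescribes:

```python
import uuid

# Any fixed namespace works, but it must never change once sessions exist.
# Reusing uuid.NAMESPACE_DNS with a project-specific name string is an arbitrary choice.
def session_id_for_channel(channel_id: int) -> str:
    """Deterministic Claude Code session ID for a Discord channel (UUID5)."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"discord-channel-{channel_id}"))

# Every message in the same channel resumes the same session, e.g.:
#   claude -p "<prompt>" --resume <session-id> --output-format stream-json
print(session_id_for_channel(123456789012345678))
```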
Answer: Process check is primary defense. kill_signal=SIGTERM, kill_timeout=30.
Fly.io sends SIGINT by default (not SIGTERM) with a 5-minute default grace
period. Configurable via kill_signal and kill_timeout in fly.toml (max
300s). Claude Code has NO
robust graceful shutdown handler -- it simply dies on signal. File writes are
NOT atomic (standard fs.writeFile truncates then writes). However, ext4
ordered journaling guarantees either old content or new content, never a mix.
The idle detection process check is the primary defense: if Claude Code isn't
running, there's nothing to corrupt. The 30s timeout provides margin for
filesystem journal flush and child process cleanup.
Answer: httpx async client for REST API + flyctl subprocess for SSH/secrets.
Community Python SDK: abandoned (4 stars, last commit July 2023). Official
fly-openapi-sdk: also abandoned (10 commits, Sept 2023). The REST API is
well-documented and production-grade (used by Supabase). Base URL:
https://api.machines.dev/v1 (public) or http://_api.internal:4280/v1
(internal). Constraints: exec endpoint has 60s hard timeout, rate limits
at 1 req/s sustained (burst to 3). Secrets API unreliable for app secrets --
use fly secrets set CLI instead.
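A minimal httpx sketch against that base URL. The lifecycle endpoints follow the documented Machines API, but paths, the 60s exec timeout, and the ~1 req/s rate limit should be confirmed against current docs before relying on this:

```python
import os
import subprocess

import httpx

FLY_API = "https://api.machines.dev/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"}

async def set_machine_state(app: str, machine_id: str, action: str) -> None:
    """Start or stop a Machine via the REST API (action is 'start' or 'stop')."""
    async with httpx.AsyncClient(base_url=FLY_API, headers=HEADERS, timeout=30) as client:
        resp = await client.post(f"/apps/{app}/machines/{machine_id}/{action}")
        resp.raise_for_status()

def set_app_secret(app: str, key: str, value: str) -> None:
    """App secrets go through flyctl as a subprocess, since the secrets REST API is unreliable."""
    subprocess.run(["fly", "secrets", "set", f"{key}={value}", "-a", app], check=True)
```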
Answer: Rejected. Keep GLM-4.5-Air for haiku slot.
Three categories of unresolved tool calling bugs found: (1) XML tag
corruption with duplicated/garbled <tool_call> tags (SGLang #15721),
(2) XML syntax mismatches causing context loss (Ollama #13820), (3) object
parameter serialization failures outputting JSON strings instead of objects
(HuggingFace #18). No Z.AI-side fixes published as of February 2026. Z.AI's
own Claude Code mapping still recommends GLM-4.5-Air for haiku. GLM-4.5-Air
has 12B active parameters (4x Flash's 3B), 0.940 Tool Selection Quality, and
months of proven stability. Re-evaluate when Z.AI publishes fixes.
Answer: Unreliable as sole mechanism. 4-layer defense-in-depth required.
Documented failure history: Issue #2898 (config "clobbered"), Issue #9327
(auto-update deleted the binary), Issue #14985 (OPEN, Dec 2025, env var not
honored). CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 implicitly sets
DISABLE_AUTOUPDATER=1 and also disables telemetry. Defense layers: (1) Real
shell ENV vars in Dockerfile (most reliable), (2) Pin exact version at build
time, (3) Make binary read-only (chmod 555), (4) Ephemeral rootfs. The
version check HTTP request may still fire (~1s latency), but actual binary
replacement is blocked by filesystem permissions.
Answer: 5-min bot-absent timer, 4-hour hard cap (configurable), Fly.io auto-stop as failsafe.
Fly.io auto-stop is connection-based, not timer-based -- no configurable idle timeout. Proxy checks every ~5 min for zero connections. For the Discord bot's persistent WebSocket (not routed through proxy), auto-stop is blind to activity. Workers expose HTTP health endpoint for proxy visibility.
Industry benchmarks: GitHub Codespaces defaults to 30 min (max 4 hr), Gitpod
defaults to 30 min (5 min after IDE disconnect). The 3-layer system is
validated: Layer 1 (health check, 5 min) matches Gitpod's IDE-disconnect
pattern. Layer 2 (4 hr cap) matches Codespaces' maximum. Layer 3 (auto-stop)
is a last-resort failsafe. Bot app uses auto_stop_machines = "off" (no
proxy-routed services). Worker app uses auto_stop_machines = "stop" (health
endpoint makes activity visible to proxy).
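Putting Layers 1 and 2 together on the bot side, a hedged sketch. The health URL, the "503 means no Claude Code process" contract, and the `stop_machine` callable are assumptions to be wired to the actual worker and to the Machines API sketch above:

```python
# Bot-side watcher implementing Layer 1 (health poll) and Layer 2 (hard cap);
# Layer 3 (Fly.io auto_stop_machines) remains the platform-side failsafe.
import asyncio
import time

import httpx

HEALTH_POLL_SECONDS = 5 * 60        # Layer 1: process-based health check every 5 minutes
HARD_CAP_SECONDS = 4 * 60 * 60      # Layer 2: 4-hour hard cap (configurable per instance)

async def watch_worker(health_url: str, stop_machine) -> None:
    """Stop the worker when Claude Code is idle or the hard cap is reached.

    Assumptions: the worker health endpoint returns 503 when no Claude Code process
    is running, and `stop_machine` is an async callable that stops the Fly Machine.
    """
    started = time.monotonic()
    async with httpx.AsyncClient(timeout=10) as client:
        while True:
            await asyncio.sleep(HEALTH_POLL_SECONDS)
            if time.monotonic() - started > HARD_CAP_SECONDS:
                await stop_machine()            # Layer 2: hard cap reached
                return
            try:
                resp = await client.get(health_url)
            except httpx.HTTPError:
                continue                        # transient failure: keep watching
            if resp.status_code == 503:
                await stop_machine()            # Layer 1: no Claude Code process running
                return
```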