We need a way to give each Discord channel its own persistent "computer" running Claude Code CLI, where:
- The computer survives across sessions (disk persists)
- It wakes up fast when a user sends a message (< 10s, ideally < 5s)
- It costs near-zero when idle
- It scales to 50+ independent instances without orchestration overhead
- A developer can manage instances via CLI (create, SSH, resize, destroy)
| Requirement | Detail |
|---|---|
| GA-quality persistent disk | Files must survive across compute restarts. Storage must be Generally Available (not beta/preview) with documented data durability guarantees (no known data loss bugs, snapshot/backup capability). |
| SSH-compatible remote access | Real SSH or equivalent supporting: interactive shell, file transfer (SCP/SFTP), port forwarding, and key-based authentication. Exec-only shells and proprietary terminals without file transfer do not qualify. |
| Cold start < 10s | Resume from idle/hibernation must complete within 10 seconds. |
| Full internet | Instances need unrestricted outbound access (git, npm, pip, APIs). |
| Zero idle compute cost | Compute billing must stop completely when instances are stopped/suspended. Storage-only charges are acceptable. |
| No hard boot timeout < 30s | Platform must not impose a hard boot timeout under 30 seconds that would silently reject slow-starting containers with errors (e.g., 502s). |
- Long-running process support -- Instances must support indefinite processes (e.g., a Discord bot running inside the instance), not just request-response or time-limited sandboxes
- No platform-imposed lifetime limit -- Platform must not impose a maximum sandbox/container lifetime that forces instance recreation (e.g., 24hr sandbox expiry, 30-day max lifetime)
- Cost (Phase 1) -- < $50/month for 5 workers at ~2-4 hrs active/day each (+ bot)
- Cost (Phase 2-3) -- < $100/month for 5 workers at ~8 hrs active/day each (+ bot)
- Per-second billing -- Pay only for active compute time
- Cold start speed -- Faster is better; < 5s is ideal
- Developer experience -- Python/TS SDK, CLI tools, minimal boilerplate
- Scale-to-zero with safety net -- Platform must support zero compute cost when idle via programmatic stop API. Platform-native auto-stop as a failsafe (preventing orphaned running instances if bot orchestration fails) is preferred but not required
- Keep-warm option -- Platform must support keeping instances warm (e.g., min_containers, scaledown_window) for latency-sensitive phases where cold start must be near-zero
- Configurable idle timeout (grace period before hibernate)
- Snapshot/restore of full VM state (memory + disk)
- Central config template with per-instance overrides
- Inter-instance communication (future)
- Growable disk per instance
Based on research of Claude Code's system requirements:
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 2 vCPU | 4 vCPU |
| RAM | 8 GB | 8 GB |
| Disk (base install) | ~120 MB | ~500 MB (with tools) |
| Disk (working directory) | Variable | 5 GB |
| Runtime | None (native binary bundles Bun) | git, bash, curl, jq |
| Network | api.z.ai (z.ai GLM API endpoint) | Full internet |
| Auth | ANTHROPIC_AUTH_TOKEN + ANTHROPIC_BASE_URL env vars (z.ai) | Same |
Why 8 GB RAM: Claude Code's Node.js process idles at 400-700 MB but
grows to 1-4 GB during active prompts. Critically, all subprocess stdout/stderr
is retained permanently in the JavaScript heap (GitHub issue #11155, closed
NOT_PLANNED). Running git, npm install, pytest, or other subprocesses
causes linear memory growth. Single sessions have been observed reaching 3.3 GB.
With subprocesses (Python, TypeScript, Bash) needing additional memory on top,
4 GB causes frequent OOMs under real workloads. 8 GB provides a safety margin.
Recommended env vars for memory management:
# Cap Node.js old-generation heap (Anthropic's own devcontainer uses 4096)
export NODE_OPTIONS="--max-old-space-size=4096"
# Reduce glibc arena count to limit virtual memory fragmentation
export MALLOC_ARENA_MAX=2

Minimal container setup (Dockerfile):

FROM ubuntu:22.04
# System dependencies
RUN apt-get update && apt-get install -y git bash curl jq && \
rm -rf /var/lib/apt/lists/*
# --- Auto-Update Defense-in-Depth ---
#
# DISABLE_AUTOUPDATER has documented reliability issues:
# - Issue #2898: config key "clobbered under some circumstances"
# - Issue #9327: auto-update deleted the binary mid-session
# - Issue #14985 (OPEN, Dec 2025): env var in settings.json not honored
# We use 4 layers of defense:
# Tier 1: Real shell env vars (most reliable -- not settings.json)
ENV DISABLE_AUTOUPDATER=1
ENV CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
# Tier 2: Pin via stable release channel at build time
# Note: `-s` is documented for channel names (e.g., `stable`), not arbitrary
# version strings. The stable channel lags ~1 week behind latest.
RUN curl -fsSL https://claude.ai/install.sh | bash -s stable
# Tier 3: Make binary read-only (update cannot overwrite)
RUN chmod -R 555 ~/.local/share/claude/versions/ && \
chmod 555 ~/.local/bin/claude
# Tier 4: Ephemeral rootfs (natural Docker defense -- changes lost on restart)
# --- Memory Management ---
ENV NODE_OPTIONS="--max-old-space-size=4096"
ENV MALLOC_ARENA_MAX=2
# --- z.ai GLM Model Provider ---
ENV ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
ENV ANTHROPIC_API_KEY=""
ENV ANTHROPIC_MODEL="glm-4.7"
ENV ANTHROPIC_DEFAULT_OPUS_MODEL="glm-4.7"
ENV ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7"
ENV ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air"
ENV API_TIMEOUT_MS="3000000"
# ANTHROPIC_AUTH_TOKEN set via fly secrets (not baked into image)

# Headless mode with streaming
claude -p "your prompt" \
--output-format stream-json \
--allowed-tools "Bash,Read,Edit,Write" \
--dangerously-skip-permissions

Note on CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1: This umbrella
variable also disables telemetry (Statsig), error reporting (Sentry), and
the /bug command. With z.ai as provider, Anthropic telemetry has no value.
This reduces outbound traffic, startup latency, and potential failure modes.
Sprites come pre-configured with Ubuntu (docs say 24.04 LTS; community evidence suggests actual image may be 25.04 — verify during prototyping), Claude Code, Node.js, Python, Go, and common dev tools. No Dockerfile is needed. Environment customization is done via exec commands and persists across sleep/wake cycles.
# One-time setup (run inside sprite via exec or console)
# Configure z.ai GLM models
cat >> ~/.bashrc << 'EOF'
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air"
export API_TIMEOUT_MS="3000000"
export NODE_OPTIONS="--max-old-space-size=4096"
export MALLOC_ARENA_MAX=2
export DISABLE_AUTOUPDATER=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
EOF
# Install additional tools if needed
apt-get update && apt-get install -y jq
# Create checkpoint of configured environment
# (via API: POST /v1/sprites/{name}/checkpoint)

Key differences from Dockerfile approach:
- No 4-layer auto-update defense needed -- Claude Code is pre-installed, and the sprite's persistent filesystem means manual version pinning is simpler (just don't run the updater).
- DISABLE_AUTOUPDATER=1 in .bashrc is the primary defense. The filesystem is persistent (not an ephemeral rootfs), so read-only binary permissions (chmod 555) are still useful as a backup.
- Environment variables must be written to .bashrc or .profile to persist across sleep/wake cycles (env vars set via the exec env param are per-session).
- Checkpoints replace container image versioning -- create a checkpoint after configuring the environment, restore to it if needed (see the sketch below).
- ANTHROPIC_AUTH_TOKEN should be set via the exec env parameter at dispatch time (not baked into .bashrc) for security.
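To make the checkpoint step concrete, here is a minimal sketch of the post-setup API call. The base URL and endpoint path are the ones cited in this document; the SPRITES_API_TOKEN env var, bearer-token auth scheme, and JSON response body are assumptions to verify against the Sprites API docs.

```python
import os
import httpx

SPRITES_API = "https://api.sprites.dev/v1"       # REST base URL cited in the Sprites DX notes
TOKEN = os.environ["SPRITES_API_TOKEN"]          # assumed: bearer-token auth

def checkpoint_sprite(name: str) -> dict:
    """POST /v1/sprites/{name}/checkpoint -- copy-on-write snapshot of the configured sprite."""
    resp = httpx.post(
        f"{SPRITES_API}/sprites/{name}/checkpoint",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30.0,
    )
    resp.raise_for_status()
    return resp.json()  # assumed: API returns checkpoint metadata as JSON

if __name__ == "__main__":
    print(checkpoint_sprite("project-alpha"))
```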
Claude Code supports --output-format stream-json which emits
newline-delimited JSON events. This enables real-time streaming to Discord
by filtering for text_delta events and forwarding them as message edits.
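As an illustration of that filtering step, here is a minimal asyncio sketch that runs the headless command shown earlier and forwards text deltas to a caller-supplied coroutine (e.g., a Discord message-edit function). It invokes the CLI directly for brevity; in the actual architecture the command runs inside the worker over SSH/exec. The exact stream-json event shape is an assumption -- inspect real output and adjust the field access.

```python
import asyncio
import json

async def stream_claude(prompt: str, on_text) -> None:
    """Run `claude -p` headlessly and forward text_delta chunks to on_text()."""
    proc = await asyncio.create_subprocess_exec(
        "claude", "-p", prompt,
        "--output-format", "stream-json",
        "--allowed-tools", "Bash,Read,Edit,Write",
        "--dangerously-skip-permissions",
        stdout=asyncio.subprocess.PIPE,
    )
    async for line in proc.stdout:        # newline-delimited JSON events
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue                       # ignore partial or non-JSON lines
        # Assumed event shape: a delta object carrying {"type": "text_delta", "text": ...}
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            await on_text(delta.get("text", ""))
    await proc.wait()
```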
Instances use z.ai GLM models via the Anthropic-compatible API endpoint
(api.z.ai/api/anthropic). Claude Code's interface displays Claude model
names, but GLM models execute behind the scenes. This is transparent to
end users -- Discord responses show no model branding.
| Claude Code Slot | GLM Model | Role |
|---|---|---|
| Opus | glm-4.7 | Complex reasoning, planning |
| Sonnet | glm-4.7 | Main model (balanced) |
| Haiku | glm-4.5-air | Background tasks (summaries, quick searches) |
z.ai dependency: z.ai is the sole model provider. If the z.ai API is
unavailable, instances cannot process prompts. This is an accepted risk --
no fallback provider is configured. The bot-to-instance interface (SSH + CLI)
remains platform-agnostic, so switching providers requires only changing 4
env vars (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_MODEL,
ANTHROPIC_DEFAULT_*_MODEL).
GLM-4.7-Flash evaluation (glm-4.7-flash): Evaluated and rejected for
the Haiku slot. Despite being free with 200K context (30B-A3B MoE), research
found three categories of unresolved tool calling bugs: (1) XML tag corruption
with duplicated/garbled <tool_call> tags (SGLang #15721), (2) XML syntax
mismatches where closing tags don't match opening tags causing context loss
(Ollama #13820), (3) object parameter serialization failures where the model
outputs JSON strings instead of objects (HuggingFace #18). No Z.AI-side fixes
published as of February 2026. Z.AI's own Claude Code model mapping still
recommends GLM-4.5-Air for haiku. GLM-4.5-Air has 12B active parameters (4x
Flash's 3B), a 0.940 Tool Selection Quality score on Galileo's Agent
Leaderboard, and months of proven Claude Code stability. Re-evaluate if Z.AI
publishes tool calling fixes or updates the model mapping.
Claude Code binary versions are pinned in the Dockerfile using the
CLAUDE_CODE_VERSION build arg (see Dockerfile above). Auto-updates are
disabled via a 4-layer defense-in-depth strategy (see Minimal Container Setup).
Update strategy (rolling):
- Build new container image with updated CLAUDE_CODE_VERSION arg
- Deploy to one instance first, monitor for 24 hours
- If stable, deploy to remaining instances via fly deploy
- At 20+ instances, formalize into canary deployment with automated rollback
The stable release channel (bash -s stable) lags ~1 week behind latest
and is a safer choice for production. Check current version:
claude -v.
Auto-updater reliability note: The DISABLE_AUTOUPDATER=1 env var has a
documented history of being bypassed (see Dockerfile comments for issue
references). The 4-layer defense ensures that even if the env var is ignored,
the read-only binary + ephemeral rootfs prevent any update from persisting.
The version check HTTP request may still fire on startup (adding ~1s latency),
but the actual binary replacement is blocked by filesystem permissions.
Note: This section describes the Fly.io Machines architecture. The Sprites alternative simplifies significantly -- see the "Sprites vs Fly.io Machines: Architecture Simplification" comparison table in Section 5 and the Sprites prototype track in Section 9.
The bot and workers run as separate Fly apps because they have fundamentally different lifecycle needs:
- Bot: Must be always-on (listening to Discord 24/7). Uses 256 MB RAM. Never goes idle. Cost: ~$2/month fixed.
- Workers: Should only run when a user requests Claude Code. Uses 8 GB RAM. Should stop when idle. Cost: $0/month when stopped.
If both ran on the same Machine, the Machine could never stop (bot is always active), paying for 8 GB RAM 24/7 regardless of Claude Code usage.
┌─────────────────────────────┐ ┌──────────────────────────────────┐
│ discord-bot-app │ │ worker-app │
│ (Always-on) │ │ (Scale-to-zero) │
│ │ │ │
│ shared-cpu-1x / 256 MB │────>│ performance-1x / 8 GB │
│ ~$2/month fixed │ │ $0/month when stopped │
│ │ │ ~$5.85/hr when running │
│ Receives Discord messages │ │ Runs Claude Code sessions │
│ Routes to workers │ │ Has persistent volume (5 GB) │
│ No volume needed │ │ Has HTTP health endpoint │
│ No [[services]] section │ │ fly.toml [[services]] for proxy │
└─────────────────────────────┘ └──────────────────────────────────┘
│ ▲
│ Fly internal DNS │
└────> worker-app.internal:8080 ───────────┘
Communication: Fly provides .internal DNS between apps in the same
organization. The bot reaches workers at worker-app.internal:8080 without
exposing anything to the public internet.
Discord message arrives
-> Bot looks up Machine ID for channel
-> If Machine is stopped:
-> Wake via Fly Machines API (POST /start) -- ~2-6s
-> Or: HTTP request to worker-app.internal triggers auto-start
-> Pipe message to `claude -p` with stream-json
-> Stream response back to Discord
-> Bot resets idle timer for this Machine
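A sketch of this flow on the bot side, under stated assumptions: the Machines API path follows the public api.machines.dev scheme, the prompt is run via fly ssh console -C (flag spelling to be verified against current flyctl), and the channel-to-machine lookup plus response streaming are elided.

```python
import asyncio
import os
import shlex
import httpx

MACHINES_API = "https://api.machines.dev/v1"   # assumed public Machines API base URL

async def wake_machine(app: str, machine_id: str) -> None:
    """POST /start on the target Machine (assumed safe to call if it is already running)."""
    headers = {"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"}
    async with httpx.AsyncClient(headers=headers) as client:
        resp = await client.post(f"{MACHINES_API}/apps/{app}/machines/{machine_id}/start")
        resp.raise_for_status()

async def dispatch(app: str, machine_id: str, prompt: str) -> bytes:
    """Wake the channel's machine, then run the headless Claude Code command over fly ssh."""
    await wake_machine(app, machine_id)
    cmd = f"claude -p {shlex.quote(prompt)} --output-format stream-json"
    proc = await asyncio.create_subprocess_exec(
        "fly", "ssh", "console", "-a", app, "-C", cmd,
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout
```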
kill_signal = "SIGTERM"
kill_timeout = 30

Rationale: Fly.io sends SIGINT by default; the grace period is configurable via kill_timeout (max 300s). Claude Code has no robust graceful shutdown handler -- it simply dies on signal. The 30-second kill_timeout provides margin for ext4 journal flush (~1-5s if the shutdown was unclean), child process cleanup (Python/Node/git ~1-2s), and filesystem sync, without waiting out the full 300s maximum. The primary
defense against data loss is the idle detection process check -- if Claude
Code isn't actively running, there's nothing to corrupt. ext4 ordered
journaling guarantees either old content or new content, never a mix.
User notification: When a session is stopped (hard cap, idle timeout,
or deploy), the bot sends a Discord message: "Your Claude Code session
was stopped [reason]. Your work is saved on disk. Use /start to resume."
Idle detection uses a three-layer approach to handle long-running prompts (z.ai GLM API timeout is 50 minutes) without prematurely stopping instances or leaving orphaned instances running indefinitely.
Layer 1: Process-based (primary)
Bot tracks "last prompt dispatched" per Machine.
When no active prompt for 5 min (configurable):
-> HTTP GET worker-app.internal:8080/health
-> If 200 (active): extend timer 5 min, retry
-> If 503 (idle): fly machine stop
-> If unreachable: assume stuck, fly machine stop
Layer 2: Hard cap (safety)
4 hours from session start -> force stop regardless (configurable
per-instance, default 4 hr). Comparable to GitHub Codespaces max (4 hr).
Higher than 50-min API timeout to avoid killing legitimate work.
Layer 3: Fly.io auto-stop (failsafe)
worker-app: auto_stop_machines = "stop" (proxy sees health endpoint)
discord-bot-app: auto_stop_machines = "off" (no proxy-routed services)
Catches cases where the Discord bot itself crashes or loses
track of a Machine. The proxy checks every ~5 min for zero connections.
Health check endpoint (10-line HTTP server inside worker container):
Returns 200 OK when Claude Code process is active, 503 Service Unavailable when idle. This serves double duty: (1) Layer 1 uses it for
process detection instead of SSH/pgrep, (2) Layer 3 uses it for Fly.io
proxy-based auto-stop detection. No additional API tokens needed inside
the worker.
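A sketch of that worker-side server, assuming the Claude Code process is detectable with pgrep -f claude (adjust the pattern to however the worker actually launches the CLI):

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

def claude_active() -> bool:
    # pgrep exits 0 when at least one matching process exists
    return subprocess.run(["pgrep", "-f", "claude"], capture_output=True).returncode == 0

class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        # 200 = active prompt in flight, 503 = idle (proxy may stop the machine)
        self.send_response(200 if claude_active() else 503)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep proxy health polling out of the logs

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Health).serve_forever()
```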
Why the bot manages idle detection (not a container sidecar):
- The bot already knows when prompts are dispatched and completed
- Centralized timers avoid managing 5-50 sidecar processes
- The bot controls fly machine start/stop -- co-locating the timer eliminates race conditions between detection and lifecycle actions
- Fly.io auto-stop provides defense-in-depth without extra code
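A sketch of the bot-side Layer 1 check, assuming the bot records a last-dispatch timestamp per machine and shells out to flyctl for the stop (calling the Machines API directly is the alternative):

```python
import asyncio
import time
import httpx

IDLE_AFTER_S = 5 * 60   # 5 min with no active prompt (configurable)

async def idle_watch(app: str, machine_id: str, health_url: str, last_dispatch: dict) -> None:
    """Stop the machine once no prompt has run for IDLE_AFTER_S and the health check reports idle."""
    while True:
        await asyncio.sleep(60)
        if time.time() - last_dispatch.get(machine_id, 0) < IDLE_AFTER_S:
            continue                              # a prompt ran recently; keep waiting
        try:
            async with httpx.AsyncClient() as client:
                resp = await client.get(health_url, timeout=5.0)
            if resp.status_code == 200:
                continue                          # Claude Code still active; extend the timer
        except httpx.HTTPError:
            pass                                  # unreachable: assume stuck and stop it
        proc = await asyncio.create_subprocess_exec(
            "fly", "machine", "stop", machine_id, "-a", app)
        await proc.wait()
        return                                    # machine stopped; watcher restarts on next wake
```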
Key architectural insight: For the Discord bot maintaining a persistent
WebSocket (not routed through Fly Proxy), Fly.io's auto-stop is effectively
blind to the bot's activity state. The bot app MUST use
auto_stop_machines = "off". For workers, the HTTP health endpoint makes
activity visible to the proxy.
- Base template: Shared across all instances (CLAUDE.md, z.ai model config env vars, base tools, ANTHROPIC_AUTH_TOKEN)
- Per-instance overrides: Channel-specific config (git repos, additional tools, custom CLAUDE.md sections, hard session cap override) -- see the merge sketch below
- Config stored in: Git repo or config file, applied at instance creation/boot
- Secrets: ANTHROPIC_AUTH_TOKEN (z.ai API key) injected via Fly.io secrets (fly secrets set), not baked into the container image
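To make the template/override split concrete, a minimal sketch with hypothetical field names (the real schema is decided at implementation time):

```python
# Base template shared by every instance; per-channel overrides are merged on top.
BASE_TEMPLATE = {
    "env": {
        "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
        "ANTHROPIC_MODEL": "glm-4.7",
        "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
    },
    "tools": ["git", "jq"],
    "hard_session_cap_hours": 4,
}

def instance_config(overrides: dict) -> dict:
    """Shallow-merge a per-channel override onto the base template (env merged separately)."""
    cfg = {**BASE_TEMPLATE, **overrides}
    cfg["env"] = {**BASE_TEMPLATE["env"], **overrides.get("env", {})}
    return cfg

# Example per-channel override: longer hard cap, extra repo tooling
print(instance_config({"hard_session_cap_hours": 8, "tools": ["git", "jq", "ripgrep"]}))
```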
Claude Code stores session history as JSONL files under
~/.claude/projects/{project-id}/{session-id}.jsonl. Key behaviors:
- Session ID mapping: Use --resume "$session_id" with a deterministic UUID5 per Discord channel. This maps each channel to a consistent conversation history. --continue resumes the last session; --resume picks a specific one. Note: there is no --session-id flag -- the session ID is obtained from JSON output (the session_id field) and passed to --resume (see the sketch after this list).
- Disk growth: Sessions grow unbounded -- single sessions can reach 3+ GB. Auto-compaction manages the context window only, NOT disk files. Largest offenders: agent progress messages (83-171 MB per line), tool outputs, thinking blocks.
- Persistence: Session files live on the Fly volume and survive stop/start cycles. The volume is the source of truth for conversation history.
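A sketch of the channel-to-session mapping implied above: the bot stores the session_id reported in Claude Code's JSON output per Discord channel and passes it back via --resume on the next prompt. A flat JSON file stands in for whatever store the bot actually uses.

```python
import json
import pathlib
import shlex

STORE = pathlib.Path("sessions.json")   # illustration only; any persistent store works

def load() -> dict:
    return json.loads(STORE.read_text()) if STORE.exists() else {}

def remember(channel_id: str, session_id: str) -> None:
    """Record the session_id reported by Claude Code's JSON output for this channel."""
    data = load()
    data[channel_id] = session_id
    STORE.write_text(json.dumps(data))

def claude_command(channel_id: str, prompt: str) -> str:
    """Build the headless command, resuming the channel's prior session if one is known."""
    session_id = load().get(channel_id)
    resume = f"--resume {shlex.quote(session_id)} " if session_id else ""
    return f"claude -p {shlex.quote(prompt)} {resume}--output-format stream-json"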
Two categories of files on the volume:
- Session files (~/.claude/projects/*/*.jsonl): Conversation history. Can grow to 3+ GB each. Prunable with backup.
- Project files (/workspace/ or similar): Code files Claude creates/edits. Typically small (<100 MB total). Never auto-pruned.
Pruning strategy (cron, every 15 minutes):
| Volume Usage | Action |
|---|---|
| < 50% | No action |
| >= 50% | Compress + upload old sessions (> 3 days) to Tigris, then delete |
| >= 70% | Discord warning: "Workspace using 70% storage. Consider cleanup." |
| >= 85% | Aggressive pruning: backup + delete all sessions except latest |
| >= 95% | Critical Discord alert: "Storage nearly full. Claude may fail." |
Session backup: Old sessions are compressed (gzip) and uploaded to Tigris (Fly.io's S3-compatible object storage, $0.02/GB/month) before deletion. Sessions are recoverable from Tigris if needed.
Active session protection: The pruning cron never deletes the most
recent session per instance (by mtime). It also checks for open file
handles (lsof) before pruning any session -- if Claude Code has the
JSONL file open, that session is skipped entirely.
Project files: Never auto-pruned. If the volume hits 95% even after session pruning, the bot alerts the user via Discord rather than deleting project files. Users manage project file cleanup via Discord commands.
Volume sizing: 5 GB per worker. A single active session rarely exceeds 3.3 GB, and with pruning keeping only the latest session + files under 3 days old, this provides adequate headroom.
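A sketch of one pruning pass from the cron described above, assuming boto3 against Tigris's S3-compatible endpoint and a hypothetical claude-session-backups bucket; the lsof open-handle check and the 70/85/95% Discord alerts are omitted for brevity.

```python
import gzip
import os
import shutil
import time

import boto3

SESSIONS = os.path.expanduser("~/.claude/projects")
PRUNE_THRESHOLD = 0.50      # start pruning at 50% volume usage
CUTOFF_AGE_S = 3 * 86400    # back up and delete sessions older than 3 days

s3 = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")  # assumed Tigris endpoint

def volume_usage(path: str = "/") -> float:
    total, used, _free = shutil.disk_usage(path)
    return used / total

def prune_once(bucket: str = "claude-session-backups") -> None:
    """One pass of the 15-minute cron: compress, upload, then delete old session JSONL files."""
    if volume_usage() < PRUNE_THRESHOLD:
        return
    cutoff = time.time() - CUTOFF_AGE_S
    for project in os.scandir(SESSIONS):
        if not project.is_dir():
            continue
        sessions = sorted(
            (f for f in os.scandir(project.path) if f.name.endswith(".jsonl")),
            key=lambda f: f.stat().st_mtime,
        )
        for f in sessions[:-1]:              # never touch the most recent session per project
            if f.stat().st_mtime > cutoff:
                continue                     # younger than 3 days: keep
            with open(f.path, "rb") as src, gzip.open(f.path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)  # compress before upload
            s3.upload_file(f.path + ".gz", bucket, f"{project.name}/{f.name}.gz")
            os.remove(f.path)
            os.remove(f.path + ".gz")
```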
| Platform | DX | Cold Start | Cost (5 inst, 2hr/day) | Persistence | SSH | Verdict |
|---|---|---|---|---|---|---|
| Fly.io Machines | Good | ~2-6s stop/start | ~$33/mo (workers + bot) | Volumes ($0.15/GB/mo) | Built-in | CO-RECOMMENDED |
| Fly.io Sprites | Excellent | ~100-500ms warm / 1-2s cold | ~$20/mo (tier) + usage | 100 GB persistent (Tigris-backed, $0.02/GB/mo) | WebSocket exec + FS API | CO-RECOMMENDED |
| E2B | Excellent | ~1s resume | ~$10-15/mo + $150 Pro | Pause/resume (beta) | e2b sandbox connect | Eliminated (beta storage, no real SSH) |
| Modal | Excellent | ~1-3s | ~$0-23/mo (w/ $30 credits) | Volumes (included) | modal shell | Eliminated (no real SSH) |
| Beam Cloud | Good | ~1-3s (est.) | ~$138/mo | Volumes (included), snapshots | beam shell | Eliminated (no real SSH) |
| Hetzner | Poor (DIY) | 5-15+ min (snapshot restore) | ~$22/mo (always-on) | Native disk | SSH native | Eliminated (cold start + idle billing) |
| GCP Cloud Run | Moderate | ~1-5s | ~$15-25/mo | GCS/NFS mounts (newer), no SSH | No | Eliminated (no remote shell) |
| AWS Fargate | Poor | ~15-45s (~50% less w/SOCI) | ~$25-40/mo | EFS ($0.30/GB/mo) | ECS Exec | Eliminated (cold start > 10s) |
| RunPod | Moderate | ~3-5s (FlashBoot) | ~$86/mo (CPU, unverified) | Volumes ($0.07-0.20/GB/mo) | Yes | Overpriced for CPU |
| Fal.ai | Good (for AI) | N/A | Unknown (CPU) | /data vol + KV Store | No SSH | Eliminated (no remote shell) |
| Koyeb | Good | ~200ms Light Sleep | ~$46/mo (Pro req.) | Volumes (preview, standard instances only) | koyeb instances exec | Eliminated (preview storage, no real SSH) |
| Northflank | Good | ~5-15s pause/resume (est.) | ~$24/mo | Disks ($0.15/GB/mo) | northflank exec | Borderline (cold start ~5-15s est.) |
| Railway | Excellent | 2-10s App Sleep | ~$37/mo (Pro req.) | Volumes ($0.15/GB/mo) | railway ssh | Eliminated (10s boot timeout) |
| Render | Excellent | ~15-60s suspend/resume (est.) | ~$42/mo | Disks ($0.25/GB/mo) | SSH native | Eliminated (cold start > 10s) |
| Azure Container Apps | Moderate | 15-37s+ scale-to-zero | ~$53/mo | Azure Files (~$0.03/GiB/mo Hot) | az containerapp exec | Eliminated (cold start + idle cost) |
Evaluates each platform against the actual CLI requirements from Section 3 and tightened P0/P1 requirements from Section 2 (8 GB RAM minimum, ~120 MB binary, GA-quality persistent disk, SSH-compatible remote access with SCP/port forwarding, full internet, <10s cold start, zero idle compute cost, no hard boot timeout < 30s, no platform-imposed lifetime limit, long-running process support). Many platforms advertise fast cold starts or low prices that do not hold under Claude Code's constraints.
| Platform | 8 GB RAM | Persistent Disk | SSH | Cold Start (w/ 8 GB) | Boot Timeout Risk | Compatibility |
|---|---|---|---|---|---|---|
| Fly.io Machines | Yes (performance-1x, --vm-memory 8192) | Volumes | Built-in | ~2-6s (stop/start only; suspend limited to ≤2 GB RAM) | None | Compatible |
| Fly.io Sprites | Yes (elastic max 8 vCPU / 8-16 GB RAM per sprite) | 100 GB persistent (Tigris-backed) | WebSocket exec + FS REST API (no native SSH) | ~100-500ms warm / 1-2s cold | None | Compatible (SSH P0 satisfied by FS API intent) |
| E2B | Pro only ($150/mo base) | Pause/resume (beta, bug #884) | e2b sandbox connect (no SCP) | ~1s resume | None | Incompatible -- beta storage (GA P0), no real SSH (SSH P0), 30-day lifetime (P1), 8 GB requires Pro |
| Modal | Yes (configurable) | Volumes (included) | modal shell (no SCP) | ~1-3s | 24hr sandbox lifetime | Incompatible -- no real SSH (SSH P0), 24hr lifetime (P1) |
| Beam Cloud | Yes (configurable) | Volumes (included) | beam shell (no SCP) | ~1-3s (est.) | None | Incompatible -- no real SSH (SSH P0) |
| Hetzner | Yes (CX23 native) | Native disk | SSH native | 5-15+ min (snapshot restore) | N/A | Incompatible -- cold start > 10s, idle billing never stops |
| GCP Cloud Run | Yes | GCS/NFS volume mounts (newer feature) | No SSH | ~1-5s | N/A | Incompatible -- no remote shell (P0 requirement) |
| AWS Fargate | Yes | EFS ($0.30/GB/mo) | ECS Exec | ~15-45s (~8-23s w/SOCI) | None | Incompatible -- cold start > 10s even with SOCI |
| RunPod | Yes | Volumes | Yes | ~3-5s (FlashBoot) | None | Compatible -- but ~$86/mo (unverified), GPU-focused pricing |
| Fal.ai | Unknown | /data distributed vol + KV Store | No SSH | N/A | N/A | Incompatible -- no remote shell, AI-inference focused |
| Koyeb | Eco (2vCPU/4GB) cannot attach volumes; Standard instances required | Volumes (preview, standard instances only) | koyeb instances exec (no SCP) | ~200ms Light Sleep | Eco+volume incompatible | Incompatible -- preview storage (GA P0), no real SSH (SSH P0) |
| Northflank | Yes (configurable) | Disks ($0.15/GB/mo) | northflank exec | ~5-15s estimated (pause/resume) | None | Borderline -- upper estimate exceeds 10s P0, no auto-sleep (P1) |
| Railway | Yes (configurable) | Volumes ($0.15/GB/mo) | railway ssh | 2-10s | Fixed 10s boot timeout -- violates no-boot-timeout-<30s P0 | Incompatible -- boot timeout kills 8 GB containers |
| Render | Yes (Pro Plus: 4 vCPU / 8 GB) | Disks ($0.25/GB/mo) | SSH native | ~15-60s estimated (API suspend/resume) | None | Incompatible -- cold start > 10s P0 |
| Azure Container Apps | Yes (configurable) | Azure Files (~$0.03/GiB/mo Hot) | az containerapp exec | 15-37s+ (105MB image); likely 30-60s+ for Claude Code (~500MB+) | No hard timeout, but exceeds 10s P0 | Incompatible -- cold start > 10s + idle compute charges |
Key insight: Fly.io Machines' ~300ms suspend headline does not apply to Claude Code. Suspend is limited to machines with ≤2 GB RAM. With the required 8 GB, you must use stop/start (~2-6s cold start). Fly.io Sprites, however, achieve ~100-500ms warm resume at 8 GB because they use a fundamentally different architecture (Firecracker microVM checkpointing, not Docker suspend).
Fully compatible (no caveats): Fly.io Machines (stop/start), Fly.io Sprites
Compatible with trade-offs: RunPod (real SSH, GA volumes, but ~$86/mo unverified CPU pricing, GPU-focused)
Borderline: Northflank (cold start ~5-15s est., upper end exceeds 10s P0;
northflank exec + northflank forward provide shell and port forwarding but
no SCP/SFTP -- borderline SSH P0 compliance)
Incompatible (eliminated by tightened P0s): E2B (beta storage + no real SSH), Modal (no real SSH + 24hr lifetime), Beam Cloud (no real SSH), Koyeb (preview storage + no real SSH), Railway (10s boot timeout), Render (cold start ~15-60s > 10s), Azure Container Apps (cold start 15-37s+ > 10s + idle compute charges), AWS Fargate (cold start ~15-45s > 10s), Hetzner (snapshot restore 5-15+ min + idle billing never stops), GCP Cloud Run (no remote shell), Fal.ai (no remote shell)
The intended usage model evolves over time:
- Phase 1 (CLI Dispatch): Discord bot sends prompts to Claude Code CLI via claude -p over remote shell. Instance is a passive compute environment.
- Phase 2 (Agent SDK): Codebase migrates to Claude Agent SDK or equivalent. Instance runs a long-lived process (e.g., webhook server or direct Discord bot). Requires the platform to expose instance ports externally.
- Phase 3 (Standalone Framework): Instance runs a mature, self-contained application (web service, Discord bot, or API server) with minimal reliance on the outer Discord bot dispatcher.
This evolution requires: (a) long-running process support (no hard sandbox lifetime limits), (b) ability to expose instance ports to the internet, and (c) in-place code evolution via remote shell without redeployment.
| Platform | Long-Running Processes | External Port Exposure | In-Place Code Evolution | Phase Support | Status |
|---|---|---|---|---|---|
| Fly.io Machines | Indefinite (no lifetime limit) | Automatic Anycast (<app>.fly.dev HTTPS) + fly proxy for dev | Full SSH + SCP + port forwarding | Phase 1-3 | CO-RECOMMENDED |
| Fly.io Sprites | Indefinite (no lifetime limit) | Public URL (<name>.sprites.app HTTPS) + sprite proxy for dev | WebSocket exec + FS REST API (no SCP) | Phase 1-3 | CO-RECOMMENDED |
| RunPod | Indefinite (pods persist) | HTTP proxy via RunPod URL | SSH native + SCP | Phase 1-3 | Viable (overpriced) |
| Northflank | Indefinite (services persist) | Automatic public URLs + custom domains | northflank exec + port forwarding (no SCP) | Phase 1-3 | Borderline (SSH P0) |
| ~~E2B~~ | 30-day max sandbox lifetime | No native port exposure | e2b sandbox connect (no SCP) | Phase 1 only | Eliminated (GA P0 + SSH P0) |
| ~~Modal~~ | 24hr sandbox / Functions lose state | modal serve (dev tunnels) | modal shell (no SCP) | Phase 1 only | Eliminated (SSH P0) |
| ~~Beam Cloud~~ | Indefinite (persistent containers) | Endpoint URLs for deployed functions | beam shell (no SCP) | Phase 1-2 | Eliminated (SSH P0) |
| ~~Koyeb~~ | Indefinite (services persist) | Automatic <app>.koyeb.app HTTPS | koyeb instances exec (no SCP) | Phase 1-3 | Eliminated (GA P0 + SSH P0) |
Key insight: After tightening P0s (GA storage + real SSH), Fly.io Machines, Fly.io Sprites, and RunPod support all three evolutionary phases. Fly.io Machines excels with automatic Anycast public HTTPS endpoints, full SSH, and cheapest pricing. Sprites excels with fastest cold start (~100-500ms warm), zero-config idle management, and 100 GB persistent storage. Both Fly.io options have indefinite process lifetime. Northflank is borderline -- it has port forwarding but lacks SCP/SFTP for file transfer.
Migration path options (to be decided during prototyping):
- Webhook dispatch: Discord bot forwards messages via HTTP to a webhook server running inside the instance. Requires the instance to expose an HTTP endpoint (Fly.io: automatic via Anycast; others: platform-specific URL).
- Direct bot: Instance runs its own Discord gateway connection with a dedicated bot token. No HTTP endpoint needed, but requires managing Discord reconnection and token lifecycle per instance.
Both approaches are viable on Fly.io. The choice depends on operational preferences discovered during prototyping.
Why: Best balance of DX, cold start, cost, and persistence for this use case.
Persistent Storage:
- Fly Volumes: local NVMe attached to Machines, $0.15/GB/month
- Persist across stop/start cycles
- Volume snapshots: $0.08/GB/month (first 10 GB free, auto-daily)
- NVMe IOPS: shared-cpu-4x gets 8,000 IOPS / 32 MiB/s
- Volume size does NOT affect start time (NVMe mount is constant-time)
Cold Start:
- Stop/start: ~2-6s (infrastructure 1-3s + application startup 1-3s)
- Unclean shutdown may add 1-5s for ext4 journal replay
- Suspend limitation: machines must have ≤ 2 GB RAM (not applicable for 8 GB)
- Image size: ~500 MB moderate, containerd snapshot 100-200ms
SSH: Built-in via fly ssh console (WireGuard tunnel, no public IP needed)
Pricing (per-second, no minimum):
| Config | Hourly | Monthly (24/7) | Monthly (2hr/day) |
|---|---|---|---|
| shared-cpu-1x / 256 MB (bot) | $0.0028 | $2.04 | $0.17 |
| shared-cpu-4x / 4 GB | $0.032 | $23.36 | $1.92 |
| performance-1x / 8 GB (workers) | $0.0880 | $64.24 | $5.28 |
Idle cost: $0.15/GB/month for rootfs + volumes only. Stopped machines: $0 compute.
5 workers, 2hr/day, performance-1x/8 GB + 5 GB volume each + bot: Compute: 5 x $5.28 = $26.40/mo. Storage: 5 x 5 GB x $0.15 = $3.75/mo. Bot: $2.04/mo. Tigris backup: ~$0.50/mo. Total: ~$33/month
DX:
- CLI: fly machine create, start, stop, suspend, destroy
- REST API: full Machines API with OpenAPI spec
- Config: fly.toml + API for programmatic control
- Auto-stop/auto-start: proxy-based wake on incoming traffic
Risks:
- Suspend not applicable (8 GB exceeds 2 GB limit; must use stop/start)
- Community Python SDK abandoned (4 stars, last commit July 2023). Official fly-openapi-sdk also abandoned (10 commits, Sept 2023). Use httpx async client for REST API + flyctl subprocess for SSH/secrets.
- REST API exec endpoint has 60s hard timeout
- REST API rate limits: 1 req/s sustained, burst to 3 req/s per action
- Secrets API unreliable for app secrets -- use the fly secrets set CLI instead
Why: Native auto-sleep/wake, 100 GB persistent storage, sub-second warm resume, zero-config idle management. Purpose-built for AI agent sandboxes.
What Sprites Are: Firecracker microVMs (not Docker containers) with persistent filesystems that hibernate when idle and wake automatically on demand. Launched January 2026 by Fly.io. Each sprite gets root access to Ubuntu with common tools pre-installed (Node.js, Python, Go, Ruby, Rust, Elixir, Java, Bun, Deno). Note: docs say Ubuntu 24.04 LTS but community evidence suggests actual image is 25.04 (verify during prototyping). Resource maximums per sprite: 8 vCPU, 8-16 GB RAM (docs inconsistent: homepage says 16 GB, billing example says 8 GB), 100 GB storage. Resources are elastic — you burst up to maximums but pay only for actual cgroup-metered consumption.
Persistent Storage:
- 100 GB persistent root filesystem per sprite (Tigris S3-compatible object storage as durable layer + local NVMe read-through cache)
- Storage persists indefinitely across sleep/wake cycles
- Checkpoints: copy-on-write filesystem snapshots (milliseconds, no process interruption). API: POST /v1/sprites/{name}/checkpoint
- Installed packages, file changes, and config modifications survive sleep/wake automatically
- Running processes and in-memory state do NOT survive sleep/wake (must restart on wake)
Cold Start:
- Warm resume: ~100-500ms (sprite was recently active, NVMe cache warm)
- Cold start: 1-2s (first access or long-idle sprite)
- New creation: 0.8-2s (verified empirically: 0.8s in user testing)
- Auto-sleep: 30 seconds of inactivity (non-configurable). "Inactivity" =
no active exec sessions, no open TCP connections, no running TTY sessions.
During a sprite exec running Claude Code, the sprite stays awake.
SSH Equivalent: No native SSH. Instead provides:
- WebSocket exec (WSS /v1/sprites/{name}/exec): interactive shell with TTY, stdin/stdout/stderr multiplexing, session persistence across disconnects. Supports max_run_after_disconnect for background processes.
- Filesystem REST API: file read/write/list operations via HTTP
- sprite proxy: port forwarding from localhost to sprite ports
- sprite console: CLI-based interactive shell (used in user testing)
- Public URLs: https://<name>.sprites.app for HTTP services
- This combination satisfies the SSH P0 intent (interactive shell + file transfer + port forwarding) without the literal SSH protocol (see the dispatch sketch below).
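For the Sprites track, Phase 1 dispatch can lean on the sprite CLI rather than the raw WebSocket protocol. A sketch, assuming sprite exec accepts a sprite name followed by the command to run (verify the exact argument order against sprite exec --help), and that ANTHROPIC_AUTH_TOKEN is injected per-session from a hypothetical ZAI_API_KEY env var on the bot host, as recommended earlier:

```python
import asyncio
import os
import shlex

async def dispatch_to_sprite(name: str, prompt: str) -> bytes:
    """Run a headless Claude Code prompt inside a sprite via the sprite CLI."""
    inner = (
        f"ANTHROPIC_AUTH_TOKEN={shlex.quote(os.environ['ZAI_API_KEY'])} "   # per-session secret
        f"claude -p {shlex.quote(prompt)} --output-format stream-json"
    )
    # Assumed invocation shape: `sprite exec <name> -- <command...>`
    proc = await asyncio.create_subprocess_exec(
        "sprite", "exec", name, "--", "bash", "-lc", inner,
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout
```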
Pricing (usage-based + subscription tiers):
| Resource | Rate | Monthly (730h) |
|---|---|---|
| CPU | $0.07/CPU-hour | $51.10/CPU-month |
| RAM | $0.04375/GB-hour | $31.94/GB-month |
| Hot Storage (NVMe) | $0.000683/GB-hour | ~$0.50/GB-month |
| Base Storage (Tigris) | $0.000027/GB-hour | ~$0.02/GB-month |
Sleeping sprites: $0 compute. Only base storage ($0.02/GB-month) accrues. Minimum billing: 6.25% CPU/second, 250 MB RAM/second when active.
Subscription tiers (prepaid credits toward usage):
| Tier | Monthly | Max Active Sprites | CPU Hours | RAM Hours | Storage (GB-months) |
|---|---|---|---|---|---|
| Recruit (free) | $0 | Unknown | Trial $30 credit | -- | -- |
| Adventurer | $20 | 20 | 450 | 1,800 | 50 |
| Veteran | $50 | 50 | 800 | 3,200 | 100 |
| Hero | $100 | 100 | 1,200 | 4,800 | 150 |
| Champion | $200 | 200 | 1,800 | 7,200 | 225 |
5 sprites, 2hr/day, Claude Code workload — two estimates:
Conservative estimate (avg 2 CPUs, 1.5 GB RAM active — spec assumption):
CPU: 5 x 2hr x 2 CPU x 30 = 600 CPU-hrs x $0.07 = $42/mo.
RAM: 5 x 2hr x 1.5 GB x 30 = 450 GB-hrs x $0.04375 = $19.69/mo.
Storage: 5 x 5 GB used x $0.02 = $0.50/mo (base only; storage is billed
on used blocks, not allocated 100 GB. Hot NVMe cache is $0.50/GB-mo but
only applies while sprite is active).
Subtotal: **$62/mo** usage.
sprites.dev documented example (4-hour Claude Code session = $0.44,
implying avg ~0.6 CPU, ~1.5 GB RAM from cgroup metering):
CPU: 5 x 2hr x 0.6 CPU x 30 = 180 CPU-hrs x $0.07 = $12.60/mo.
RAM: 5 x 2hr x 1.5 GB x 30 = 450 GB-hrs x $0.04375 = $19.69/mo.
Storage: $0.50/mo.
Subtotal: **$33/mo** usage.
Important: Sprites pricing is usage-metered from cgroup counters
(cpu.stat usage_usec for CPU, memory.current sampling for RAM), not
fixed-rate per hour. The conservative estimate may be 2x actual costs.
Prototype validation should measure real cgroup consumption to refine.
The Adventurer tier ($20/mo) acts as prepaid credits — overage is billed
at the same per-unit rates.
DX:
- CLI: sprite create, sprite console, sprite exec, sprite proxy, sprite checkpoint
- REST API: full lifecycle at https://api.sprites.dev/v1/
- SDKs: Python (sprites-py, imported as from sprites import SpritesClient), TypeScript (@fly/sprites), Go (github.com/superfly/sprites-go), Elixir (github: superfly/sprites-ex)
- Pre-installed: Ubuntu (docs say 24.04; see version caveat above), Claude Code, Node.js, Python, Go, git
- No Dockerfile needed (environment customization via exec commands)
Risks:
- No region selection: Proximity-routed only. Cannot specify region at creation time. Fly.io staff confirmed "not currently planned."
- No SLA: No uptime guarantees documented. Storage stack acknowledged as "in flux" by Fly.io blog.
- No custom base images: Must configure environment post-creation via exec commands (but changes persist across sleep/wake).
- 30s non-configurable auto-sleep: Cannot extend the idle timeout. Active exec sessions keep sprites awake, but between Discord messages the sprite will sleep after 30s. Warm resume (~100-500ms) mitigates this for chat-based workflows.
- Maturity: Launched January 2026. Early-stage product. Fly.io's own blog recommends containerizing for production scale-out.
- SDK gaps: JS SDK missing some helper methods (e.g., URL settings update). Python/Go SDKs more complete.
- No internal DNS: Sprites are isolated. No .internal DNS between sprites. Communication via Sprites REST API or public URLs.
- Non-configurable resource maximums: 8 vCPU, 8-16 GB RAM, 100 GB storage. Cannot configure smaller (or larger) allocations. You pay for actual cgroup-metered use, not the maximums -- but you cannot right-size for lighter workloads that need less than 8 vCPU.
Sprites eliminate several complex subsystems required by raw Fly.io Machines:
| Capability | Fly.io Machines (DIY) | Sprites (Native) |
|---|---|---|
| Idle detection | 3-layer hybrid system (process check + hard cap + auto-stop) | Built-in: 30s auto-sleep based on exec session activity |
| Wake on demand | Bot calls Machines API POST /start (~2-6s) | Built-in: any API call wakes sprite (~100-500ms) |
| Disk management | 5 GB volume + pruning cron + Tigris backup + threshold alerts | Built-in: 100 GB persistent filesystem, no pruning needed |
| Container setup | Dockerfile with 4-layer auto-update defense-in-depth | Built-in: pre-installed Ubuntu + Claude Code |
| Separate apps | Bot app + worker app (different lifecycle needs) | Bot can be a sprite too (or separate Fly app) |
| Health endpoint | Custom 10-line HTTP server for activity detection | Not needed: exec session tracking is native |
| Session backup | Tigris upload cron for old session files | Built-in: all data in Tigris by default |
Net effect: The Sprites prototype track requires ~60-70% less custom infrastructure code compared to the Machines track.
Why: Prebuilt Claude Code template, excellent auto-pause/resume
cycle, purpose-built for AI agent sandboxes.
Eliminated: Beta-only persistence with known data loss bug (#884) violates
GA storage P0. e2b sandbox connect is a proprietary terminal without
SCP/SFTP/port forwarding, violating SSH P0.
Persistent Storage:
- No traditional volumes -- persistence via pause/resume of entire VM state
- Paused sandboxes preserve filesystem + memory + running processes
- Max paused duration: 30 days from creation
- Known bug: file changes may not persist after multiple pause/resume cycles (GitHub #884)
Cold Start:
- New sandbox: ~150ms
- Resume from pause: ~1 second
- Pause time: ~4s per GB RAM
SSH: CLI e2b sandbox connect (interactive terminal, not traditional SSH)
Pricing (per-second):
| Resource | Rate | Hourly |
|---|---|---|
| 1 vCPU | $0.0000140/s | $0.050/hr |
| 2 vCPU | $0.0000280/s | $0.101/hr |
| RAM (per GiB) | $0.0000045/s | $0.016/hr |
Paused time: free during beta
Pro tier required for 24hr sessions: $150/month base
5 instances, 2hr/day, 2vCPU + 2GB each: Compute: 5 x 2hr x $0.133/hr x 30 = $39.90/mo. Plus Pro: $150/mo. Total: ~$190/month (Pro tier makes this expensive for small scale)
Hobby tier (free): limited to 1hr sessions, 20 concurrent, 2 vCPU / 512MB fixed (too little RAM for Claude Code).
DX:
- anthropic-claude-code prebuilt template (ready to use, no custom build needed)
- Python + TypeScript SDKs
- Auto-pause/resume cycle built in
- Sandbox.connect(id) auto-resumes paused sandboxes
Risks:
- $150/month Pro tier minimum for usable sessions
- 30-day maximum sandbox lifetime (must recreate periodically) -- prevents running indefinite processes like a Discord bot (P1 long-running concern)
- Persistence bug in beta (#884)
- No standalone persistent volumes (if sandbox dies, data is lost)
- Phase 1 only: cannot evolve to Agent SDK or standalone framework in-place
Why: Included storage, excellent Python SDK, memory snapshots (CPU GA).
Eliminated: modal shell is a proprietary terminal without SCP/SFTP/port
forwarding, violating SSH P0. Additionally, 24hr sandbox lifetime violates P1
(no platform-imposed lifetime limit).
Persistent Storage:
- Modal Volumes: included (storage pricing not prominently listed; may incur charges at scale)
- Distributed filesystem, optimized for write-once/read-many
- v2 in beta (unlimited files, hundreds of concurrent writers)
Cold Start:
- Standard: ~1-3 seconds
- Memory snapshots: sub-second (CPU GA, GPU still experimental)
- Keep-warm: min_containers=N or scaledown_window (up to 20 min)
SSH: modal shell (interactive bash), modal container exec
Pricing (per-second):
| Resource | Rate | Hourly |
|---|---|---|
| 1 CPU core (= 2 vCPU) | $0.0000131/core/s | $0.047/hr |
| RAM (per GiB) | $0.00000222/GiB/s | $0.008/hr |
Note: the 3x multiplier applies to nonpreemptible=True (an opt-in flag
for non-preemptible capacity), not to sandboxes specifically. Sandboxes use
standard per-second rates unless nonpreemptible is opted into.
Free tier: $30/month credits
5 instances, 2hr/day, 1 core + 4GB: Compute: 5 x 2hr x ($0.047 + 4x$0.008) x 30 = $23.10/mo minus $30 credit. Total: ~$0/month (covered by free credits for light usage)
DX:
- Everything defined in Python code (no Dockerfiles, no YAML)
- modal run, modal deploy, modal shell
- Images built programmatically: modal.Image.debian_slim().pip_install(...)
Risks:
- Memory snapshots: CPU GA but GPU still experimental; no documented expiry policy
- 24-hour max sandbox lifetime -- prevents running indefinite processes like a Discord bot (P1 long-running concern)
- v2 Volumes still in beta (no zero-data-loss guarantee)
- Phase 1 only: cannot evolve to Agent SDK or standalone framework in-place
Why: Snapshots, free volumes, Python-native, open-source core.
Eliminated: beam shell is an exec-based shell without SCP/SFTP/port
forwarding, violating SSH P0. Also overpriced at ~$138/mo for 5 instances.
Persistent Storage:
- Distributed Volumes: free (included in compute pricing)
- Snapshots: capture filesystem + memory state, restore later
Cold Start: Typically under 1s container start (custom runc runtime; ~1-3s estimated with application initialization)
SSH: beam shell (interactive root shell), beam container exec
Pricing (per-second):
| Resource | Rate | Hourly |
|---|---|---|
| 1 CPU core | $0.0000528/core/s | $0.190/hr |
| RAM (per GB) | $0.0000056/GB/s | $0.020/hr |
5 instances, 2hr/day, 2 cores + 4GB: Compute: 5 x 2hr x ($0.380 + $0.080) x 30 = $138/mo. Total: ~$138/month (more expensive than alternatives)
DX: Python SDK, CLI, hot-reload dev mode, open-source core (beta9)
Risks:
- Higher CPU pricing than Modal/Fly.io
- Volume propagation delay up to 60s
- Platform maturity (newer, occasional performance issues reported)
Why: Cheapest per-month if instances run frequently. No platform abstraction. Eliminated: snapshot restore takes 5-15+ min (exceeds 10s P0), stopping does not stop billing (violates zero idle cost P0).
CX23 Server (2 vCPU, 4GB RAM, 40GB NVMe): ~$4.35/month
(Note: CX22 was renamed to CX23 in Hetzner's current lineup.)
Key limitations:
- Billed monthly regardless of state (no per-second billing)
- Stop/start via API, but stopping does NOT stop billing -- only server deletion stops charges (important: no hibernate-to-save-money option)
- Snapshot restore: 5-15+ minutes (dealbreaker for cold start requirement)
- Full DIY: must build hibernation, orchestration, monitoring yourself
- Volumes: €0.044/GB/month
5 instances always-on: 5 x $4.35 + 5 x 5GB x $0.05 = ~$23/month
Not suitable due to slow snapshot restore and DIY overhead, but establishes the cost floor.
Why: Industry-leading cold starts via Light Sleep (~200ms), per-second billing,
native scale-to-zero. Limited by volumes still in preview and Pro plan requirement.
Eliminated: Volumes in public preview (no GA date) violates GA storage P0.
koyeb instances exec is an exec-based shell without SCP/SFTP, violating SSH P0.
Persistent Storage:
- NVMe SSD volumes: 1-10GB max, standard instances on any paid plan (not eco/free instances). Pro plan: $29/mo base.
- Available in Washington D.C. and Frankfurt only
- Currently in public preview (free during preview, GA pricing TBD)
- Volumes require standard instances (eco instances not supported)
- Single-instance only when using volumes (no horizontal scaling)
Cold Start:
- Light Sleep: ~200ms (memory snapshot/eBPF MicroVM restore)
- Deep Sleep: 1-5s (full container restart)
- Instances transition: idle → Light Sleep → Deep Sleep after configurable timeout
- Light Sleep costs 15% of instance rate at GA (free during preview)
SSH: koyeb instances exec / koyeb services exec (interactive shell, not
traditional SSH). Also koyeb instances cp for file transfer.
Pricing (per-second):
| Config | Hourly | Monthly (24/7) | Monthly (2hr/day) |
|---|---|---|---|
| eco-large / 2vCPU / 4GB | $0.029 | $20.74 | $1.73 |
| standard large / 4vCPU / 4GB | $0.058 | $41.47 | $3.46 |
Pro plan required for volumes: $29/month base.
5 instances, 2hr/day, standard large + 5GB volume each (volumes require standard instances): Compute: 5 x $3.46 = $17.28/mo. Storage: $0 (preview). Pro: $29/mo. Total: ~$46/month (preview) / ~$75/month (GA with Light Sleep charges)
DX:
- CLI: koyeb services create, pause, resume, exec
- REST API: full lifecycle management
- Python + JavaScript SDKs (Sandboxes feature)
- Terraform and Pulumi providers
- Scale-to-zero is HTTP-triggered (requires HTTP endpoint for auto-wake)
Risks:
- Volumes in public preview (no GA date, limited to 2 regions, 10GB max)
- Eco instances (best price/spec match) cannot use volumes
- Pro plan ($29/mo) required just for volume access
- Scale-to-zero triggered by HTTP only (Discord bot needs HTTP proxy layer)
- HTTP/2 connections cannot wake sleeping services (HTTP/1.1 only for wake trigger)
- Light Sleep pricing at GA adds ~$28/mo for 5 idle instances
Why: Per-second billing with pause/resume API, comprehensive shell access,
cheapest viable option at ~$24/mo. Borderline due to: (1) estimated ~5-15s cold
start (upper end exceeds 10s P0), (2) northflank exec + northflank forward
provide shell and port forwarding but lack SCP/SFTP for file transfer (borderline
SSH P0 compliance). Requires building your own idle detection (P1 concern).
Persistent Storage:
- SSD persistent disks: $0.15/GB/month
- Persist across container restarts and redeployments
- Single-instance only when using persistent disks
Cold Start:
- Pause/resume: ~5-15s estimated (full container restart from cached image; no official timing documented)
- No official SLA; "seconds" per Northflank documentation
- First image pull can be 30-60s+ (subsequent pulls cached)
SSH: northflank exec CLI + API endpoint (/v1/api/execute-command).
Web UI shell also available. Port forwarding via northflank forward.
Note: No SCP/SFTP equivalent -- file transfer requires workarounds (e.g.,
tar over stdin/stdout, or git push/pull). Borderline SSH P0 compliance.
Pricing (per-second):
| Resource | Hourly | Monthly (24/7) |
|---|---|---|
| 1 vCPU | $0.017 | $12.00 |
| 1 GB RAM | $0.008 | $6.00 |
| 1 GB SSD | -- | $0.15 |
No platform fee. Free Developer Sandbox for testing.
5 instances, 2hr/day, 2vCPU + 4GB + 5GB disk each: Compute: 5 x 60hr x $0.067/hr = $20.00/mo. Storage: 5 x 5GB x $0.15 = $3.75/mo. Total: ~$24/month
DX:
- CLI: northflank exec, northflank forward, full CRUD operations
- REST API: pause, resume, restart, scale, execute commands
- JavaScript client library
- GitOps and Infrastructure as Code templates
- BYOC option (run on your own AWS/GCP/Azure infrastructure)
Risks:
- No SCP/SFTP file transfer (borderline SSH P0 -- must use workarounds)
- No native scale-to-zero (must build idle detection + pause API calls yourself)
- Cold start not SLA-backed ("seconds" is vague)
- Less well-known platform (smaller community, less ecosystem)
Why: Excellent DX with native SSH, App Sleep for scale-to-zero, familiar Heroku-like workflow. Eliminated: fixed 10-second sleep-wake timeout (distinct from the configurable 300s healthcheck timeout) violates the no-boot-timeout-<30s P0. Services waking from App Sleep that don't start in 10s receive 502 errors.
Persistent Storage:
- Persistent volumes: $0.15/GB/month
- Survive deploys and restarts
- Single-instance only when using volumes
Cold Start:
- App Sleep: 2-10s (full container stop/restart)
- First request during wake returns 502 error (must handle in client)
- Fixed 10-minute inactivity timeout before sleep (not configurable)
- 10-second sleep-wake timeout -- if service doesn't start within 10s of waking from App Sleep, returns 502 (separate from configurable healthcheck timeout which defaults to 300s)
- No memory snapshot -- full cold boot on wake
SSH: Native via railway ssh (direct shell into running service)
Pricing (per-second):
| Resource | Per Minute | Hourly | Monthly (24/7) |
|---|---|---|---|
| 1 vCPU | $0.000463 | $0.028 | $20.28 |
| 1 GB RAM | $0.000231 | $0.014 | $10.12 |
Pro plan: $20/month per workspace (includes $20 credit toward usage, unlimited seats).
5 instances, 2hr/day, 2vCPU + 4GB + 5GB volume each: Compute: 5 x 60hr x $0.111/hr = $33.30/mo. Storage: 5 x 5GB x $0.15 = $3.75/mo. Pro: $20/mo (incl. $20 credit). Total: ~$37/month
DX:
- CLI: railway up, railway ssh, railway run
- Dashboard with real-time logs and metrics
- Git-based deploys or Docker images
- Environment variables and secrets management
Risks:
- 502 errors on first request during App Sleep wake (client must retry)
- 10-second boot timeout: services that take >10s to start receive 502 errors (Claude Code container must be optimized for fast startup)
- 2-10s cold start with fixed 10-minute sleep timeout (not configurable)
- No memory snapshot (all in-memory state lost on sleep)
- Volume + service coupling (single instance only)
Why: Excellent DX with native SSH, persistent disks, comprehensive API. Eliminated: resume from suspend takes ~15-60s estimated, exceeding the 10s P0.
Persistent Storage:
- Persistent disks: $0.25/GB/month, encrypted SSD with daily snapshots
- Disk size can be increased but not decreased
- Single-instance only when using persistent disks
- Disks prevent zero-downtime deploys
Cold Start:
- Suspend/resume: ~15-60s estimated (full container restart, not memory snapshot; timing not officially documented for paid API suspend/resume)
- Suspend via API (POST /v1/services/{id}/suspend) stops billing
- Resume via API (POST /v1/services/{id}/resume) restarts container
- No native scale-to-zero; must build orchestration via API
SSH: Native SSH via ssh SERVICE@ssh.REGION.render.com. Dashboard shell also
available. SSH keys managed per-account.
Pricing (per-second, prorated from monthly rate):
| Instance | Spec | Monthly (24/7) | Hourly |
|---|---|---|---|
| Standard | 1 vCPU / 2GB | $25 | $0.034 |
| Pro | 2 vCPU / 4GB | $85 | $0.118 |
| Pro Plus | 4 vCPU / 8GB | $175 | $0.243 |
No platform fee on Hobby plan. Professional workspace: $19/user/month.
5 instances, 2hr/day, Pro (2vCPU/4GB) + 5GB disk each: Compute: 5 x 60hr x $0.118/hr = $35.42/mo. Storage: 5 x 5GB x $0.25 = $6.25/mo. Total: ~$42/month
DX:
- CLI: render command (Homebrew installable)
- REST API: full lifecycle including suspend/resume
- Git-based auto-deploys or Docker images
- Render MCP server for Claude Code/Cursor integration
Risks:
- 15-60s resume exceeds the <10s P0 requirement
- No native scale-to-zero (must build suspend/resume orchestration)
- Resume is a cold restart (no memory snapshot)
- Persistent disks block horizontal scaling and zero-downtime deploys
Why: True serverless scale-to-zero with Azure Files for persistence. Eliminated: cold starts of 15-37s+ (measured for tiny images) exceed 10s P0, and idle compute charges (~$0.011/vCPU-hr) violate zero idle cost P0.
Persistent Storage:
- Azure Files SMB mount: ~$0.03/GiB/month (Standard Hot LRS; Transaction Optimized tier is ~$0.06/GiB)
- Persists across container restarts and replica replacements
- ReadWrite or ReadOnly mount modes
Cold Start:
- Scale-to-zero resume: 15-37s+ (measured for 105MB image)
- Official Hello World image: 22s cold start (GitHub issue #997, 50 thumbs-up)
- No official SLA for scale-up latency
- Larger images (500MB-1GB for Claude Code) will likely be slower
SSH: az containerapp exec (interactive shell into running container).
Azure Portal console also available. Cannot exec into scaled-to-zero containers.
Pricing (per-second, Consumption plan):
| Resource | Per Second | Hourly |
|---|---|---|
| 1 vCPU (active) | $0.000024 | $0.086 |
| 1 GiB RAM (active) | $0.000003 | $0.011 |
| 1 vCPU (idle) | $0.000003 | $0.011 |
Free tier: 180,000 vCPU-sec + 360,000 GiB-sec/month (~$13 value).
5 instances, 2hr/day, 2vCPU + 4GiB each: Compute: 5 x $12.96 = $64.80/mo. Free tier: -$12.96. Storage: 5 x 5GB x $0.03 = $0.75/mo. Total: ~$53/month
DX:
- CLI: az containerapp create, exec, update
- ARM/Bicep templates, Terraform provider
- KEDA-based autoscaling rules (HTTP, custom metrics)
- Integrated with Azure Container Registry, Log Analytics
Risks:
- Cold start is the critical concern: 15-37s for tiny images, likely 30-60s+ for a real Claude Code container. Exceeds 10s P0 requirement.
- KEDA scaling rules require HTTP or event source (Discord messages need proxy)
- Cannot exec into scaled-to-zero containers (must wait for scale-up)
- Azure ecosystem complexity (VNet, resource groups, subscriptions)
| Platform | Reason | P0 Violated |
|---|---|---|
| E2B | Beta-only persistence with known data loss bug (#884). e2b sandbox connect is a proprietary terminal without SCP/SFTP/port forwarding. Also has 30-day sandbox lifetime (P1 concern). Excellent DX and prebuilt Claude Code template, but fails two P0s. | GA-quality persistent disk, SSH-compatible remote access |
| Modal | modal shell is a proprietary terminal without SCP/SFTP/port forwarding. 24hr sandbox lifetime limits long-running processes (P1 concern). Excellent Python SDK DX and included v1 GA storage. | SSH-compatible remote access |
| Beam Cloud | beam shell is an exec-based shell without SCP/SFTP/port forwarding. Also overpriced (~$138/mo for 5 instances). Free volumes and snapshot support, but no real SSH. | SSH-compatible remote access |
| Koyeb | Volumes in public preview (no GA date, limited to 2 regions, 10GB max). koyeb instances exec lacks SCP/SFTP. Excellent ~200ms cold start and native scale-to-zero, but fails two P0s. | GA-quality persistent disk, SSH-compatible remote access |
| Railway | Fixed 10-second sleep-wake timeout returns 502 if container doesn't start in time (separate from configurable 300s healthcheck timeout). An 8 GB Claude Code container with ~500 MB image is risky in 10s. Otherwise strong DX with native SSH and App Sleep. | No hard boot timeout < 30s |
| Render | API suspend/resume takes ~15-60s estimated (not officially documented for paid tier). Borderline under old 60s P0, clearly fails under tightened 10s. Good DX and native SSH. | Cold start < 10s |
| Azure Container Apps | 15-37s+ cold start for 105MB images (likely 30-60s+ for Claude Code). Also has idle compute pricing (~$0.011/vCPU-hr) that violates zero idle cost. | Cold start < 10s, Zero idle cost |
| AWS Fargate | 15-45s cold starts (~50% reduction with SOCI lazy loading = ~8-23s, still > 10s). Complex networking, EFS expensive ($0.30/GB/mo). | Cold start < 10s |
| Hetzner | Snapshot restore takes 5-15+ minutes. Stopping servers does NOT stop billing (only deletion stops charges). No per-second billing, no auto-sleep. | Cold start < 10s, Zero idle cost |
| GCP Cloud Run | No interactive shell access (kubectl exec equivalent absent). Now has GCS and NFS volume mounts (newer feature), but no remote shell makes debugging impossible. | SSH-compatible remote access |
| Fal.ai | No interactive shell access. Has /data distributed volume and KV Store for persistence, but AI-inference focused pricing and no remote shell make it unsuitable. | SSH-compatible remote access |
| Heroku | No persistent disk (ephemeral filesystem, 24hr dyno cycling). 8 GB RAM requires $500/mo Performance-L dyno ($2,500/mo for 5 instances). | GA-quality persistent disk |
| DigitalOcean App Platform | No persistent disk, no remote shell access. 8 GB RAM tier costs $75/mo per app ($375/mo for 5 instances). | GA-quality persistent disk, SSH-compatible remote access |
| AWS App Runner | No persistent disk (no EFS support despite claims), no remote shell access at all. Purpose-built for stateless HTTP services only. | GA-quality persistent disk, SSH-compatible remote access |
Fly.io Machines costs calculated with performance-1x / 8 GB RAM workers ($0.0880/hr) + shared-cpu-1x / 256 MB bot ($0.0028/hr) + 5 GB volumes ($0.15/GB/mo) + Tigris session backup (~$0.02/GB/mo, estimated ~$0.10/instance/mo).
Fly.io Sprites costs shown as two estimates: conservative (avg 2 CPU, 1.5 GB RAM — spec assumption) and documented (based on sprites.dev's $0.44/4hr Claude Code example, implying avg ~0.6 CPU, ~1.5 GB RAM). Usage-based metering (CPU $0.07/CPU-hr, RAM $0.04375/GB-hr) + base storage ($0.02/GB-mo) + Adventurer tier ($20/mo prepaid credit). Prototype validation should measure actual cgroup consumption to refine estimates.
Platforms marked strikethrough are eliminated by tightened P0 requirements.
| Platform | Compute/mo | Storage/mo | Base Fee | Other | Total/mo |
|---|---|---|---|---|---|
| Fly.io Machines | $26.40 | $3.75 | $0 | Bot $2.04 + Tigris ~$0.50 | ~$33 |
| Fly.io Sprites (conservative) | ~$62 (usage) | ~$0.50 (base) | $20 (Adventurer) | -- | ~$62 (usage exceeds tier) |
| Fly.io Sprites (documented example) | ~$33 (usage) | ~$0.50 (base) | $20 (Adventurer) | -- | ~$33 (within tier) |
| RunPod (CPU, unverified) | ~$86 | ~$5 | $0 | -- | ~$91 |
| Northflank (borderline) | $40.00 | $3.75 | $0 | -- | ~$44 |
| ~~E2B~~ | $39.90 | $0 (beta) | $150 | -- | |
| | ~$23 | $0 | $0 | -- | |
| ~~Beam Cloud~~ | $138 | $0 | $0 | -- | |
| ~~Koyeb~~ | $34.56 | $0 (preview) | $29 | -- | |
| | $43.50 | $1.25 | $0 | -- | |
| | $66.60 | $3.75 | $20 (incl. $20 credit) | -- | |
| | $73.13 | $6.25 | $0 | -- | |
| | $103.68 | $0.75 | $0 | -- | |
| Platform | Total/mo |
|---|---|
| Fly.io Machines | ~$123 |
| Fly.io Sprites (conservative) | ~$248 (Veteran $50 tier) |
| Fly.io Sprites (documented example) | ~$133 (Veteran $50 tier) |
| RunPod (CPU, unverified) | ~$364 |
| Northflank (borderline) | ~$175 |
| Platform | Total/mo |
|---|---|
| Fly.io Machines | ~$304 |
| Fly.io Sprites (conservative) | ~$620 (Hero $100 tier) |
| Fly.io Sprites (documented example) | ~$333 (Hero $100 tier) |
| RunPod (CPU, unverified) | ~$910 |
| Northflank (borderline) | ~$437 |
This scenario models Phase 2-3 usage, where workers run persistent services (webhook servers, Agent SDK processes) for extended hours (~8 hr/day). P1 threshold: < $100/month.
| Platform | Compute/mo | Storage/mo | Base Fee | Other | Total/mo | Meets P1? |
|---|---|---|---|---|---|---|
| Fly.io Machines | $105.60 | $3.75 | $0 | Bot $2.04 + Tigris ~$0.50 | ~$112 | No |
| Fly.io Sprites (conservative) | ~$247 (usage) | ~$0.50 | $50 (Veteran) | -- | ~$247 | No |
| Fly.io Sprites (documented example) | ~$133 (usage) | ~$0.50 | $50 (Veteran) | -- | ~$133 | No |
| RunPod (CPU, unverified) | ~$348 | ~$5 | $0 | -- | ~$353 | No |
| Northflank (borderline) | $160.00 | $3.75 | $0 | -- | ~$164 | No |
Key insight: At extended usage (8hr/day), no platform meets the $100/mo P1 threshold with the corrected performance-1x pricing ($0.0880/hr). Fly.io Machines at ~$112/mo is closest but exceeds by ~12%. Sprites' usage-based pricing ranges from ~$133/mo (documented example) to ~$247/mo (conservative) depending on actual cgroup consumption. If the $100/mo P1 is firm, options include: (a) downgrading to shared-cpu-4x (lower per-hour rate but shared vCPUs), (b) accepting the overage as Phase 2-3 cost of business, or (c) reducing active hours. RunPod's GPU-focused pricing remains uneconomical for CPU workloads.
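The Phase 2-3 figures above follow directly from the listed per-hour rates. A quick sanity check in Python (assuming 30 active days per month for workers and 730 hours per month for the always-on bot):

```python
# Recomputing the ~$112/mo Fly.io Machines Phase 2-3 estimate from the rates listed above.
# Assumptions: 30 active days/month for workers, 730 hours/month for the always-on bot.
WORKER_RATE = 0.0880      # $/hr, performance-1x 8 GB RAM
BOT_RATE = 0.0028         # $/hr, shared-cpu-1x 256 MB
VOLUME_RATE = 0.15        # $/GB-month

workers, hours_per_day, days = 5, 8, 30
compute = workers * hours_per_day * days * WORKER_RATE   # 1,200 hr -> $105.60
storage = workers * 5 * VOLUME_RATE                      # 25 GB of volumes -> $3.75
bot = BOT_RATE * 730                                     # ~$2.04
tigris = 0.50                                            # estimated session-backup storage
total = compute + storage + bot + tigris                 # ~$111.9 -> "~$112/mo"
print(f"Phase 2-3 Fly.io Machines estimate: ~${total:.0f}/mo")
```

The same arithmetic at 2 hr/day active time yields the ~$33/mo Phase 1 figure.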
```bash
# Create a new instance linked to a Discord channel
cbot create --channel <discord-channel-id> --name "project-alpha"
# -> Creates compute instance + persistent volume
# -> Installs Claude Code from base template
# -> Returns instance ID

# List all instances
cbot list
# ID            CHANNEL          STATUS      DISK   CREATED
# inst-abc123   #project-alpha   running     5GB    2024-01-15
# inst-def456   #bug-fixes       suspended   8GB    2024-01-12
# inst-ghi789   #research        stopped     5GB    2024-01-10

# SSH into an instance
cbot ssh inst-abc123
# -> Opens interactive shell in the running instance

# Wake a hibernated instance
cbot wake inst-def456

# Stop/hibernate an instance
cbot stop inst-abc123

# Resize disk
cbot resize-disk inst-abc123 --size 10GB

# Destroy an instance (requires confirmation)
cbot destroy inst-ghi789
# WARNING: This will permanently delete instance inst-ghi789
# and all associated data. Type 'yes' to confirm:

# View instance logs
cbot logs inst-abc123 --tail 50
```

```bash
# Apply base template to all instances
cbot config apply --template base.yml

# Override config for a specific instance
cbot config set inst-abc123 --env EXTRA_TOOL=enabled

# View instance config
cbot config show inst-abc123
```

Dual recommendation: prototype both Fly.io Sprites and Fly.io Machines in parallel. Let prototype results determine which becomes the primary platform.
Best for: Fastest time-to-prototype, minimal custom infrastructure, best cold start performance, highest DX.
- Sub-second warm resume: ~100-500ms (vs ~2-6s for Machines). Closest to the 5s P1 ideal by an order of magnitude.
- Zero-config idle management: Native 30s auto-sleep replaces the entire 3-layer idle detection system. No health endpoint, no process monitor, no hard cap timer needed.
- 100 GB persistent storage: Eliminates disk pruning cron, Tigris backup, and volume threshold alerts. Session files and project files have ample room without management.
- No Dockerfile: Pre-installed Ubuntu + Claude Code + tools (see version caveat in Section 3). No 4-layer auto-update defense needed.
- API exec dispatch: Bot opens `WSS /v1/sprites/{name}/exec` to run Claude Code. Native wake-on-demand (sleeping sprites wake automatically on API call). No separate start/stop orchestration.
- SDKs: Python (`sprites-py`), TypeScript (`@fly/sprites`), Go, Elixir. More complete than Machines' abandoned SDKs.
- Checkpoints: Instant filesystem snapshots for rollback before risky operations. Copy-on-write, milliseconds, no process interruption.
- Empirically validated: User testing confirmed 0.8s sprite creation, successful Claude Code v2.1.29 installation, root console access.
Tradeoffs:
- Higher cost at scale (conservative estimate): ~$62/mo for 5 sprites (Phase 1) vs ~$33/mo Machines. Usage-based pricing becomes expensive at sustained utilization (Phase 2-3: ~$247/mo vs ~$112/mo). Both exceed $100/mo P1 threshold at 8hr/day. Note: sprites.dev documented example implies ~$33/mo (Phase 1) / ~$133/mo (Phase 2-3) if actual cgroup-metered usage matches their benchmarks.
- No region selection: Proximity-routed only. Cannot co-locate with specific infrastructure.
- No SLA: Early-stage product (Jan 2026). Storage stack "in flux."
- 30s non-configurable auto-sleep: Cannot extend idle timeout. Between messages, sprites sleep after 30s. Warm resume mitigates this.
- No native SSH: WebSocket exec + FS API satisfies the P0 intent but is not literal SSH protocol.
- No custom images: Must configure via exec after creation (changes persist, so one-time setup).
Best for: Lower cost at scale, full SSH, production maturity, Phase 2-3 extended usage.
- Cheapest at scale: ~$33/month for 5 workers + bot, ~$112/mo Phase 2-3 (exceeds $100 P1 threshold). ~$304/month for 50 workers.
- Full SSH access: `fly ssh console` with complete SSH protocol (SCP/SFTP file transfer, port forwarding, key-based auth).
- Machines REST API: Full API at `api.machines.dev/v1` for programmatic lifecycle control. Production-grade (used by Supabase).
- GA-quality persistent volumes: NVMe-backed, $0.15/GB/month with automatic daily snapshots. 8,000 IOPS on shared-cpu-4x.
- Separate apps architecture: Bot app (~$2/mo always-on) + worker app (scale-to-zero). Internal `.internal` DNS for communication.
- Evolutionary development: Supports all three phases (CLI dispatch -> Agent SDK -> standalone). Anycast public HTTPS (`<app>.fly.dev`).
- Mature platform: Years of production use. Documented APIs. Known failure modes with workarounds.
Tradeoffs:
- Slower cold start: ~2-6s stop/start. 4-12x slower than Sprites warm resume. Still within 10s P0.
- Requires custom infrastructure: 3-layer idle detection, Dockerfile with 4-layer defense-in-depth, disk pruning cron, Tigris backup, health endpoint, session management -- ~60-70% more custom code.
- No official Python SDK: Abandoned community and official SDKs. Must use httpx + flyctl subprocess.
- REST API constraints: Exec has 60s hard timeout. Rate limits at 1 req/s sustained (burst to 3).
| Factor | Sprites | Machines | Winner |
|---|---|---|---|
| Cold start | ~100-500ms | ~2-6s | Sprites |
| Cost (5 inst, 2hr/day) | ~$33-62/mo | ~$33/mo | Comparable |
| Cost (5 inst, 8hr/day) | ~$133-247/mo | ~$112/mo | Machines |
| DX / time-to-prototype | No Dockerfile, native idle | Dockerfile + 3-layer idle | Sprites |
| Storage | 100 GB included | 5 GB volume ($0.15/GB) | Sprites |
| SSH | WebSocket exec + FS API | Native SSH + SCP | Machines |
| Maturity | Jan 2026, no SLA | Years of production use | Machines |
| Region control | Proximity-routed only | Explicit region selection | Machines |
| Idle management | Native 30s auto-sleep | DIY 3-layer system | Sprites |
| Phase 2-3 cost | Exceeds $100/mo P1 | Exceeds $100/mo P1 | Neither |
Recommendation: Start with Sprites for Phase 1 prototyping (faster iteration, less code, comparable Phase 1 cost). Validate cold start, pricing, and reliability. Neither platform meets the $100/mo P1 threshold at Phase 2-3 (8hr/day) with corrected pricing — Machines ~$112/mo, Sprites ~$133-247/mo — so the P1 cost target should be revisited or Phase 2-3 usage assumptions reduced. If Sprites proves stable, continue with it. If reliability concerns emerge, migrate to Machines (the bot-to-instance interface is platform-agnostic).
Fallback order (meet tightened P0s or borderline):
- Northflank (borderline) -- Per-second billing, ~$44/mo (Phase 1) / ~$164/mo (Phase 2-3 -- exceeds $100 threshold), indefinite process lifetime, GA disks, port forwarding. Borderline on SSH P0: `northflank exec` + `northflank forward` provide shell and port forwarding but lack SCP/SFTP. Cold start ~5-15s estimated (upper end exceeds 10s P0). No native auto-sleep (must build idle detection). Supports Phase 1-3.
- RunPod -- Real SSH, GA volumes, indefinite lifetime, 3-5s cold start. But GPU-focused pricing makes CPU workloads expensive ($91/mo Phase 1, ~$353/mo Phase 2-3). Fails P1 Phase 2-3 cost threshold. CPU pricing unverified (only GPU pricing publicly listed).
Vendor concentration risk: Both recommended options are Fly.io products. Mitigations: (a) the bot-to-instance interface (exec API or SSH + CLI) is abstracted behind the bot's orchestration layer, (b) no Fly.io-specific APIs are used in the instance runtime itself, (c) Claude Code runs identically on any Linux environment, (d) Sprites' exec API is REST-based and could be replicated on any platform with WebSocket exec support.
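To make that abstraction concrete, the bot can code against a small instance interface and keep each platform behind its own adapter. A minimal sketch; the Protocol and method names here are illustrative assumptions, not an existing API:

```python
# Illustrative sketch of the platform-agnostic bot-to-instance boundary.
# WorkerInstance, ensure_running, run_claude, and stop are names invented for this sketch.
from typing import AsyncIterator, Protocol

class WorkerInstance(Protocol):
    """Everything the Discord bot needs from a worker, regardless of platform."""

    async def ensure_running(self) -> None:
        """Wake or start the instance if it is sleeping/stopped."""
        ...

    def run_claude(self, prompt: str, session_id: str) -> AsyncIterator[str]:
        """Run `claude -p` inside the instance and yield stream-json lines."""
        ...

    async def stop(self) -> None:
        """Hibernate/stop compute; persistent storage survives."""
        ...

# A SpritesWorker adapter would implement run_claude over the WebSocket exec API;
# a MachinesWorker adapter over the Machines REST API plus SSH exec.
# Swapping platforms then means swapping the adapter, not the bot logic.
```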
Two parallel tracks. Run both simultaneously; compare results after Day 2.
```bash
# Install Sprites CLI
curl -fsSL https://sprites.dev/install.sh | bash
sprite login

# Create first sprite
sprite create -o <org> claude-worker-1

# Connect and verify environment
sprite console -s claude-worker-1

# Inside sprite: verify pre-installed tools
node --version     # Node.js
python3 --version  # Python
claude -v          # Claude Code (pre-installed)

# Configure z.ai GLM environment (persists across sleep/wake)
cat >> ~/.bashrc << 'EOF'
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air"
export API_TIMEOUT_MS="3000000"
export NODE_OPTIONS="--max-old-space-size=4096"
export MALLOC_ARENA_MAX=2
export DISABLE_AUTOUPDATER=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
EOF
# ANTHROPIC_AUTH_TOKEN set separately (not in bashrc)

# Test Claude Code execution via API exec dispatch
sprite exec -s claude-worker-1 -- claude -p "Hello, what tools are available?" \
  --output-format stream-json \
  --dangerously-skip-permissions

# Create checkpoint of configured environment
sprite checkpoint -s claude-worker-1 --comment "base-config-v1"
```

Validate:
- Sprite creation time (target: < 2s)
- Claude Code runs with z.ai GLM-4.7
- `--output-format stream-json` produces parseable NDJSON
- Environment survives sleep/wake cycle (wait 30s, then exec again)
- Checkpoint creation and restore works
- Measure warm resume time (target: < 500ms)
- Measure cold start time (target: < 2s)
```python
# pip install sprites-py

# Bot dispatches work via Sprites Python SDK
import os

from sprites import SpritesClient

client = SpritesClient(token=os.environ["SPRITE_TOKEN"])
sprite = client.sprite("claude-worker-1")

prompt = "Hello, what tools are available?"  # in the bot, this comes from the Discord message

# Execute Claude Code via API (auto-wakes sleeping sprite)
# Note: Sprites SDK uses run()/command(), not exec().
# Exact method signature may differ from below — verify against
# sprites-py docs at the time of implementation.
result = sprite.run(
    command=["claude", "-p", prompt,
             "--output-format", "stream-json",
             "--dangerously-skip-permissions"],
    env={"ANTHROPIC_AUTH_TOKEN": os.environ["ZAI_API_KEY"]},
)

# Stream response back to Discord
for line in result.stdout:
    # Parse stream-json, extract text_delta events
    # (see the parsing sketch after the validation checklist below)
    pass
```

Validate:
- API exec dispatch wakes sleeping sprites automatically
- Streaming output reaches bot in real-time
- Session persistence: exec session stays alive during Claude Code execution
- Auto-sleep: sprite sleeps 30s after exec session ends
- Multiple sprites can be created and managed via SDK
- Cost tracking: monitor actual usage vs Adventurer tier allowance
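The "streaming output reaches bot in real-time" check above requires the bot to turn stream-json lines into Discord-postable text. A hedged parsing sketch; the exact event schema (field names like `result` and the shape of assistant messages) should be verified against the Claude Code version in use:

```python
import json

def extract_text(ndjson_lines):
    """Collect assistant text from `claude -p --output-format stream-json` output.

    Assumption: each line is one JSON object with a "type" field, and the final
    "result" event carries the full response text. Verify the schema against the
    deployed Claude Code version before relying on this.
    """
    chunks = []
    for raw in ndjson_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partial or garbled lines instead of failing the whole reply
        if event.get("type") == "result" and "result" in event:
            return event["result"]                      # authoritative final text
        if event.get("type") == "assistant":
            # Assumed shape: message.content is a list of blocks with "text" fields
            for block in event.get("message", {}).get("content", []):
                if isinstance(block, dict) and block.get("type") == "text":
                    chunks.append(block["text"])
    return "".join(chunks)
```

Accumulating assistant text as a fallback keeps partial output usable even if the final result event is cut off by a timeout.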
- Run identical workloads on both Sprites and Machines
- Compare: cold start latency, cost, reliability, DX
- Document findings, decide on primary platform
```bash
# Install Fly CLI
brew install flyctl
fly auth login

# Create bot app (always-on, cheap)
fly launch --name discord-bot --region iad --no-deploy
# No volumes, no [[services]] -- pure process

# Create worker app (scale-to-zero)
fly launch --name claude-workers --region iad --no-deploy

# Create a persistent volume for first worker
fly volumes create workspace --size 5 --region iad -a claude-workers

# Deploy worker with 8 GB RAM config + defense-in-depth Dockerfile
fly deploy -a claude-workers

# Set z.ai API key as a secret (not in the image)
fly secrets set ANTHROPIC_AUTH_TOKEN=<zai-api-key> -a claude-workers

# Test SSH
fly ssh console -a claude-workers

# Test stop/start (not suspend -- 8 GB exceeds 2 GB limit)
fly machine stop <machine-id> -a claude-workers
fly machine start <machine-id> -a claude-workers

# Measure cold start time (target: 2-6s)
```

Validate:
- Separate apps communicate via `.internal` DNS
- Worker health endpoint (port 8080) returns 200/503
- `fly.toml` config: `kill_signal = "SIGTERM"`, `kill_timeout = 30`
- Worker auto-stop: `auto_stop_machines = "stop"` works with health endpoint
- Bot auto-stop: `auto_stop_machines = "off"` keeps bot alive
- Verify Claude Code runs in 8 GB RAM container with z.ai GLM-4.7
- Test `claude -p` with `--output-format stream-json` (confirmed working locally, validate in container)
- Verify persistent disk survives stop/start cycles
- Measure actual cold start times with z.ai API connection
- Test long prompt handling (verify 50-min API timeout works in container)
- Verify `NODE_OPTIONS="--max-old-space-size=4096"` prevents OOM during heavy subprocess workloads (run `git clone`, `npm install`, `pytest`)
- Verify defense-in-depth auto-updater: confirm binary is read-only, `claude doctor` shows auto-updates disabled, no update HTTP requests on startup (check network traffic)
- Confirm GLM-4.5-Air works reliably in haiku slot (background tasks, subagent operations)
- Deploy bot to `discord-bot` app (shared-cpu-1x/256 MB, always-on)
- Bot maps channels to worker Machine IDs
- On message: wake worker via Machines API, pipe prompt, stream response
- Implement 3-layer idle detection:
- Layer 1: HTTP health check every 5 min (process-based)
- Layer 2: 4-hour hard cap (configurable per-instance)
- Layer 3: Fly.io auto-stop as failsafe
- Test Discord notifications on session stop
- Implement session ID mapping (UUID5 per Discord channel)
- Deploy disk pruning cron (50/70/85/95% thresholds; see the sketch after this list)
- Set up Tigris bucket for session backups
- Build `cbot` CLI for instance management
- Test with 2-3 concurrent workers
- Validate rolling update workflow (rebuild image with new `CLAUDE_CODE_VERSION`, deploy to one machine, verify, roll to rest)
- Test volume threshold notifications via Discord
- Document findings, decide on next steps
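For the disk pruning cron referenced above, a hedged sketch of the core pass. Paths follow the session layout described in the resolved questions below; the Tigris upload step and the exact per-project retention policy are assumptions to refine during the prototype:

```python
# Sketch of the pruning pass: compress-and-archive old session files once disk usage
# crosses the 50% threshold, never touching active sessions or project files.
import gzip
import shutil
import subprocess
from pathlib import Path

SESSIONS_DIR = Path.home() / ".claude" / "projects"
PRUNE_THRESHOLD = 50                   # % used at which pruning starts
NOTIFY_THRESHOLDS = (50, 70, 85, 95)   # % used -> Discord volume notifications

def disk_used_percent(path: str = "/") -> float:
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100

def is_in_use(path: Path) -> bool:
    """Treat a session file as active if any process holds it open (lsof exits 0)."""
    return subprocess.run(["lsof", "-t", str(path)],
                          capture_output=True).returncode == 0

def prune_sessions() -> None:
    if disk_used_percent() < PRUNE_THRESHOLD:
        return
    # Oldest first; always keep the most recent session file (by mtime) untouched.
    session_files = sorted(SESSIONS_DIR.glob("*/*.jsonl"),
                           key=lambda p: p.stat().st_mtime)
    for path in session_files[:-1]:
        if is_in_use(path):
            continue                     # never prune a session Claude Code has open
        archive = path.with_name(path.name + ".gz")
        with path.open("rb") as src, gzip.open(archive, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # TODO: upload `archive` to the Tigris backup bucket (S3-compatible) before unlinking
        path.unlink()
```

Only `*.jsonl` session files under `~/.claude/projects` are candidates, which keeps project files out of the pruning path by construction.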
All questions from previous iterations have been resolved through research (Exa web search, Context7, GitHub issue analysis) and documented below.
Answer: 8 GB fixed (performance-1x).
Research found Claude Code idles at 400-700 MB but grows to 1-4 GB during
active prompts. Critically, all subprocess stdout/stderr is retained
permanently in the JavaScript heap (GitHub #11155, closed NOT_PLANNED).
Running git, npm install, pytest causes linear memory growth. Single
sessions observed at 3.3 GB. With subprocesses needing additional memory,
4 GB causes frequent OOMs. 8 GB provides safety margin. z.ai GLM does not
affect memory (purely network-level redirect). Anthropic's own devcontainer
uses NODE_OPTIONS=--max-old-space-size=4096.
Answer: 2-6 seconds total. Well within 10s P0.
Infrastructure start: 1-3s (VM boot + volume mount + rootfs reset). Application startup: 1-3s additional. Volume size does NOT affect start time (NVMe mount is constant-time). Unclean shutdown may add 1-5s for ext4 journal replay. Image size (~500 MB) adds 100-200ms for containerd snapshot. Suspend has had severe reliability issues in 2025 (23-50s start times, stuck machines) -- stop/start is the confirmed reliable path.
Answer: Separate Fly apps (bot-app + worker-app).
Remove [[services]] section entirely for the bot app (pure process, not
HTTP). Bot runs on shared-cpu-1x/256 MB at ~$2/month ($0.0028/hr × 730hr). Workers in a separate
app with scale-to-zero. Internal .internal DNS for communication. Separate
deploy cycles. This pattern is recommended by the Fly.io community for
Discord bots. Trial accounts have a 5-minute limit (need credit card).
Answer: Hybrid -- Claude Code manages sessions on disk with deterministic UUIDs. Bot handles pruning and backup.
Sessions stored as JSONL under ~/.claude/projects/{project-id}/{session-id}.jsonl.
Use --resume with a deterministic session ID per Discord channel for
consistent mapping.
Disk grows unbounded -- pruning cron at 50% threshold. Old sessions compressed
and backed up to Tigris ($0.02/GB/mo) before deletion. Active sessions
protected (check lsof + always preserve latest by mtime). Project files
never auto-pruned. Volume threshold notifications at 50/70/85/95%.
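A minimal sketch of the deterministic channel-to-session mapping; the namespace and name format are arbitrary project choices, not something Claude Code prescribes:

```python
import uuid

# Any fixed namespace works, but it must never change once sessions exist.
# Reusing uuid.NAMESPACE_DNS with a project-specific name string is an arbitrary choice.
def session_id_for_channel(channel_id: int) -> str:
    """Deterministic Claude Code session ID for a Discord channel (UUID5)."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"discord-channel-{channel_id}"))

# Every message in the same channel resumes the same session, e.g.:
#   claude -p "<prompt>" --resume <session-id> --output-format stream-json
print(session_id_for_channel(123456789012345678))
```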
Answer: Process check is primary defense. kill_signal=SIGTERM, kill_timeout=30.
Fly.io sends SIGINT by default (not SIGTERM) with a 5-minute default grace
period. Configurable via kill_signal and kill_timeout in fly.toml (max
300s). Claude Code has NO
robust graceful shutdown handler -- it simply dies on signal. File writes are
NOT atomic (standard fs.writeFile truncates then writes). However, ext4
ordered journaling guarantees either old content or new content, never a mix.
The idle detection process check is the primary defense: if Claude Code isn't
running, there's nothing to corrupt. The 30s timeout provides margin for
filesystem journal flush and child process cleanup.
Answer: httpx async client for REST API + flyctl subprocess for SSH/secrets.
Community Python SDK: abandoned (4 stars, last commit July 2023). Official
fly-openapi-sdk: also abandoned (10 commits, Sept 2023). The REST API is
well-documented and production-grade (used by Supabase). Base URL:
https://api.machines.dev/v1 (public) or http://_api.internal:4280/v1
(internal). Constraints: exec endpoint has 60s hard timeout, rate limits
at 1 req/s sustained (burst to 3). Secrets API unreliable for app secrets --
use fly secrets set CLI instead.
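A minimal httpx sketch against that base URL. The lifecycle endpoints follow the documented Machines API, but paths, the 60s exec timeout, and the ~1 req/s rate limit should be confirmed against current docs before relying on this:

```python
import os
import subprocess

import httpx

FLY_API = "https://api.machines.dev/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"}

async def set_machine_state(app: str, machine_id: str, action: str) -> None:
    """Start or stop a Machine via the REST API (action is 'start' or 'stop')."""
    async with httpx.AsyncClient(base_url=FLY_API, headers=HEADERS, timeout=30) as client:
        resp = await client.post(f"/apps/{app}/machines/{machine_id}/{action}")
        resp.raise_for_status()

def set_app_secret(app: str, key: str, value: str) -> None:
    """App secrets go through flyctl as a subprocess, since the secrets REST API is unreliable."""
    subprocess.run(["fly", "secrets", "set", f"{key}={value}", "-a", app], check=True)
```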
Answer: Rejected. Keep GLM-4.5-Air for haiku slot.
Three categories of unresolved tool calling bugs found: (1) XML tag
corruption with duplicated/garbled <tool_call> tags (SGLang #15721),
(2) XML syntax mismatches causing context loss (Ollama #13820), (3) object
parameter serialization failures outputting JSON strings instead of objects
(HuggingFace #18). No Z.AI-side fixes published as of February 2026. Z.AI's
own Claude Code mapping still recommends GLM-4.5-Air for haiku. GLM-4.5-Air
has 12B active parameters (4x Flash's 3B), 0.940 Tool Selection Quality, and
months of proven stability. Re-evaluate when Z.AI publishes fixes.
Answer: Unreliable as sole mechanism. 4-layer defense-in-depth required.
Documented failure history: Issue #2898 (config "clobbered"), Issue #9327
(auto-update deleted the binary), Issue #14985 (OPEN, Dec 2025, env var not
honored). CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 implicitly sets
DISABLE_AUTOUPDATER=1 and also disables telemetry. Defense layers: (1) Real
shell ENV vars in Dockerfile (most reliable), (2) Pin exact version at build
time, (3) Make binary read-only (chmod 555), (4) Ephemeral rootfs. The
version check HTTP request may still fire (~1s latency), but actual binary
replacement is blocked by filesystem permissions.
Answer: 5-min bot-absent timer, 4-hour hard cap (configurable), Fly.io auto-stop as failsafe.
Fly.io auto-stop is connection-based, not timer-based -- no configurable idle timeout. Proxy checks every ~5 min for zero connections. For the Discord bot's persistent WebSocket (not routed through proxy), auto-stop is blind to activity. Workers expose HTTP health endpoint for proxy visibility.
Industry benchmarks: GitHub Codespaces defaults to 30 min (max 4 hr), Gitpod
defaults to 30 min (5 min after IDE disconnect). The 3-layer system is
validated: Layer 1 (health check, 5 min) matches Gitpod's IDE-disconnect
pattern. Layer 2 (4 hr cap) matches Codespaces' maximum. Layer 3 (auto-stop)
is a last-resort failsafe. Bot app uses auto_stop_machines = "off" (no
proxy-routed services). Worker app uses auto_stop_machines = "stop" (health
endpoint makes activity visible to proxy).
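Putting Layers 1 and 2 together on the bot side, a hedged sketch. The health URL, the "503 means no Claude Code process" contract, and the `stop_machine` callable are assumptions to be wired to the actual worker and to the Machines API sketch above:

```python
# Bot-side watcher implementing Layer 1 (health poll) and Layer 2 (hard cap);
# Layer 3 (Fly.io auto_stop_machines) remains the platform-side failsafe.
import asyncio
import time

import httpx

HEALTH_POLL_SECONDS = 5 * 60        # Layer 1: process-based health check every 5 minutes
HARD_CAP_SECONDS = 4 * 60 * 60      # Layer 2: 4-hour hard cap (configurable per instance)

async def watch_worker(health_url: str, stop_machine) -> None:
    """Stop the worker when Claude Code is idle or the hard cap is reached.

    Assumptions: the worker health endpoint returns 503 when no Claude Code process
    is running, and `stop_machine` is an async callable that stops the Fly Machine.
    """
    started = time.monotonic()
    async with httpx.AsyncClient(timeout=10) as client:
        while True:
            await asyncio.sleep(HEALTH_POLL_SECONDS)
            if time.monotonic() - started > HARD_CAP_SECONDS:
                await stop_machine()            # Layer 2: hard cap reached
                return
            try:
                resp = await client.get(health_url)
            except httpx.HTTPError:
                continue                        # transient failure: keep watching
            if resp.status_code == 503:
                await stop_machine()            # Layer 1: no Claude Code process running
                return
```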