@drewstone
Created March 6, 2026 07:41
Phony Platform: Voice AI Stack Decision Matrix — Full COGS, TTFB, quality trade-off analysis (March 2026)

Phony Platform Economic Model (Validated)

Date: 2026-03-06
Status: Validated against provider pricing pages and codebase analysis
Purpose: Cofounder presentation (COGS, overhead, pricing, team planning)


1. VALIDATED COGS — Per Minute of Voice Conversation

1.1 Assumptions (Measured from Codebase)

Parameter Value Source
Turns per minute 5-6 Avg conversational pace with 320-char cap
Input tokens per turn 2,500-3,500 System prompt (~480) + RAG context (~1,500) + history (~1,000) + message (~100)
Output tokens per turn 60-100 Capped at 320 chars / 3 sentences by Twilio config
TTS characters per turn ~200-320 Hard cap: TWILIO_MAX_SPOKEN_CHARS_PER_TURN=320
TTS characters per minute ~1,200-1,600 5-6 turns x 240 avg chars
RAG queries per minute ~2 Not every turn triggers RAG; ~1 in 3 turns
Conversation history window 20 turns Hardcoded in agent-chat.ts
RAG chunks injected 5 agent-chat.ts limit, 1,200 chars each
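
The input-token range above can be reproduced with a small budget calculation. This is a minimal sketch: the 4-characters-per-token ratio is an assumed rule of thumb (not measured from the codebase), and the function and constant names are illustrative only.

```python
# Rough per-turn input token budget from the assumptions table.
# CHARS_PER_TOKEN = 4 is an assumed rule of thumb, not a measured value.
CHARS_PER_TOKEN = 4

SYSTEM_PROMPT_TOKENS = 480
RAG_CHUNKS = 5            # agent-chat.ts limit
RAG_CHUNK_CHARS = 1_200   # characters per injected chunk
HISTORY_TOKENS = 1_000
MESSAGE_TOKENS = 100


def input_tokens_per_turn(rag_chunks: int = RAG_CHUNKS) -> int:
    """Estimate LLM input tokens for one turn."""
    rag_tokens = rag_chunks * RAG_CHUNK_CHARS // CHARS_PER_TOKEN
    return SYSTEM_PROMPT_TOKENS + rag_tokens + HISTORY_TOKENS + MESSAGE_TOKENS


print(input_tokens_per_turn())   # 3080 (with RAG)
print(input_tokens_per_turn(0))  # 1580 (no knowledge base)
```

RAG context is roughly half of the input budget under these assumptions, which is why dropping or trimming chunks shows up in the LLM input cost.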

1.2 Provider Unit Costs (Validated March 2026)

Provider Service Unit Published Rate Source
Twilio Inbound voice per min $0.0085 twilio.com/voice/pricing
Twilio Media Streams (WebSocket) per min $0.0040 twilio.com/voice/pricing
Twilio Phone number per month $1.15 twilio.com/voice/pricing
Twilio Call recording per min $0.0025 twilio.com/voice/pricing
Cartesia Sonic TTS (Startup plan) per 1K chars $0.039 $49/mo for 1.25M credits, 1 credit/char
Cartesia Sonic TTS (Scale plan) per 1K chars $0.037 $299/mo for 8M credits
Cartesia Sonic TTS (Pro plan) per 1K chars $0.050 $5/mo for 100K credits
OpenAI GPT-4o per 1M input tok $2.50 openai.com/api/pricing
OpenAI GPT-4o per 1M output tok $10.00 openai.com/api/pricing
OpenAI GPT-4o-mini per 1M input tok $0.15 openai.com/api/pricing
OpenAI GPT-4o-mini per 1M output tok $0.60 openai.com/api/pricing
OpenAI GPT-5 per 1M input tok $1.25 openai.com/api/pricing
OpenAI GPT-5 per 1M output tok $10.00 openai.com/api/pricing
OpenAI GPT-5-mini per 1M input tok $0.25 openai.com/api/pricing
OpenAI GPT-5-mini per 1M output tok $2.00 openai.com/api/pricing
OpenAI text-embedding-3-small per 1M tok $0.02 openai.com/api/pricing
Anthropic Claude 3.5 Haiku per 1M input tok $0.80 anthropic.com/pricing
Anthropic Claude 3.5 Haiku per 1M output tok $4.00 anthropic.com/pricing
Anthropic Claude Haiku 4.5 per 1M input tok $1.00 anthropic.com/pricing
Anthropic Claude Haiku 4.5 per 1M output tok $5.00 anthropic.com/pricing
ElevenLabs Scribe STT (Pro plan) per hour $0.22-0.40 elevenlabs.io/pricing
Deepgram Nova-3 STT per min $0.0077 deepgram.com/pricing
Deepgram Aura-2 TTS per 1K chars $0.030 deepgram.com/pricing

1.3 Per-Minute COGS Calculation (Current Production Stack)

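Your current stack: Twilio + ElevenLabs STT + Cartesia TTS + GPT-5.2 LLM. GPT-5.2 is priced here at the published GPT-5 rates ($1.25 input / $10.00 output per 1M tokens), because GPT-5.2 pricing is not confirmed (see Risks).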

Component Calculation Cost/Min
Twilio inbound $0.0085 x 1 min $0.0085
Twilio media streams $0.0040 x 1 min $0.0040
Cartesia TTS (Startup plan) 1,400 chars/min x $0.039/1K $0.0546
ElevenLabs STT (Scribe) ~$0.30/hr / 60 $0.0050
GPT-5.2 input 5.5 turns x 3,000 tok x $1.25/1M $0.0206
GPT-5.2 output 5.5 turns x 80 tok x $10.00/1M $0.0044
RAG embeddings 2 queries x ~100 tok x $0.02/1M $0.0000
Total COGS/min $0.097
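
The table above is plain arithmetic over the 1.1 assumptions and 1.2 unit rates, so it can be recomputed whenever a rate changes. A minimal sketch follows; the constant and function names are illustrative, and the ~$0.30/hr STT rate is the midpoint assumption used in the table, not a confirmed price.

```python
# Recompute per-minute COGS for the current production stack.
TURNS_PER_MIN = 5.5
INPUT_TOKENS_PER_TURN = 3_000
OUTPUT_TOKENS_PER_TURN = 80
TTS_CHARS_PER_MIN = 1_400


def cogs_components() -> dict[str, float]:
    """Per-minute cost of each component, in USD."""
    return {
        "twilio_inbound": 0.0085,
        "twilio_media_streams": 0.0040,
        "cartesia_tts": TTS_CHARS_PER_MIN / 1_000 * 0.039,  # Startup plan rate
        "elevenlabs_stt": 0.30 / 60,                        # assumed ~$0.30/hr
        "llm_input": TURNS_PER_MIN * INPUT_TOKENS_PER_TURN * 1.25 / 1e6,
        "llm_output": TURNS_PER_MIN * OUTPUT_TOKENS_PER_TURN * 10.00 / 1e6,
        "rag_embeddings": 2 * 100 * 0.02 / 1e6,
    }


components = cogs_components()
total = sum(components.values())
for name, cost in components.items():
    print(f"{name:22s} ${cost:.4f}  ({cost / total:.0%} of COGS)")
print(f"{'total':22s} ${total:.3f}")
```

Printing each component's share makes it easy to see which line dominates (TTS, at a little over half of the total).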

1.4 Per-Minute COGS by Stack Variant

Stack Telephony STT TTS LLM RAG Total/Min
Current production (GPT-5.2 + Cartesia Startup) $0.0125 $0.0050 $0.0546 $0.0250 $0.0000 $0.097
Current w/ Cartesia Scale (GPT-5.2 + Cartesia Scale) $0.0125 $0.0050 $0.0518 $0.0250 $0.0000 $0.094
Cost-optimized (GPT-4o-mini + Cartesia Startup) $0.0125 $0.0050 $0.0546 $0.0013 $0.0000 $0.073
Budget (GPT-4o-mini + Pocket TTS) $0.0125 $0.0050 $0.0020 $0.0013 $0.0000 $0.021
Deepgram all-in (GPT-4o-mini + Deepgram STT + Aura TTS) $0.0125 $0.0077 $0.0420 $0.0013 $0.0000 $0.064
BYOK (developer brings all keys) $0.0125 $0.0000 $0.0000 $0.0000 $0.0000 $0.013
Fully self-hosted (Whisper + Pocket TTS + open LLM) $0.0125 $0.0020 $0.0020 $0.0030 $0.0000 $0.020

1.5 Per-Call COGS (at various durations)

Using the current production stack at $0.097/min:

Call Duration COGS At $0.15/min price Margin At $0.25/min price Margin
3 min $0.29 $0.45 35% $0.75 61%
5 min $0.49 $0.75 35% $1.25 61%
10 min $0.97 $1.50 35% $2.50 61%
15 min $1.46 $2.25 35% $3.75 61%
30 min $2.91 $4.50 35% $7.50 61%
60 min $5.82 $9.00 35% $15.00 61%

Using the cost-optimized stack at $0.073/min:

Call Duration COGS At $0.15/min price Margin At $0.25/min price Margin
3 min $0.22 $0.45 51% $0.75 71%
5 min $0.37 $0.75 51% $1.25 71%
10 min $0.73 $1.50 51% $2.50 71%
15 min $1.10 $2.25 51% $3.75 71%
30 min $2.19 $4.50 51% $7.50 71%
60 min $4.38 $9.00 51% $15.00 71%

2. FIXED OVERHEAD COSTS (Validated)

2.1 Railway Hosting (Actual Bill)

Plan: Pro ($20/mo base, includes $20 usage credits)
Actual monthly cost: ~$28/mo

Resource Usage Monthly Cost
CPU (0.056 vCPU avg across 11 services) 2,417 vCPU-min $1.12
Memory (2.5 GB avg across 11 services) 109,180 GB-min $25.22
Disk (7 GB total) 304,177 GB-min $1.06
Network egress (3.2 GB) 3.2 GB $0.16
Total usage $27.56
Pro plan base included

11 services running:

Service Type Memory Driver
@phony/api Node.js Fastify Moderate
@phony/web Nginx static Low
@persona-labs/voice-api Node.js Hono Moderate
@persona-labs/pocket-tts Python PyTorch High (model weights)
@persona-labs/ingest Node.js Fastify Moderate
@persona-labs/ingest-worker Node.js background Low
developer-portal Static Low
@phony/db PostgreSQL High
@persona-labs/db PostgreSQL High
@persona-labs/vector-db PostgreSQL + pgvector High
@persona-labs/storage-console MinIO Moderate

2.2 Provider Subscription Fees (Monthly Fixed)

Provider Plan Monthly Cost What You Get
Cartesia Startup $49/mo 1.25M chars TTS (~893 agent-minutes)
Cartesia Scale (if needed) $299/mo 8M chars TTS (~5,714 agent-minutes)
ElevenLabs Pro $99/mo (if subscribed) 500K credits TTS + STT
ElevenLabs Free (current) $0 Broken for cloned voices
Twilio Pay-as-you-go $0 base Per-use billing
Twilio Phone numbers (7 creators) $8.05/mo 7 x $1.15
OpenAI Pay-as-you-go $0 base Per-use billing
Stripe Standard 2.9% + $0.30/txn When active
Railway Pro $20/mo base Included in hosting
Total fixed provider costs $77-$327/mo Depends on Cartesia plan

2.3 Total Monthly Overhead (No Revenue)

Category Low (Cartesia Startup) High (Cartesia Scale)
Railway hosting $28 $28
Cartesia subscription $49 $299
Twilio phone numbers $8 $8
Domain/DNS ~$2 ~$2
Claude Code / dev tooling ~$200 ~$200
Total burn ~$287/mo ~$537/mo

2.4 Cartesia Included Minutes Before Overage

Your Cartesia plan includes a block of TTS credits. Here's how many agent-minutes you get before paying overage:

Plan Credits Included Agent-Minutes Included* Effective TTS Cost if Fully Used
Pro ($5/mo) 100K chars ~71 min $0.070/min
Startup ($49/mo) 1.25M chars ~893 min $0.055/min
Scale ($299/mo) 8M chars ~5,714 min $0.052/min

*At 1,400 chars/min average

Implication: On the Startup plan, your first 893 minutes of voice conversation per month have TTS "pre-paid." After that, overage kicks in at $0.039/1K chars.
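
The included-minutes and effective-cost columns follow directly from plan credits and the 1,400 chars/min average. A minimal sketch, with illustrative function names and the plan figures taken from the table above:

```python
# Included agent-minutes and effective TTS cost per minute for each Cartesia plan.
CHARS_PER_MIN = 1_400  # average TTS characters per agent-minute

PLANS = {
    "Pro": (5.0, 100_000),          # (monthly fee in USD, included credits)
    "Startup": (49.0, 1_250_000),
    "Scale": (299.0, 8_000_000),
}


def included_minutes(credits: int) -> float:
    """Agent-minutes covered by a credit block (1 credit = 1 character)."""
    return credits / CHARS_PER_MIN


def effective_tts_cost_per_min(monthly_fee: float, credits: int) -> float:
    """TTS cost per minute if the whole credit block is used."""
    return monthly_fee / included_minutes(credits)


for name, (fee, credits) in PLANS.items():
    minutes = included_minutes(credits)
    rate = effective_tts_cost_per_min(fee, credits)
    print(f"{name:8s} {minutes:7.0f} min  ${rate:.3f}/min")
```

If the measured chars/min differs from 1,400, changing `CHARS_PER_MIN` updates every plan at once.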


3. SENSITIVITY ANALYSIS — What Moves the Needle

3.1 COGS Sensitivity (Current Stack: $0.097/min)

If this changes... From To Impact on COGS/min New Total
Switch LLM to GPT-4o-mini GPT-5.2 ($0.025/min) GPT-4o-mini ($0.001/min) -$0.024 $0.073
Switch LLM to GPT-5-mini GPT-5.2 ($0.025/min) GPT-5-mini ($0.003/min) -$0.022 $0.075
Switch STT to Deepgram EL Scribe ($0.005/min) Deepgram Nova-3 ($0.008/min) +$0.003 $0.100
Self-host STT (Whisper) EL Scribe ($0.005/min) Self-hosted ($0.002/min) -$0.003 $0.094
Double turn density (10/min) 5.5 turns/min 10 turns/min +$0.050 $0.147
Halve turn density (3/min) 5.5 turns/min 3 turns/min -$0.025 $0.072
Drop RAG context (no KB) 5 chunks injected 0 chunks -$0.008 (fewer LLM input tokens) $0.089
Cut TTS cap to 160 chars 320 chars/turn 160 chars/turn -$0.027 $0.070
Use Pocket TTS (free tier) Cartesia ($0.055/min) Pocket TTS ($0.002/min) -$0.053 $0.044

The two biggest levers:

  1. TTS provider — Cartesia is 55-60% of your COGS. Pocket TTS eliminates it but has 13-30s latency.
  2. LLM model — GPT-5.2 vs GPT-4o-mini is a 25% COGS swing. Need to test quality impact.

3.2 What Happens as You Scale

Monthly Minutes Twilio Cartesia (Startup) STT LLM (GPT-5.2) Total Variable + Fixed Overhead Cost/Min (All-In)
100 $1.25 pre-paid* $0.50 $2.50 $4.25 $287 $2.91
500 $6.25 pre-paid* $2.50 $12.50 $21.25 $287 $0.62
893 $11.16 pre-paid* $4.47 $22.33 $37.96 $287 $0.36
1,000 $12.50 $5.85 overage $5.00 $25.00 $48.35 $287 $0.34
5,000 $62.50 $224.25 overage $25.00 $125.00 $436.75 $287 $0.14
10,000 $125.00 $222.00 overage** $50.00 $250.00 $647.00 $537** $0.12
50,000 $625.00 $2,294.00 overage $250.00 $1,250.00 $4,419.00 $537 $0.099

*First 893 min included in Startup plan ($49/mo)
**Switch to Scale plan ($299/mo) at ~5,700+ min/mo
Overage = characters beyond the included credits (at 1,400 chars/min) x the plan rate ($0.039/1K on Startup, $0.037/1K on Scale).
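
Under the overage rule stated in 2.4 (characters beyond the included credits are billed at the plan's per-1K rate), the Cartesia column can be sketched as below. This assumes overage is billed at exactly the plan's effective rate, which should be checked against Cartesia's actual billing; the names are illustrative.

```python
# Cartesia TTS overage for a month, given agent-minutes and plan.
CHARS_PER_MIN = 1_400

PLANS = {
    "startup": {"fee": 49.0, "credits": 1_250_000, "rate_per_1k": 0.039},
    "scale": {"fee": 299.0, "credits": 8_000_000, "rate_per_1k": 0.037},
}


def tts_overage(minutes: float, plan: str) -> float:
    """USD billed on top of the subscription fee."""
    p = PLANS[plan]
    overage_chars = max(0.0, minutes * CHARS_PER_MIN - p["credits"])
    return overage_chars / 1_000 * p["rate_per_1k"]


def tts_monthly_total(minutes: float, plan: str) -> float:
    """Subscription fee plus overage."""
    return PLANS[plan]["fee"] + tts_overage(minutes, plan)


for minutes in (500, 1_000, 5_000, 10_000, 50_000):
    startup = tts_monthly_total(minutes, "startup")
    scale = tts_monthly_total(minutes, "scale")
    cheaper = "startup" if startup <= scale else "scale"
    print(f"{minutes:6d} min  startup ${startup:9.2f}  scale ${scale:9.2f}  -> {cheaper}")
```

Comparing the two monthly totals side by side shows where the Scale plan becomes the cheaper choice.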


4. ALL-IN COST PER MINUTE (Overhead Amortized)

This is the number that matters for pricing — COGS + fixed costs spread across actual usage.

Monthly Volume Variable COGS/Min Fixed Overhead/Min All-In Cost/Min Min Price for 50% Margin Min Price for 70% Margin
100 min $0.097 $2.87 $2.97 $5.94 $9.90
500 min $0.097 $0.57 $0.67 $1.34 $2.23
1,000 min $0.097 $0.29 $0.39 $0.78 $1.30
2,500 min $0.097 $0.11 $0.21 $0.42 $0.70
5,000 min $0.097 $0.06 $0.16 $0.32 $0.53
10,000 min $0.097 $0.05 $0.15 $0.30 $0.50
50,000 min $0.097 $0.01 $0.11 $0.22 $0.37

Key insight: A $0.25/min price only clears all-in cost at ~2,500+ min/mo ($0.21/min all-in, about a 16% margin), and the margin reaches ~40% at 10,000 min/mo. Below 2,500 min/mo, fixed costs eat you alive.
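
The amortization in this section reduces to two formulas: all-in cost = variable COGS + fixed overhead / monthly minutes, and minimum price = all-in cost / (1 - target margin). A minimal sketch with illustrative names; note the table rounds all-in cost to cents before deriving prices, so the last cent can differ from unrounded results.

```python
# All-in cost per minute and the minimum price for a target gross margin.
def all_in_cost_per_min(
    monthly_minutes: float, fixed_overhead: float, variable_cogs: float = 0.097
) -> float:
    """Variable COGS plus fixed overhead spread across the month's minutes."""
    return variable_cogs + fixed_overhead / monthly_minutes


def min_price_for_margin(cost_per_min: float, margin: float) -> float:
    """Price at which (price - cost) / price equals the target margin."""
    return cost_per_min / (1 - margin)


for minutes in (100, 500, 1_000, 2_500, 5_000):
    cost = all_in_cost_per_min(minutes, fixed_overhead=287)
    print(
        f"{minutes:6d} min  all-in ${cost:.2f}  "
        f"50% margin ${min_price_for_margin(cost, 0.5):.2f}  "
        f"70% margin ${min_price_for_margin(cost, 0.7):.2f}"
    )
```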


5. PRICING RECOMMENDATIONS (Revised)

5.1 Voice API (Developer Platform)

Tier Price Included Min Overage Your All-In COGS Margin on Overage
Free $0 5 min Hard cap $0.49 total (5 min x $0.097) N/A (acquisition)
Starter $29/mo 50 min $0.12/min ~$0.10/min (at ~300 min total) 17%
Pro $99/mo 200 min $0.10/min ~$0.10/min ~0% on overage, margin is in subscription
Scale $299/mo 1,000 min $0.08/min ~$0.097/min Negative on overage alone

Problem: At current COGS ($0.097/min), overage priced at $0.08-0.12/min earns little or nothing, and you need ~$0.15/min for even a ~35% margin on overage. The subscription fee IS the margin.

Revised recommendation:

Tier Price Included Min Overage Where Margin Comes From
Free $0 5 min Hard cap Acquisition cost
Starter $49/mo 100 min $0.15/min Sub fee ($49 covers 100 min COGS of $9.70) = 80% margin
Pro $149/mo 500 min $0.12/min Sub fee ($149 covers 500 min COGS of $48.50) = 67% margin
Scale $499/mo 2,500 min $0.10/min Sub fee ($499 covers 2,500 min COGS of $242.50) = 51% margin
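
The tier margins above are the subscription fee minus the COGS of the included minutes, as a share of the fee. A minimal sketch (function names are illustrative; COGS is the $0.097/min figure from section 1):

```python
# Margin on a subscription tier when every included minute is used,
# and margin on overage minutes sold beyond the included block.
COGS_PER_MIN = 0.097


def included_minutes_margin(price: float, included_min: int) -> float:
    cogs = included_min * COGS_PER_MIN
    return (price - cogs) / price


def overage_margin(overage_price: float) -> float:
    return (overage_price - COGS_PER_MIN) / overage_price


TIERS = {
    "Starter": (49, 100, 0.15),   # (price, included minutes, overage price)
    "Pro": (149, 500, 0.12),
    "Scale": (499, 2_500, 0.10),
}

for name, (price, minutes, overage) in TIERS.items():
    print(
        f"{name:8s} sub margin {included_minutes_margin(price, minutes):.0%}  "
        f"overage margin {overage_margin(overage):.0%}"
    )
```

These are worst-case subscription margins: a customer who uses fewer than the included minutes yields a higher margin.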

5.2 Phony Consumer Product

At current COGS of $0.097/min, the old $2.50/min was actually fine on margin (96%) but terrible for adoption.

Plan Price Minutes COGS Gross Profit Margin
Free trial $0 3 min one-time $0.29 -$0.29 CAC
Casual $4.99/mo 15 min $1.46 $3.53 71%
Regular $14.99/mo 60 min $5.82 $9.17 61%
Power $29.99/mo 150 min $14.55 $15.44 51%
Pay-as-you-go $0.25/min Unlimited $0.097/min $0.153/min 61%

5.3 Creator Revenue Share (If Applicable)

If creators get a cut, it comes out of platform gross profit:

Consumer Plan Revenue Creator Share (50%) Platform After Creator COGS Platform Net Net Margin
Casual ($4.99) $4.99 $2.50 $2.50 $1.46 $1.04 21%
Regular ($14.99) $14.99 $7.50 $7.50 $5.82 $1.68 11%
Power ($29.99) $29.99 $15.00 $15.00 $14.55 $0.45 2%

At 50% creator share, margins are razor-thin. You'd need either:

  • Lower creator share (30-40%), or
  • Higher prices, or
  • Much lower COGS (switch to GPT-4o-mini = instant ~25% COGS reduction)

With GPT-4o-mini (COGS drops to $0.073/min):

Consumer Plan Revenue Creator Share (50%) Platform After Creator COGS Platform Net Net Margin
Casual ($4.99) $4.99 $2.50 $2.50 $1.10 $1.40 28%
Regular ($14.99) $14.99 $7.50 $7.50 $4.38 $3.12 21%
Power ($29.99) $29.99 $15.00 $15.00 $10.95 $4.05 14%
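
To compare creator-share options quickly, the two tables above can be expressed as one function of price, minutes, share, and COGS. A minimal sketch with illustrative names; it uses unrounded intermediates, so results can differ from the tables by about a percentage point where the tables round dollar amounts first.

```python
# Platform net margin after creator revenue share and COGS.
def platform_net_margin(
    price: float, minutes: int, creator_share: float, cogs_per_min: float
) -> float:
    creator_cut = price * creator_share
    cogs = minutes * cogs_per_min
    return (price - creator_cut - cogs) / price


PLANS = {"Casual": (4.99, 15), "Regular": (14.99, 60), "Power": (29.99, 150)}

for share in (0.3, 0.4, 0.5):
    for name, (price, minutes) in PLANS.items():
        current = platform_net_margin(price, minutes, share, 0.097)
        cheaper = platform_net_margin(price, minutes, share, 0.073)
        print(
            f"share {share:.0%}  {name:8s} "
            f"net margin {current:.0%} at $0.097, {cheaper:.0%} at $0.073"
        )
```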

6. DEVELOPMENT & TEAM COSTS

6.1 Current Burn Rate

Item Monthly Cost
Railway hosting $28
Cartesia Startup plan $49
Twilio phone numbers $8
OpenAI API (dev/test) ~$20
Claude Code / AI dev tools ~$200
Domain, DNS, misc ~$5
Total burn (no salaries) ~$310/mo

6.2 Replacement Cost of What's Been Built

Component Person-Months At $15K/mo contractor At $25K/mo senior
Voice pipeline (Twilio + TTS + STT + LLM) 4 $60K $100K
Agent runtime (15 tools, providers, conversation) 3 $45K $75K
RAG pipeline (ingest, hybrid search, pgvector) 2 $30K $50K
Phony API (auth, billing, creators, calls) 3 $45K $75K
Phony Web (consumer + builder portal) 2 $30K $50K
Voice API (developer platform, BYOK, tiers) 2 $30K $50K
Developer portal 1 $15K $25K
DevOps (Railway, Docker, CI/CD) 1 $15K $25K
Voice cloning (7 creators, 3 providers) 1 $15K $25K
Quality/eval (self-play, scorecards) 1.5 $22.5K $37.5K
SDK packages (5 packages) 1.5 $22.5K $37.5K
Total 22 mo $330K $550K

6.3 Team Scaling

Stage Trigger Team Monthly Payroll Total Monthly Burn
Now Pre-revenue 1 founder + AI tooling $0 ~$310
Seed $500K raise 2 founders + 1 eng $25K ~$25.5K
Post-seed $10K MRR 2 founders + 2 eng + 1 design $45K ~$46K
Series A $2M raise 2 founders + 4 eng + 1 design + 1 GTM $100K ~$102K
Growth $100K MRR 2 founders + 8 eng + 2 design + 3 GTM $200K ~$202K

6.4 Break-Even Points

Monthly Expense Users Needed (at $14.99 Regular plan, 61% margin) Minutes Needed (at $0.25/min PAYG)
Cover infra ($310/mo) 34 subscribers 2,026 min
Cover infra + 1 hire (~$10.3K/mo) 1,127 subscribers 67,320 min
Cover infra + seed team (~$25.3K/mo) 2,767 subscribers 165,360 min
$100K MRR 6,671 subscribers 400,000 min
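
The expense-cover rows above divide a monthly expense by gross profit per subscriber or per PAYG minute. A minimal sketch, assuming the 61% margin on the $14.99 Regular plan and $0.097/min COGS from earlier sections; the function names are illustrative.

```python
import math


def subscribers_to_cover(
    monthly_expense: float, plan_price: float = 14.99, gross_margin: float = 0.61
) -> int:
    """Whole subscribers needed for gross profit to cover the expense."""
    return math.ceil(monthly_expense / (plan_price * gross_margin))


def payg_minutes_to_cover(
    monthly_expense: float, price_per_min: float = 0.25, cogs_per_min: float = 0.097
) -> float:
    """PAYG minutes needed for gross profit to cover the expense."""
    return monthly_expense / (price_per_min - cogs_per_min)


print(subscribers_to_cover(310))          # 34
print(round(payg_minutes_to_cover(310)))  # 2026
```

The $100K MRR row is a revenue target rather than an expense to cover, so it is simply the target divided by the plan price or the per-minute price.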

7. RISKS AND UNKNOWNS

Risk Impact Mitigation
Cartesia price increase TTS is 55% of COGS; 2x price = COGS goes from $0.097 to $0.152 Multi-provider support already built; can switch to Deepgram Aura ($0.030/1K)
GPT-5.2 pricing not confirmed Using GPT-5 published rates; actual GPT-5.2 may differ Test GPT-4o-mini quality; if acceptable, switch and save 25%
Turn density higher than estimated Dense Q&A = 8-10 turns/min; COGS could be $0.13-0.15/min TTS char cap already enforces output limit; main risk is LLM input tokens
ElevenLabs STT actual rate $0.22-0.40/hr range is wide; depends on plan tier Switch to Deepgram Nova-3 ($0.0077/min) for predictable pricing
Railway scaling costs Memory-dominated; adding replicas doubles memory cost Set resource limits in railway.toml; profile actual per-service usage
Volume discounts not modeled Twilio drops to ~$0.003/min at high volume; Cartesia may negotiate Model volume pricing when >10K min/mo

8. IMMEDIATE ACTIONS

Priority Action Impact
1 Test GPT-4o-mini quality for voice conversations Potential 25% COGS reduction ($0.097 -> $0.073)
2 Set BILLING_* env vars to actual costs Enable cost tracking before launch
3 Decide consumer pricing (subscription vs PAYG vs hybrid) Determines GTM strategy
4 Validate ElevenLabs STT actual per-minute cost Narrow the $0.22-0.40/hr range
5 Profile Cartesia usage to pick right plan (Startup vs Scale) Break-even at ~5,700 min/mo
6 Decide creator revenue share % 50% vs 70% vs 80% changes net margin 10-20x

Phony Voice Stack Decision Matrix

Date: 2026-03-06
Purpose: Full cost/quality/latency trade-off analysis for cofounder review


Assumptions (Validated from Codebase)

Parameter Value
Turns per minute 5.5
LLM input tokens per turn 3,000 (system prompt + RAG + history + message)
LLM output tokens per turn 80 (capped at 320 chars / 3 sentences)
TTS characters per minute 1,400 (5.5 turns x ~255 avg chars)
STT minutes per minute 1.0
Twilio (inbound + media streams) $0.0125/min (constant across all stacks)

1. COMPONENT COSTS (Per Minute)

1A. LLM Providers

# Model Quality (GPQA) Speed (tok/s) TTFB Input $/M Output $/M LLM Cost/Min
L1 GPT-5.2 92.4% 92 0.6s $1.50 $14.00 $0.031
L2 GPT-5 87.3% ~90 ~0.6s $1.25 $10.00 $0.025
L3 GPT-4o ~85% ~100 ~0.5s $2.50 $10.00 $0.046
L4 Claude Sonnet 4.6 ~83% ~80 ~0.8s $3.00 $15.00 $0.056
L5 Claude Haiku 4.5 ~75% ~120 ~0.5s $1.00 $5.00 $0.019
L6 GPT-5-mini ~78% ~150 ~0.4s $0.25 $2.00 $0.005
L7 GPT-4o-mini ~72% ~130 ~0.4s $0.15 $0.60 $0.003
L8 Groq Llama 4 Maverick ~80% 562 <0.2s $0.20 $0.60 $0.004
L9 Groq Llama 3.3 70B ~78% 394 <0.2s $0.59 $0.79 $0.010
L10 Groq Qwen3 32B ~73% 662 <0.2s $0.29 $0.59 $0.005
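
The LLM Cost/Min column is derived from the per-token prices and the turn assumptions (5.5 turns/min, 3,000 input and 80 output tokens per turn). A minimal sketch for re-deriving it when a price changes; the function name is illustrative.

```python
# LLM cost per conversation minute from per-1M-token prices.
def llm_cost_per_min(
    input_price_per_m: float,
    output_price_per_m: float,
    turns_per_min: float = 5.5,
    input_tokens: int = 3_000,
    output_tokens: int = 80,
) -> float:
    per_turn = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return turns_per_min * per_turn / 1e6


MODELS = {
    "GPT-5.2": (1.50, 14.00),
    "GPT-5": (1.25, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Groq Llama 4 Maverick": (0.20, 0.60),
}

for name, (inp, out) in MODELS.items():
    print(f"{name:22s} ${llm_cost_per_min(inp, out):.3f}/min")
```

Input tokens dominate: at 3,000 in and 80 out per turn, output pricing contributes well under a quarter of the per-minute LLM cost for every model listed.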

1B. TTS Providers

# Provider Model Quality TTFB Clone? Mulaw? TTS Cost/Min
T1 Cartesia (Startup) Sonic 3 Excellent 40ms Yes Yes $0.055
T2 Cartesia (Scale) Sonic 3 Excellent 40ms Yes Yes $0.052
T3 ElevenLabs Flash v2.5 Excellent 150-300ms Yes No $0.098
T4 Deepgram Aura-2 Good 100-200ms No No $0.042
T5 Deepgram Aura-1 Decent ~150ms No No $0.021
T6 Groq Orpheus Good 50-100ms No No $0.031
T7 Fish Speech S1 (Modal A100) S1 4B Excellent* 200-400ms Yes No $0.005
T8 F5-TTS (Modal A10) ConvNeXt Strong ~253ms Yes No $0.001
T9 Pocket TTS (Railway CPU) Kyutai 100M Mediocre 13-30s Yes No $0.002

*Fish Speech S1 is #1 on TTS-Arena2 benchmark

1C. STT Providers

# Provider Model Quality Latency STT Cost/Min
S1 ElevenLabs Scribe v2 Excellent Real-time streaming $0.005
S2 Deepgram Nova-3 Excellent Real-time streaming $0.008
S3 Groq Whisper Large v3 Turbo Strong 228x RT (batch) $0.001
S4 Self-hosted Whisper (Modal) Large v3 Strong ~2-5s (batch) $0.002

1D. Telephony (Constant)

Provider Components Cost/Min
Twilio Inbound ($0.0085) + Media Streams ($0.004) $0.013

2. FULL STACK COMBINATIONS

2A. Premium Stacks (Best Quality, Fastest)

Stack LLM TTS STT LLM/Min TTS/Min STT/Min Twilio/Min Total COGS/Min Est. End-to-End TTFB
P1 (Current) GPT-5.2 Cartesia Startup EL Scribe $0.031 $0.055 $0.005 $0.013 $0.104 ~0.7s
P2 GPT-5.2 Cartesia Scale EL Scribe $0.031 $0.052 $0.005 $0.013 $0.101 ~0.7s
P3 GPT-5 Cartesia Startup EL Scribe $0.025 $0.055 $0.005 $0.013 $0.098 ~0.7s
P4 Claude Sonnet 4.6 Cartesia Startup EL Scribe $0.056 $0.055 $0.005 $0.013 $0.129 ~0.9s

2B. Quality-Optimized Stacks (Strong Quality, Lower Cost)

Stack LLM TTS STT LLM/Min TTS/Min STT/Min Twilio/Min Total COGS/Min Est. End-to-End TTFB
Q1 GPT-5 Cartesia Startup Groq Whisper $0.025 $0.055 $0.001 $0.013 $0.094 ~0.7s
Q2 Claude Haiku 4.5 Cartesia Startup EL Scribe $0.019 $0.055 $0.005 $0.013 $0.092 ~0.6s
Q3 GPT-5 Deepgram Aura-2 EL Scribe $0.025 $0.042 $0.005 $0.013 $0.085 ~0.8s
Q4 GPT-5 Deepgram Aura-2 Deepgram Nova-3 $0.025 $0.042 $0.008 $0.013 $0.088 ~0.8s

2C. Speed-First Stacks (Lowest Latency)

Stack LLM TTS STT LLM/Min TTS/Min STT/Min Twilio/Min Total COGS/Min Est. End-to-End TTFB
S1 Groq Maverick Cartesia Startup EL Scribe $0.004 $0.055 $0.005 $0.013 $0.077 ~0.25s
S2 Groq Llama 70B Cartesia Startup EL Scribe $0.010 $0.055 $0.005 $0.013 $0.083 ~0.25s
S3 Groq Maverick Groq Orpheus Groq Whisper $0.004 $0.031 $0.001 $0.013 $0.049 ~0.15s
S4 Groq Maverick Cartesia Startup Groq Whisper $0.004 $0.055 $0.001 $0.013 $0.073 ~0.2s

2D. Cost-Optimized Stacks (Lowest COGS)

Stack LLM TTS STT LLM/Min TTS/Min STT/Min Twilio/Min Total COGS/Min Est. End-to-End TTFB
C1 GPT-5-mini Deepgram Aura-2 Groq Whisper $0.005 $0.042 $0.001 $0.013 $0.061 ~0.6s
C2 GPT-4o-mini Deepgram Aura-1 Groq Whisper $0.003 $0.021 $0.001 $0.013 $0.038 ~0.6s
C3 Groq Maverick F5-TTS (Modal) Groq Whisper $0.004 $0.001 $0.001 $0.013 $0.019 ~0.5s
C4 GPT-4o-mini Pocket TTS Groq Whisper $0.003 $0.002 $0.001 $0.013 $0.019 ~14s

2E. Future Self-Hosted Stacks (50K+ min/mo)

Stack LLM TTS STT LLM/Min TTS/Min STT/Min Twilio/Min Total COGS/Min Est. End-to-End TTFB
F1 GPT-5.2 Fish Speech S1 (Modal) Groq Whisper $0.031 $0.005 $0.001 $0.013 $0.050 ~0.9s
F2 GPT-5 F5-TTS (Modal) Groq Whisper $0.025 $0.001 $0.001 $0.013 $0.040 ~0.9s
F3 Groq Maverick Fish Speech S1 (Modal) Groq Whisper $0.004 $0.005 $0.001 $0.013 $0.023 ~0.5s
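
Each stack total above is the sum of one LLM, one TTS, and one STT entry from section 1 plus the constant Twilio cost. A minimal sketch for composing and ranking stacks, using a subset of the component tables; the dictionary layout and function name are illustrative.

```python
# Compose stack COGS/min from the per-component tables (USD per minute).
LLM = {"GPT-5.2": 0.031, "GPT-5": 0.025, "Groq Maverick": 0.004, "GPT-4o-mini": 0.003}
TTS = {
    "Cartesia Startup": 0.055,
    "Groq Orpheus": 0.031,
    "Deepgram Aura-1": 0.021,
    "Fish Speech S1": 0.005,
}
STT = {"EL Scribe": 0.005, "Groq Whisper": 0.001}
TWILIO = 0.013  # inbound + media streams, constant across stacks

STACKS = {
    "P1": ("GPT-5.2", "Cartesia Startup", "EL Scribe"),
    "S3": ("Groq Maverick", "Groq Orpheus", "Groq Whisper"),
    "C2": ("GPT-4o-mini", "Deepgram Aura-1", "Groq Whisper"),
    "F1": ("GPT-5.2", "Fish Speech S1", "Groq Whisper"),
}


def stack_cogs(llm: str, tts: str, stt: str) -> float:
    return round(LLM[llm] + TTS[tts] + STT[stt] + TWILIO, 3)


ranked = sorted(STACKS, key=lambda name: stack_cogs(*STACKS[name]))
for name in ranked:
    print(f"{name}: ${stack_cogs(*STACKS[name]):.3f}/min")
```

Adding a new candidate stack is then a one-line change to `STACKS`, with no hand arithmetic.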

3. MASTER COMPARISON (All Stacks Ranked)

By Total COGS (Cheapest First)

Rank Stack Config COGS/Min TTFB Quality Trade-off
1 C3 Groq Maverick + F5-TTS + Groq Whisper $0.019 ~0.5s Lower LLM, strong TTS Open model quality ceiling, Modal cold starts
2 C4 GPT-4o-mini + Pocket TTS + Groq Whisper $0.019 ~14s Low LLM, bad TTS Unusable latency — free tier only
3 F3 Groq Maverick + Fish Speech + Groq Whisper $0.023 ~0.5s Strong TTS, lower LLM Needs Modal GPU fleet ops, no mulaw
4 C2 GPT-4o-mini + Aura-1 + Groq Whisper $0.038 ~0.6s Mid quality across board Noticeable conversation quality drop
5 F2 GPT-5 + F5-TTS + Groq Whisper $0.040 ~0.9s Near-frontier LLM, strong TTS Modal ops, audio conversion needed
6 S3 All-Groq (Maverick + Orpheus + Whisper) $0.049 ~0.15s Good, not great Lower LLM quality, no voice cloning
7 F1 GPT-5.2 + Fish Speech + Groq Whisper $0.050 ~0.9s Frontier LLM, great TTS Modal ops, higher TTS latency
8 C1 GPT-5-mini + Aura-2 + Groq Whisper $0.061 ~0.6s Mid LLM, good TTS Conversation quality noticeably worse
9 S4 Groq Maverick + Cartesia + Groq Whisper $0.073 ~0.2s Near-frontier, excellent TTS LLM quality gap vs GPT-5.2
10 S1 Groq Maverick + Cartesia + EL Scribe $0.077 ~0.25s Same as above, better STT Paying more for EL STT
11 S2 Groq Llama 70B + Cartesia + EL Scribe $0.083 ~0.25s GPT-4o class LLM Noticeable quality gap
12 Q3 GPT-5 + Aura-2 + EL Scribe $0.085 ~0.8s Near-frontier, good TTS Aura voice quality < Cartesia
13 Q4 GPT-5 + Aura-2 + Deepgram Nova-3 $0.088 ~0.8s Near-frontier, good TTS All-Deepgram audio stack
14 Q2 Haiku 4.5 + Cartesia + EL Scribe $0.092 ~0.6s Strong LLM, excellent TTS Haiku weaker on long conversations
15 Q1 GPT-5 + Cartesia + Groq Whisper $0.094 ~0.7s Near-frontier everything Cheapest STT saves $0.004
16 P3 GPT-5 + Cartesia + EL Scribe $0.098 ~0.7s Near-frontier everything Almost identical to current, $6 less per 1,000 min
17 P2 GPT-5.2 + Cartesia Scale + EL Scribe $0.101 ~0.7s Best everything Need 5,700+ min/mo for Scale plan
18 P1 (CURRENT) GPT-5.2 + Cartesia Startup + EL Scribe $0.104 ~0.7s Best everything Your stack today
19 P4 Sonnet 4.6 + Cartesia + EL Scribe $0.129 ~0.9s Frontier, great for nuance Most expensive, slower

By TTFB (Fastest First)

Rank Stack TTFB COGS/Min Quality Viable for Premium?
1 S3 (All-Groq) ~0.15s $0.049 Good No — LLM quality gap, no voice clone
2 S4 (Groq + Cartesia) ~0.2s $0.073 Near-frontier TTS, good LLM Maybe — test Maverick quality
3 S1 (Groq + Cartesia + EL) ~0.25s $0.077 Same Maybe
4 S2 (Groq 70B + Cartesia) ~0.25s $0.083 GPT-4o class No — quality gap
5 C3 (Groq + F5-TTS) ~0.5s $0.019 Strong TTS, good LLM No — TTS latency, ops burden
6 F3 (Groq + Fish Speech) ~0.5s $0.023 Great TTS, good LLM Aspirational at scale
7 Q2 (Haiku + Cartesia) ~0.6s $0.092 Strong Yes — near current quality
8 C1 (GPT-5-mini + Aura-2) ~0.6s $0.061 Mid No
9 P1 (GPT-5.2 + Cartesia) ~0.7s $0.104 Best Yes — current production
10 Q1 (GPT-5 + Cartesia) ~0.7s $0.094 Near-best Yes
11 Q3/Q4 (GPT-5 + Aura-2) ~0.8s $0.085-0.088 Strong Yes — if Aura quality acceptable
12 F1 (GPT-5.2 + Fish Speech) ~0.9s $0.050 Frontier LLM, great TTS Future scale play
13 P4 (Sonnet 4.6 + Cartesia) ~0.9s $0.129 Frontier Yes — but slower and pricier
14 C4 (Mini + Pocket TTS) ~14s $0.019 Low No — free tier demo only

By Quality (Best Conversation First)

Rank Stack Quality Rating COGS/Min TTFB Why This Quality Rank
1 P1/P2 (GPT-5.2 + Cartesia) A+ $0.101-0.104 0.7s #1 LLM + best low-latency TTS + voice cloning
2 P4 (Sonnet 4.6 + Cartesia) A+ $0.129 0.9s Frontier LLM (best at nuance/empathy) + Cartesia
3 P3/Q1 (GPT-5 + Cartesia) A $0.094-0.098 0.7s Near-frontier LLM, same TTS
4 F1 (GPT-5.2 + Fish Speech) A $0.050 0.9s Best LLM + #1 TTS-Arena model
5 Q2 (Haiku 4.5 + Cartesia) A- $0.092 0.6s Strong LLM, weaker on complex reasoning
6 Q3 (GPT-5 + Aura-2) B+ $0.085 0.8s Near-frontier LLM, TTS quality step down
7 S1/S4 (Groq Maverick + Cartesia) B+ $0.073-0.077 0.2-0.25s Great TTS, LLM misses subtlety sometimes
8 F2 (GPT-5 + F5-TTS) B+ $0.040 0.9s Good LLM, strong TTS, higher latency
9 S2 (Groq 70B + Cartesia) B $0.083 0.25s GPT-4o class LLM — adequate not great
10 C1 (GPT-5-mini + Aura-2) B- $0.061 0.6s Mid LLM + good TTS
11 S3 (All-Groq) B- $0.049 0.15s Fast but no voice cloning, weaker LLM
12 C3 (Groq + F5-TTS Modal) B- $0.019 0.5s Good TTS, weaker LLM, ops complexity
13 C2 (Mini + Aura-1) C+ $0.038 0.6s Noticeable quality drop everywhere
14 C4 (Mini + Pocket TTS) D $0.019 14s Free tier filler only

4. MONTHLY COST PROJECTIONS (At Volume)

Fixed overhead: ~$50/mo Railway + $49/mo Cartesia Startup + $8 Twilio numbers = ~$107/mo base

Monthly Minutes P1 (Current, $0.104/min) Q1 (GPT-5 + Cartesia, $0.094/min) S1 (Groq + Cartesia, $0.077/min) F1 (GPT-5.2 + Fish@Scale, $0.050/min) C2 (Budget, $0.038/min)
100 $107 + $10 = $117 $107 + $9 = $116 $107 + $8 = $115 $207* + $5 = $212 $107 + $4 = $111
500 $107 + $52 = $159 $107 + $47 = $154 $107 + $39 = $146 $207 + $25 = $232 $107 + $19 = $126
1,000 $107 + $104 = $211 $107 + $94 = $201 $107 + $77 = $184 $207 + $50 = $257 $107 + $38 = $145
5,000 $107 + $520 = $627 $107 + $470 = $577 $107 + $385 = $492 $207 + $250 = $457 $107 + $190 = $297
10,000 $357** + $1,040 = $1,397 $357 + $940 = $1,297 $357 + $770 = $1,127 $457 + $500 = $957 $357 + $380 = $737
50,000 $357 + $5,200 = $5,557 $357 + $4,700 = $5,057 $357 + $3,850 = $4,207 $457 + $2,500 = $2,957 $357 + $1,900 = $2,257

*F1 adds ~$100/mo Modal GPU keep-warm
**At 10K+ min, upgrade Cartesia to Scale plan ($299 vs $49), Railway scales to ~$200
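
Each cell in the projection is fixed overhead plus minutes times the stack's COGS/min. A minimal sketch that reproduces the table's figures: it uses $107 base overhead below 10,000 min and $357 from 10,000 min up (the $107 base plus the $250 Cartesia plan difference, as the cells show), plus ~$100/mo keep-warm for F1. The function names are illustrative.

```python
# Monthly cost projection: fixed overhead + minutes x COGS/min.
def fixed_overhead(minutes: int, modal_keep_warm: float = 0.0) -> float:
    base = 357.0 if minutes >= 10_000 else 107.0
    return base + modal_keep_warm


def monthly_cost(minutes: int, cogs_per_min: float, modal_keep_warm: float = 0.0) -> float:
    return fixed_overhead(minutes, modal_keep_warm) + minutes * cogs_per_min


def crossover_minutes(extra_fixed: float, cogs_a: float, cogs_b: float) -> float:
    """Volume above which stack B (lower COGS, extra fixed cost) beats stack A."""
    return extra_fixed / (cogs_a - cogs_b)


for minutes in (1_000, 5_000, 10_000):
    p1 = monthly_cost(minutes, 0.104)
    f1 = monthly_cost(minutes, 0.050, modal_keep_warm=100.0)
    print(f"{minutes:6d} min  P1 ${p1:,.0f}  F1 ${f1:,.0f}")

print(f"F1 beats P1 above ~{crossover_minutes(100.0, 0.104, 0.050):,.0f} min/mo")
```

On cost alone the crossover sits between the 1,000 and 5,000 min rows of the table; it ignores the ops burden and latency trade-offs discussed elsewhere.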

Break-Even Price Points (for 60% gross margin target)

Stack COGS/Min Price for 60% Margin Price for 70% Margin Price for 50% Margin
P1 (Current) $0.104 $0.26/min $0.35/min $0.21/min
Q1 (GPT-5 swap) $0.094 $0.24/min $0.31/min $0.19/min
S1 (Groq fast) $0.077 $0.19/min $0.26/min $0.15/min
F1 (Self-host TTS) $0.050 $0.13/min $0.17/min $0.10/min
C2 (Budget) $0.038 $0.10/min $0.13/min $0.08/min

5. RECOMMENDED STRATEGY

Now (0-2,000 min/mo): Stay on P1, Price Premium

Stack: GPT-5.2 + Cartesia Startup + ElevenLabs Scribe

Metric Value
COGS $0.104/min
TTFB ~0.7s
Quality A+ (best available)
Monthly fixed ~$107
Target price $0.25-0.35/min
Gross margin 58-70%

Why: You have the best stack in the market. Competitors on Llama 70B + Groq charge $0.07-0.12 for a B-tier product. You can charge $0.25-0.35 for an A+ tier product. Don't race to the bottom.

Quick win: Switch STT from ElevenLabs Scribe ($0.005/min) to Groq Whisper Turbo ($0.001/min) — saves $0.004/min, same quality for call transcription. COGS drops to $0.100/min.

Growth (2,000-10,000 min/mo): Optimize Provider Plans

Action Trigger Savings
Upgrade Cartesia Startup → Scale ~5,700 min/mo $0.003/min TTS savings
Enable OpenAI prompt caching Immediate ~$0.008/min on input tokens
Negotiate OpenAI volume pricing ~5,000 min/mo 20-40% off LLM
Negotiate Cartesia volume pricing ~10,000 min/mo Custom rate below $0.037/1K

Scale (50,000+ min/mo): Add Self-Hosted TTS Tier

Tier Stack COGS Price Margin
Ultra GPT-5.2 + Cartesia (40ms TTFB) $0.10 $0.35 71%
Premium GPT-5.2 + Fish Speech S1 (200ms TTFB) $0.05 $0.20 75%
Standard GPT-5 + F5-TTS (250ms TTFB) $0.04 $0.12 67%

6. WHAT NOT TO DO

Temptation Why Not
Switch to GPT-4o-mini / GPT-5-mini to save $0.025/min Core product IS the conversation quality. Saves $25/1K min but ruins the product.
Switch to Pocket TTS to eliminate TTS cost 13-30s TTFB. Product is dead on arrival. Only for free tier demo.
Go all-Groq to minimize COGS Maverick/Llama quality noticeably worse for nuanced creator conversations. Fine for simple chatbots, wrong for this product.
Self-host TTS now (pre-scale) Modal cold starts + ops burden + no mulaw = worse product for more effort. Wait for 50K min/mo.
Race to the bottom on pricing You have the best stack. Bland/Retell sell $0.07-0.12 with worse models. Own the premium tier.