drewstone/ECONOMIC-MODEL-2026-03-06.md

## ECONOMIC-MODEL-2026-03-06.md

      
    Phony Platform Economic Model (Validated)

Date: 2026-03-06
Status: Validated against provider pricing pages and codebase analysis
Purpose: Cofounder presentation — COGS, overhead, pricing, team planning

1. VALIDATED COGS — Per Minute of Voice Conversation

1.1 Assumptions (Measured from Codebase)


| Parameter | Value | Source |
|---|---|---|
| Turns per minute | 5-6 | Avg conversational pace with 320-char cap |
| Input tokens per turn | 2,500-3,500 | System prompt (~480) + RAG context (~1,500) + history (~1,000) + message (~100) |
| Output tokens per turn | 60-100 | Capped at 320 chars / 3 sentences by Twilio config |
| TTS characters per turn | ~200-320 | Hard cap: `TWILIO_MAX_SPOKEN_CHARS_PER_TURN=320` |
| TTS characters per minute | ~1,200-1,600 | 5-6 turns x 240 avg chars |
| RAG queries per minute | ~2 | Not every turn triggers RAG; ~1 in 3 turns |
| Conversation history window | 20 turns | Hardcoded in agent-chat.ts |
| RAG chunks injected | 5 | agent-chat.ts limit, 1,200 chars each |


1.2 Provider Unit Costs (Validated March 2026)


| Provider | Service | Unit | Published Rate | Source |
|---|---|---|---|---|
| Twilio | Inbound voice | per min | $0.0085 | twilio.com/voice/pricing |
| Twilio | Media Streams (WebSocket) | per min | $0.0040 | twilio.com/voice/pricing |
| Twilio | Phone number | per month | $1.15 | twilio.com/voice/pricing |
| Twilio | Call recording | per min | $0.0025 | twilio.com/voice/pricing |
| Cartesia | Sonic TTS (Startup plan) | per 1K chars | $0.039 | $49/mo for 1.25M credits, 1 credit/char |
| Cartesia | Sonic TTS (Scale plan) | per 1K chars | $0.037 | $299/mo for 8M credits |
| Cartesia | Sonic TTS (Pro plan) | per 1K chars | $0.050 | $5/mo for 100K credits |
| OpenAI | GPT-4o | per 1M input tok | $2.50 | openai.com/api/pricing |
| OpenAI | GPT-4o | per 1M output tok | $10.00 | openai.com/api/pricing |
| OpenAI | GPT-4o-mini | per 1M input tok | $0.15 | openai.com/api/pricing |
| OpenAI | GPT-4o-mini | per 1M output tok | $0.60 | openai.com/api/pricing |
| OpenAI | GPT-5 | per 1M input tok | $1.25 | openai.com/api/pricing |
| OpenAI | GPT-5 | per 1M output tok | $10.00 | openai.com/api/pricing |
| OpenAI | GPT-5-mini | per 1M input tok | $0.25 | openai.com/api/pricing |
| OpenAI | GPT-5-mini | per 1M output tok | $2.00 | openai.com/api/pricing |
| OpenAI | text-embedding-3-small | per 1M tok | $0.02 | openai.com/api/pricing |
| Anthropic | Claude 3.5 Haiku | per 1M input tok | $0.80 | anthropic.com/pricing |
| Anthropic | Claude 3.5 Haiku | per 1M output tok | $4.00 | anthropic.com/pricing |
| Anthropic | Claude Haiku 4.5 | per 1M input tok | $1.00 | anthropic.com/pricing |
| Anthropic | Claude Haiku 4.5 | per 1M output tok | $5.00 | anthropic.com/pricing |
| ElevenLabs | Scribe STT (Pro plan) | per hour | $0.22-0.40 | elevenlabs.io/pricing |
| Deepgram | Nova-3 STT | per min | $0.0077 | deepgram.com/pricing |
| Deepgram | Aura-2 TTS | per 1K chars | $0.030 | deepgram.com/pricing |


1.3 Per-Minute COGS Calculation (Current Production Stack)

Your current stack: Twilio + ElevenLabs STT + Cartesia TTS + GPT-5.2 LLM


| Component | Calculation | Cost/Min |
|---|---|---|
| Twilio inbound | $0.0085 x 1 min | $0.0085 |
| Twilio media streams | $0.0040 x 1 min | $0.0040 |
| Cartesia TTS (Startup plan) | 1,400 chars/min x $0.039/1K | $0.0546 |
| ElevenLabs STT (Scribe) | ~$0.30/hr / 60 | $0.0050 |
| GPT-5.2 input | 5.5 turns x 3,000 tok x $1.25/1M | $0.0206 |
| GPT-5.2 output | 5.5 turns x 80 tok x $10.00/1M | $0.0044 |
| RAG embeddings | 2 queries x ~100 tok x $0.02/1M | $0.0000 |
| Total COGS/min | | $0.097 |
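The per-minute total can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the `StackRates` class and `cogs_per_minute` function are hypothetical names, not part of the codebase, and the defaults are the midpoint assumptions from section 1.1 (5.5 turns, 3,000 input tokens, 80 output tokens, 1,400 TTS chars) with GPT-5 published rates standing in for GPT-5.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackRates:
    """Unit rates from section 1.2 (hypothetical container, for illustration)."""
    twilio_inbound_per_min: float = 0.0085
    twilio_media_per_min: float = 0.0040
    tts_per_1k_chars: float = 0.039    # Cartesia Startup
    stt_per_hour: float = 0.30         # ElevenLabs Scribe, midpoint of $0.22-0.40
    llm_input_per_1m: float = 1.25     # GPT-5 rate used as a GPT-5.2 proxy
    llm_output_per_1m: float = 10.00
    embed_per_1m: float = 0.02


def cogs_per_minute(rates, turns=5.5, input_tok=3000, output_tok=80,
                    tts_chars=1400, rag_queries=2, rag_tok=100):
    """Return a per-component cost breakdown for one minute of conversation."""
    components = {
        "twilio": rates.twilio_inbound_per_min + rates.twilio_media_per_min,
        "tts": tts_chars / 1000 * rates.tts_per_1k_chars,
        "stt": rates.stt_per_hour / 60,
        "llm_input": turns * input_tok * rates.llm_input_per_1m / 1e6,
        "llm_output": turns * output_tok * rates.llm_output_per_1m / 1e6,
        "rag": rag_queries * rag_tok * rates.embed_per_1m / 1e6,
    }
    components["total"] = sum(components.values())
    return components


breakdown = cogs_per_minute(StackRates())
for name, cost in breakdown.items():
    print(f"{name:<11} ${cost:.4f}")
```

Swapping a single rate (for example `tts_per_1k_chars=0.037` for the Scale plan) gives the corresponding row of the stack-variant comparison.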


1.4 Per-Minute COGS by Stack Variant


| Stack | Telephony | STT | TTS | LLM | RAG | Total/Min |
|---|---|---|---|---|---|---|
| Current production (GPT-5.2 + Cartesia Startup) | $0.0125 | $0.0050 | $0.0546 | $0.0250 | $0.0000 | $0.097 |
| Current w/ Cartesia Scale (GPT-5.2 + Cartesia Scale) | $0.0125 | $0.0050 | $0.0518 | $0.0250 | $0.0000 | $0.094 |
| Cost-optimized (GPT-4o-mini + Cartesia Startup) | $0.0125 | $0.0050 | $0.0546 | $0.0013 | $0.0000 | $0.073 |
| Budget (GPT-4o-mini + Pocket TTS) | $0.0125 | $0.0050 | $0.0020 | $0.0013 | $0.0000 | $0.021 |
| Deepgram all-in (GPT-4o-mini + Deepgram STT + Aura TTS) | $0.0125 | $0.0077 | $0.0420 | $0.0013 | $0.0000 | $0.064 |
| BYOK (developer brings all keys) | $0.0125 | $0.0000 | $0.0000 | $0.0000 | $0.0000 | $0.013 |
| Fully self-hosted (Whisper + Pocket TTS + open LLM) | $0.0125 | $0.0020 | $0.0020 | $0.0030 | $0.0000 | $0.020 |


1.5 Per-Call COGS (at various durations)

Using the current production stack at $0.097/min:


| Call Duration | COGS | At $0.15/min price | Margin | At $0.25/min price | Margin |
|---|---|---|---|---|---|
| 3 min | $0.29 | $0.45 | 35% | $0.75 | 61% |
| 5 min | $0.49 | $0.75 | 35% | $1.25 | 61% |
| 10 min | $0.97 | $1.50 | 35% | $2.50 | 61% |
| 15 min | $1.46 | $2.25 | 35% | $3.75 | 61% |
| 30 min | $2.91 | $4.50 | 35% | $7.50 | 61% |
| 60 min | $5.82 | $9.00 | 35% | $15.00 | 61% |


Using the cost-optimized stack at $0.073/min:


| Call Duration | COGS | At $0.15/min price | Margin | At $0.25/min price | Margin |
|---|---|---|---|---|---|
| 3 min | $0.22 | $0.45 | 51% | $0.75 | 71% |
| 5 min | $0.37 | $0.75 | 51% | $1.25 | 71% |
| 10 min | $0.73 | $1.50 | 51% | $2.50 | 71% |
| 15 min | $1.10 | $2.25 | 51% | $3.75 | 71% |
| 30 min | $2.19 | $4.50 | 51% | $7.50 | 71% |
| 60 min | $4.38 | $9.00 | 51% | $15.00 | 71% |
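Because both COGS and price are linear in call duration, the margin does not depend on how long the call runs. A minimal sketch of the per-call arithmetic follows; `call_economics` is a hypothetical helper name chosen for this example.

```python
def call_economics(minutes, cogs_per_min, price_per_min):
    """Return (COGS, revenue, gross margin in whole percent) for one call."""
    cogs = minutes * cogs_per_min
    revenue = minutes * price_per_min
    margin_pct = round((revenue - cogs) / revenue * 100)
    return round(cogs, 2), round(revenue, 2), margin_pct


for minutes in (3, 10, 60):
    # Current production stack at $0.097/min, priced at $0.25/min
    print(minutes, call_economics(minutes, 0.097, 0.25))
```

The margin column is therefore a property of the price and the stack, not of the call length.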


2. FIXED OVERHEAD COSTS (Validated)

2.1 Railway Hosting (Actual Bill)

Plan: Pro ($20/mo base, includes $20 usage credits)
Actual monthly cost: ~$28/mo


| Resource | Usage | Monthly Cost |
|---|---|---|
| CPU (0.056 vCPU avg across 11 services) | 2,417 vCPU-min | $1.12 |
| Memory (2.5 GB avg across 11 services) | 109,180 GB-min | $25.22 |
| Disk (7 GB total) | 304,177 GB-min | $1.06 |
| Network egress (3.2 GB) | 3.2 GB | $0.16 |
| Total usage | | $27.56 |
| Pro plan base | | included |


11 services running:


| Service | Type | Memory Driver |
|---|---|---|
| @phony/api | Node.js Fastify | Moderate |
| @phony/web | Nginx static | Low |
| @persona-labs/voice-api | Node.js Hono | Moderate |
| @persona-labs/pocket-tts | Python PyTorch | High (model weights) |
| @persona-labs/ingest | Node.js Fastify | Moderate |
| @persona-labs/ingest-worker | Node.js background | Low |
| developer-portal | Static | Low |
| @phony/db | PostgreSQL | High |
| @persona-labs/db | PostgreSQL | High |
| @persona-labs/vector-db | PostgreSQL + pgvector | High |
| @persona-labs/storage-console | MinIO | Moderate |


2.2 Provider Subscription Fees (Monthly Fixed)


| Provider | Plan | Monthly Cost | What You Get |
|---|---|---|---|
| Cartesia | Startup | $49/mo | 1.25M chars TTS (~893 agent-minutes) |
| Cartesia | Scale (if needed) | $299/mo | 8M chars TTS (~5,714 agent-minutes) |
| ElevenLabs | Pro | $99/mo (if subscribed) | 500K credits TTS + STT |
| ElevenLabs | Free (current) | $0 | Broken for cloned voices |
| Twilio | Pay-as-you-go | $0 base | Per-use billing |
| Twilio | Phone numbers (7 creators) | $8.05/mo | 7 x $1.15 |
| OpenAI | Pay-as-you-go | $0 base | Per-use billing |
| Stripe | Standard | 2.9% + $0.30/txn | When active |
| Railway | Pro | $20/mo base | Included in hosting |
| Total fixed provider costs | | $77-$327/mo | Depends on Cartesia plan |


2.3 Total Monthly Overhead (No Revenue)


| Category | Low (Cartesia Startup) | High (Cartesia Scale) |
|---|---|---|
| Railway hosting | $28 | $28 |
| Cartesia subscription | $49 | $299 |
| Twilio phone numbers | $8 | $8 |
| Domain/DNS | ~$2 | ~$2 |
| Claude Code / dev tooling | ~$200 | ~$200 |
| Total burn | ~$287/mo | ~$537/mo |


2.4 Cartesia Included Minutes Before Overage

Your Cartesia plan includes a block of TTS credits. Here's how many agent-minutes you get before paying overage:


| Plan | Credits Included | Agent-Minutes Included* | Effective TTS Cost if Fully Used |
|---|---|---|---|
| Pro ($5/mo) | 100K chars | ~71 min | $0.070/min |
| Startup ($49/mo) | 1.25M chars | ~893 min | $0.055/min |
| Scale ($299/mo) | 8M chars | ~5,714 min | $0.052/min |

*At 1,400 chars/min average

Implication: On the Startup plan, your first 893 minutes of voice conversation per month have TTS "pre-paid." After that, overage kicks in at $0.039/1K chars.
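The included-minute and effective-cost figures follow directly from the plan fee, the credit allowance, and the 1,400 chars/min assumption. A short sketch, with `PLANS`, `included_minutes`, and `effective_tts_cost_per_min` as hypothetical names for this example:

```python
CHARS_PER_MIN = 1400  # assumed average TTS characters per agent-minute

# plan name -> (monthly fee in USD, included credits; 1 credit = 1 character)
PLANS = {
    "Pro": (5, 100_000),
    "Startup": (49, 1_250_000),
    "Scale": (299, 8_000_000),
}


def included_minutes(credits, chars_per_min=CHARS_PER_MIN):
    """Agent-minutes covered by the plan's credit allowance."""
    return credits / chars_per_min


def effective_tts_cost_per_min(fee, credits, chars_per_min=CHARS_PER_MIN):
    """TTS cost per minute if the whole allowance is consumed."""
    return fee / included_minutes(credits, chars_per_min)


for name, (fee, credits) in PLANS.items():
    print(f"{name:<8} {included_minutes(credits):7.0f} min  "
          f"${effective_tts_cost_per_min(fee, credits):.3f}/min")
```

If the real chars/min average differs from 1,400, both columns shift proportionally, so this is worth re-running once production usage is profiled.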

3. SENSITIVITY ANALYSIS — What Moves the Needle

3.1 COGS Sensitivity (Current Stack: $0.097/min)


| If this changes... | From | To | Impact on COGS/min | New Total |
|---|---|---|---|---|
| Switch LLM to GPT-4o-mini | GPT-5.2 ($0.025/min) | GPT-4o-mini ($0.001/min) | -$0.024 | $0.073 |
| Switch LLM to GPT-5-mini | GPT-5.2 ($0.025/min) | GPT-5-mini ($0.003/min) | -$0.022 | $0.075 |
| Switch STT to Deepgram | EL Scribe ($0.005/min) | Deepgram Nova-3 ($0.008/min) | +$0.003 | $0.100 |
| Self-host STT (Whisper) | EL Scribe ($0.005/min) | Self-hosted ($0.002/min) | -$0.003 | $0.094 |
| Double turn density (10/min) | 5.5 turns/min | 10 turns/min | +$0.050 | $0.147 |
| Halve turn density (3/min) | 5.5 turns/min | 3 turns/min | -$0.025 | $0.072 |
| Drop RAG context (no KB) | 5 chunks injected | 0 chunks | -$0.008 (fewer LLM input tokens) | $0.089 |
| Cut TTS cap to 160 chars | 320 chars/turn | 160 chars/turn | -$0.027 | $0.070 |
| Use Pocket TTS (free tier) | Cartesia ($0.055/min) | Pocket TTS ($0.002/min) | -$0.053 | $0.044 |


The two biggest levers:

- TTS provider: Cartesia is 55-60% of your COGS. Pocket TTS eliminates it but has 13-30s latency.
- LLM model: GPT-5.2 vs GPT-4o-mini is a 25% COGS swing. Need to test quality impact.

3.2 What Happens as You Scale


| Monthly Minutes | Twilio | Cartesia (Startup) | STT | LLM (GPT-5.2) | Total Variable | + Fixed Overhead | Cost/Min (All-In) |
|---|---|---|---|---|---|---|---|
| 100 | $1.25 | pre-paid* | $0.50 | $2.50 | $4.25 | $287 | $2.91 |
| 500 | $6.25 | pre-paid* | $2.50 | $12.50 | $21.25 | $287 | $0.62 |
| 893 | $11.16 | pre-paid* | $4.47 | $22.33 | $37.96 | $287 | $0.36 |
| 1,000 | $12.50 | $4.17 overage | $5.00 | $25.00 | $46.67 | $287 | $0.33 |
| 5,000 | $62.50 | $160.26 overage | $25.00 | $125.00 | $372.76 | $287 | $0.13 |
| 10,000 | $125.00 | $335.53 overage** | $50.00 | $250.00 | $760.53 | $537** | $0.13 |
| 50,000 | $625.00 | $1,582.00 overage | $250.00 | $1,250.00 | $3,707.00 | $537 | $0.085 |

*First 893 min included in Startup plan ($49/mo)

**Switch to Scale plan ($299/mo) at ~5,700+ min/mo
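As a cross-check on the Cartesia column, overage can be computed in characters rather than minutes. The sketch below assumes overage is billed per character at the plan's effective rate ($0.039/1K on Startup), as stated in section 2.4; actual Cartesia overage terms are not confirmed here, and `cartesia_overage` is a hypothetical helper.

```python
def cartesia_overage(monthly_minutes, included_credits=1_250_000,
                     rate_per_1k=0.039, chars_per_min=1400):
    """Overage in USD once the plan's included characters are used up.

    Assumes 1 credit = 1 character and per-character overage billing.
    """
    chars_used = monthly_minutes * chars_per_min
    overage_chars = max(0, chars_used - included_credits)
    return overage_chars / 1000 * rate_per_1k


for minutes in (500, 1000, 5000):
    print(minutes, round(cartesia_overage(minutes), 2))
```

Under these assumptions, 1,000 min/mo gives about $5.85 of overage and 5,000 min/mo about $224, somewhat higher than the $4.17 and $160.26 shown in the table, which appear to correspond to roughly $0.039 per overage minute rather than per 1K characters. The overage column is worth re-validating against an actual invoice before relying on the all-in figures at 1,000+ minutes.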

4. ALL-IN COST PER MINUTE (Overhead Amortized)

This is the number that matters for pricing — COGS + fixed costs spread across actual usage.


| Monthly Volume | Variable COGS/Min | Fixed Overhead/Min | All-In Cost/Min | Min Price for 50% Margin | Min Price for 70% Margin |
|---|---|---|---|---|---|
| 100 min | $0.097 | $2.87 | $2.97 | $5.94 | $9.90 |
| 500 min | $0.097 | $0.57 | $0.67 | $1.34 | $2.23 |
| 1,000 min | $0.097 | $0.29 | $0.39 | $0.78 | $1.30 |
| 2,500 min | $0.097 | $0.11 | $0.21 | $0.42 | $0.70 |
| 5,000 min | $0.097 | $0.06 | $0.16 | $0.32 | $0.53 |
| 10,000 min | $0.097 | $0.05 | $0.15 | $0.30 | $0.50 |
| 50,000 min | $0.085 | $0.01 | $0.10 | $0.20 | $0.33 |


Key insight: You need ~2,500+ min/mo before you can sustainably price at $0.25/min with healthy margins. Below that, fixed costs eat you alive.
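The all-in figures come from two small formulas: amortize the fixed overhead over monthly minutes, then divide the resulting cost by (1 - target margin) to get the minimum price. A sketch with hypothetical function names, using the $287/mo overhead and $0.097/min variable COGS from the sections above:

```python
def all_in_cost_per_min(monthly_minutes, variable_cogs=0.097, fixed_overhead=287.0):
    """Variable COGS plus fixed overhead amortized across monthly minutes."""
    return variable_cogs + fixed_overhead / monthly_minutes


def min_price_for_margin(cost_per_min, target_margin):
    """Lowest price per minute that still yields the target gross margin."""
    return cost_per_min / (1 - target_margin)


for minutes in (100, 1000, 2500):
    cost = round(all_in_cost_per_min(minutes), 2)
    print(minutes, cost,
          round(min_price_for_margin(cost, 0.50), 2),
          round(min_price_for_margin(cost, 0.70), 2))
```

At 2,500 min/mo the all-in cost is about $0.21/min, which is the first volume in the table where a $0.25/min price clears cost at all; a 50% margin at that volume would still need about $0.42/min.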

5. PRICING RECOMMENDATIONS (Revised)

5.1 Voice API (Developer Platform)


| Tier | Price | Included Min | Overage | Your All-In COGS | Margin on Overage |
|---|---|---|---|---|---|
| Free | $0 | 5 min | Hard cap | $0.49/min (at 5 min vol) | N/A (acquisition) |
| Starter | $29/mo | 50 min | $0.12/min | ~$0.10/min (at ~300 min total) | 17% |
| Pro | $99/mo | 200 min | $0.10/min | ~$0.10/min | ~0% on overage, margin is in subscription |
| Scale | $299/mo | 1,000 min | $0.08/min | ~$0.097/min | Negative on overage alone |

Problem: At current COGS ($0.097/min), you can't profitably sell overage below ~$0.15/min. The subscription fee IS the margin.

Revised recommendation:


| Tier | Price | Included Min | Overage | Where Margin Comes From |
|---|---|---|---|---|
| Free | $0 | 5 min | Hard cap | Acquisition cost |
| Starter | $49/mo | 100 min | $0.15/min | Sub fee ($49 covers 100 min COGS of $9.70) = 80% margin |
| Pro | $149/mo | 500 min | $0.12/min | Sub fee ($149 covers 500 min COGS of $48.50) = 67% margin |
| Scale | $499/mo | 2,500 min | $0.10/min | Sub fee ($499 covers 2,500 min COGS of $242.50) = 51% margin |


5.2 Phony Consumer Product

At current COGS of $0.097/min, the old $2.50/min was actually fine on margin (96%) but terrible for adoption.


| Plan | Price | Minutes | COGS | Gross Profit | Margin |
|---|---|---|---|---|---|
| Free trial | $0 | 3 min one-time | $0.29 | -$0.29 | CAC |
| Casual | $4.99/mo | 15 min | $1.46 | $3.53 | 71% |
| Regular | $14.99/mo | 60 min | $5.82 | $9.17 | 61% |
| Power | $29.99/mo | 150 min | $14.55 | $15.44 | 51% |
| Pay-as-you-go | $0.25/min | Unlimited | $0.097/min | $0.153/min | 61% |


5.3 Creator Revenue Share (If Applicable)

If creators get a cut, it comes out of platform gross profit:


| Consumer Plan | Revenue | Creator Share (50%) | Platform After Creator | COGS | Platform Net | Net Margin |
|---|---|---|---|---|---|---|
| Casual ($4.99) | $4.99 | $2.50 | $2.50 | $1.46 | $1.04 | 21% |
| Regular ($14.99) | $14.99 | $7.50 | $7.50 | $5.82 | $1.68 | 11% |
| Power ($29.99) | $29.99 | $15.00 | $15.00 | $14.55 | $0.45 | 2% |


At 50% creator share, margins are razor-thin. You'd need either:

- Lower creator share (30-40%), or
- Higher prices, or
- Much lower COGS (switch to GPT-4o-mini = instant ~25% COGS reduction)

With GPT-4o-mini (COGS drops to $0.073/min):


| Consumer Plan | Revenue | Creator Share (50%) | Platform After Creator | COGS | Platform Net | Net Margin |
|---|---|---|---|---|---|---|
| Casual ($4.99) | $4.99 | $2.50 | $2.50 | $1.10 | $1.40 | 28% |
| Regular ($14.99) | $14.99 | $7.50 | $7.50 | $4.38 | $3.12 | 21% |
| Power ($29.99) | $29.99 | $15.00 | $15.00 | $10.95 | $4.05 | 14% |
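Since the creator share is still undecided, it helps to treat it as a parameter. The sketch below assumes the share is taken off gross revenue and that subscribers use every included minute (the worst case for COGS); `platform_net` is a hypothetical helper name.

```python
def platform_net(revenue, minutes, cogs_per_min, creator_share):
    """Return (platform net in USD, net margin as a fraction of revenue)."""
    creator_cut = revenue * creator_share
    cogs = minutes * cogs_per_min
    net = revenue - creator_cut - cogs
    return net, net / revenue


plans = {"Casual": (4.99, 15), "Regular": (14.99, 60), "Power": (29.99, 150)}
for share in (0.3, 0.4, 0.5):
    for name, (price, minutes) in plans.items():
        net, margin = platform_net(price, minutes, 0.097, share)
        print(f"share={share:.0%} {name:<8} net=${net:.2f} margin={margin:.0%}")
```

Running the same loop with `cogs_per_min=0.073` shows how much of the squeeze comes from COGS versus the share itself.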


6. DEVELOPMENT & TEAM COSTS

6.1 Current Burn Rate


| Item | Monthly Cost |
|---|---|
| Railway hosting | $28 |
| Cartesia Startup plan | $49 |
| Twilio phone numbers | $8 |
| OpenAI API (dev/test) | ~$20 |
| Claude Code / AI dev tools | ~$200 |
| Domain, DNS, misc | ~$5 |
| Total burn (no salaries) | ~$310/mo |


6.2 Replacement Cost of What's Been Built


| Component | Person-Months | At $15K/mo contractor | At $25K/mo senior |
|---|---|---|---|
| Voice pipeline (Twilio + TTS + STT + LLM) | 4 | $60K | $100K |
| Agent runtime (15 tools, providers, conversation) | 3 | $45K | $75K |
| RAG pipeline (ingest, hybrid search, pgvector) | 2 | $30K | $50K |
| Phony API (auth, billing, creators, calls) | 3 | $45K | $75K |
| Phony Web (consumer + builder portal) | 2 | $30K | $50K |
| Voice API (developer platform, BYOK, tiers) | 2 | $30K | $50K |
| Developer portal | 1 | $15K | $25K |
| DevOps (Railway, Docker, CI/CD) | 1 | $15K | $25K |
| Voice cloning (7 creators, 3 providers) | 1 | $15K | $25K |
| Quality/eval (self-play, scorecards) | 1.5 | $22.5K | $37.5K |
| SDK packages (5 packages) | 1.5 | $22.5K | $37.5K |
| Total | 22 mo | $330K | $550K |


6.3 Team Scaling


| Stage | Trigger | Team | Monthly Payroll | Total Monthly Burn |
|---|---|---|---|---|
| Now | Pre-revenue | 1 founder + AI tooling | $0 | ~$310 |
| Seed | $500K raise | 2 founders + 1 eng | $25K | ~$25.5K |
| Post-seed | $10K MRR | 2 founders + 2 eng + 1 design | $45K | ~$46K |
| Series A | $2M raise | 2 founders + 4 eng + 1 design + 1 GTM | $100K | ~$102K |
| Growth | $100K MRR | 2 founders + 8 eng + 2 design + 3 GTM | $200K | ~$202K |


6.4 Break-Even Points


| Monthly Expense | Users Needed (at $14.99 Regular plan, 61% margin) | Minutes Needed (at $0.25/min PAYG) |
|---|---|---|
| Cover infra ($310/mo) | 34 subscribers | 2,026 min |
| Cover infra + 1 hire ($10K/mo) | 1,094 subscribers | 67,320 min |
| Cover infra + seed team ($25K/mo) | 2,733 subscribers | 165,360 min |
| $100K MRR | 6,671 subscribers | 400,000 min |
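Break-even is the monthly expense divided by gross profit per unit. A minimal sketch for the infra-only row, assuming every Regular subscriber uses the full 60 minutes and that PAYG gross profit is $0.25 - $0.097 per minute; the function names are hypothetical.

```python
import math


def breakeven_subscribers(monthly_expense, price=14.99, minutes=60, cogs_per_min=0.097):
    """Whole subscribers needed so subscription gross profit covers the expense."""
    gross_profit_per_sub = price - minutes * cogs_per_min  # about $9.17
    return math.ceil(monthly_expense / gross_profit_per_sub)


def breakeven_minutes(monthly_expense, price_per_min=0.25, cogs_per_min=0.097):
    """PAYG minutes at which per-minute gross profit covers the expense."""
    return monthly_expense / (price_per_min - cogs_per_min)


print(breakeven_subscribers(310))     # infra-only burn
print(round(breakeven_minutes(310)))
```

The larger rows depend on exactly which expense total is used (payroll alone, or payroll plus the $310 infra), so it is worth re-running these helpers with the intended figure rather than reading the table values as exact.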


7. RISKS AND UNKNOWNS


| Risk | Impact | Mitigation |
|---|---|---|
| Cartesia price increase | TTS is 55% of COGS; 2x price = COGS goes from $0.097 to $0.152 | Multi-provider support already built; can switch to Deepgram Aura ($0.030/1K) |
| GPT-5.2 pricing not confirmed | Using GPT-5 published rates; actual GPT-5.2 may differ | Test GPT-4o-mini quality; if acceptable, switch and save 25% |
| Turn density higher than estimated | Dense Q&A = 8-10 turns/min; COGS could be $0.13-0.15/min | TTS char cap already enforces output limit; main risk is LLM input tokens |
| ElevenLabs STT actual rate | $0.22-0.40/hr range is wide; depends on plan tier | Switch to Deepgram Nova-3 ($0.0077/min) for predictable pricing |
| Railway scaling costs | Memory-dominated; adding replicas doubles memory cost | Set resource limits in railway.toml; profile actual per-service usage |
| Volume discounts not modeled | Twilio drops to ~$0.003/min at high volume; Cartesia may negotiate | Model volume pricing when >10K min/mo |


8. IMMEDIATE ACTIONS


| Priority | Action | Impact |
|---|---|---|
| 1 | Test GPT-4o-mini quality for voice conversations | Potential 25% COGS reduction ($0.097 -> $0.073) |
| 2 | Set `BILLING_*` env vars to actual costs | Enable cost tracking before launch |
| 3 | Decide consumer pricing (subscription vs PAYG vs hybrid) | Determines GTM strategy |
| 4 | Validate ElevenLabs STT actual per-minute cost | Narrow the $0.22-0.40/hr range |
| 5 | Profile Cartesia usage to pick right plan (Startup vs Scale) | Break-even at ~5,700 min/mo |
| 6 | Decide creator revenue share % | 50% vs 70% vs 80% changes net margin 10-20x |


## STACK-DECISION-MATRIX-2026-03-06.md

      
    Phony Voice Stack Decision Matrix

Date: 2026-03-06
Purpose: Full cost/quality/latency trade-off analysis for cofounder review

Assumptions (Validated from Codebase)


| Parameter | Value |
|---|---|
| Turns per minute | 5.5 |
| LLM input tokens per turn | 3,000 (system prompt + RAG + history + message) |
| LLM output tokens per turn | 80 (capped at 320 chars / 3 sentences) |
| TTS characters per minute | 1,400 (5.5 turns x ~255 avg chars) |
| STT minutes per minute | 1.0 |
| Twilio (inbound + media streams) | $0.0125/min (constant across all stacks) |


1. COMPONENT COSTS (Per Minute)

1A. LLM Providers


| # | Model | Quality (GPQA) | Speed (tok/s) | TTFB | Input $/M | Output $/M | LLM Cost/Min |
|---|---|---|---|---|---|---|---|
| L1 | GPT-5.2 | 92.4% | 92 | 0.6s | $1.50 | $14.00 | $0.031 |
| L2 | GPT-5 | 87.3% | ~90 | ~0.6s | $1.25 | $10.00 | $0.025 |
| L3 | GPT-4o | ~85% | ~100 | ~0.5s | $2.50 | $10.00 | $0.046 |
| L4 | Claude Sonnet 4.6 | ~83% | ~80 | ~0.8s | $3.00 | $15.00 | $0.056 |
| L5 | Claude Haiku 4.5 | ~75% | ~120 | ~0.5s | $1.00 | $5.00 | $0.019 |
| L6 | GPT-5-mini | ~78% | ~150 | ~0.4s | $0.25 | $2.00 | $0.005 |
| L7 | GPT-4o-mini | ~72% | ~130 | ~0.4s | $0.15 | $0.60 | $0.003 |
| L8 | Groq Llama 4 Maverick | ~80% | 562 | <0.2s | $0.20 | $0.60 | $0.004 |
| L9 | Groq Llama 3.3 70B | ~78% | 394 | <0.2s | $0.59 | $0.79 | $0.010 |
| L10 | Groq Qwen3 32B | ~73% | 662 | <0.2s | $0.29 | $0.59 | $0.005 |
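The LLM Cost/Min column can be reproduced from the per-token rates and the assumptions table (5.5 turns, 3,000 input and 80 output tokens per turn). The sketch below is illustrative; `llm_cost_per_min` and the `MODELS` dictionary are names chosen for this example.

```python
# model -> (input USD per 1M tokens, output USD per 1M tokens), from the table above
MODELS = {
    "GPT-5.2": (1.50, 14.00),
    "GPT-5": (1.25, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5-mini": (0.25, 2.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Groq Llama 4 Maverick": (0.20, 0.60),
}


def llm_cost_per_min(input_per_m, output_per_m, turns=5.5, in_tok=3000, out_tok=80):
    """LLM spend for one minute of conversation at the assumed turn density."""
    return turns * (in_tok * input_per_m + out_tok * output_per_m) / 1e6


for model, (inp, out) in MODELS.items():
    print(f"{model:<22} ${llm_cost_per_min(inp, out):.4f}/min")
```

Input tokens dominate for every model here (3,000 in versus 80 out per turn), which is why prompt caching and history trimming matter more than the output cap. Under these assumptions GPT-4o-mini works out to roughly $0.0027/min and GPT-5-mini to roughly $0.005/min, which may be worth reconciling with the $0.0013 and $0.003 figures used in the economic model.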


1B. TTS Providers


| # | Provider | Model | Quality | TTFB | Clone? | Mulaw? | TTS Cost/Min |
|---|---|---|---|---|---|---|---|
| T1 | Cartesia (Startup) | Sonic 3 | Excellent | 40ms | Yes | Yes | $0.055 |
| T2 | Cartesia (Scale) | Sonic 3 | Excellent | 40ms | Yes | Yes | $0.052 |
| T3 | ElevenLabs | Flash v2.5 | Excellent | 150-300ms | Yes | No | $0.098 |
| T4 | Deepgram | Aura-2 | Good | 100-200ms | No | No | $0.042 |
| T5 | Deepgram | Aura-1 | Decent | ~150ms | No | No | $0.021 |
| T6 | Groq | Orpheus | Good | 50-100ms | No | No | $0.031 |
| T7 | Fish Speech S1 (Modal A100) | S1 4B | Excellent* | 200-400ms | Yes | No | $0.005 |
| T8 | F5-TTS (Modal A10) | ConvNeXt | Strong | ~253ms | Yes | No | $0.001 |
| T9 | Pocket TTS (Railway CPU) | Kyutai 100M | Mediocre | 13-30s | Yes | No | $0.002 |

*Fish Speech S1 is #1 on TTS-Arena2 benchmark

1C. STT Providers


| # | Provider | Model | Quality | Latency | STT Cost/Min |
|---|---|---|---|---|---|
| S1 | ElevenLabs | Scribe v2 | Excellent | Real-time streaming | $0.005 |
| S2 | Deepgram | Nova-3 | Excellent | Real-time streaming | $0.008 |
| S3 | Groq | Whisper Large v3 Turbo | Strong | 228x RT (batch) | $0.001 |
| S4 | Self-hosted Whisper (Modal) | Large v3 | Strong | ~2-5s (batch) | $0.002 |


1D. Telephony (Constant)


| Provider | Components | Cost/Min |
|---|---|---|
| Twilio | Inbound ($0.0085) + Media Streams ($0.004) | $0.013 |


2. FULL STACK COMBINATIONS

2A. Premium Stacks (Best Quality, Fastest)


| Stack | LLM | TTS | STT | LLM/Min | TTS/Min | STT/Min | Twilio/Min | Total COGS/Min | Est. End-to-End TTFB |
|---|---|---|---|---|---|---|---|---|---|
| P1 (Current) | GPT-5.2 | Cartesia Startup | EL Scribe | $0.031 | $0.055 | $0.005 | $0.013 | $0.104 | ~0.7s |
| P2 | GPT-5.2 | Cartesia Scale | EL Scribe | $0.031 | $0.052 | $0.005 | $0.013 | $0.101 | ~0.7s |
| P3 | GPT-5 | Cartesia Startup | EL Scribe | $0.025 | $0.055 | $0.005 | $0.013 | $0.098 | ~0.7s |
| P4 | Claude Sonnet 4.6 | Cartesia Startup | EL Scribe | $0.056 | $0.055 | $0.005 | $0.013 | $0.129 | ~0.9s |


2B. Quality-Optimized Stacks (Strong Quality, Lower Cost)


| Stack | LLM | TTS | STT | LLM/Min | TTS/Min | STT/Min | Twilio/Min | Total COGS/Min | Est. End-to-End TTFB |
|---|---|---|---|---|---|---|---|---|---|
| Q1 | GPT-5 | Cartesia Startup | Groq Whisper | $0.025 | $0.055 | $0.001 | $0.013 | $0.094 | ~0.7s |
| Q2 | Claude Haiku 4.5 | Cartesia Startup | EL Scribe | $0.019 | $0.055 | $0.005 | $0.013 | $0.092 | ~0.6s |
| Q3 | GPT-5 | Deepgram Aura-2 | EL Scribe | $0.025 | $0.042 | $0.005 | $0.013 | $0.085 | ~0.8s |
| Q4 | GPT-5 | Deepgram Aura-2 | Deepgram Nova-3 | $0.025 | $0.042 | $0.008 | $0.013 | $0.088 | ~0.8s |


2C. Speed-First Stacks (Lowest Latency)


| Stack | LLM | TTS | STT | LLM/Min | TTS/Min | STT/Min | Twilio/Min | Total COGS/Min | Est. End-to-End TTFB |
|---|---|---|---|---|---|---|---|---|---|
| S1 | Groq Maverick | Cartesia Startup | EL Scribe | $0.004 | $0.055 | $0.005 | $0.013 | $0.077 | ~0.25s |
| S2 | Groq Llama 70B | Cartesia Startup | EL Scribe | $0.010 | $0.055 | $0.005 | $0.013 | $0.083 | ~0.25s |
| S3 | Groq Maverick | Groq Orpheus | Groq Whisper | $0.004 | $0.031 | $0.001 | $0.013 | $0.049 | ~0.15s |
| S4 | Groq Maverick | Cartesia Startup | Groq Whisper | $0.004 | $0.055 | $0.001 | $0.013 | $0.073 | ~0.2s |


2D. Cost-Optimized Stacks (Lowest COGS)


| Stack | LLM | TTS | STT | LLM/Min | TTS/Min | STT/Min | Twilio/Min | Total COGS/Min | Est. End-to-End TTFB |
|---|---|---|---|---|---|---|---|---|---|
| C1 | GPT-5-mini | Deepgram Aura-2 | Groq Whisper | $0.005 | $0.042 | $0.001 | $0.013 | $0.061 | ~0.6s |
| C2 | GPT-4o-mini | Deepgram Aura-1 | Groq Whisper | $0.003 | $0.021 | $0.001 | $0.013 | $0.038 | ~0.6s |
| C3 | Groq Maverick | F5-TTS (Modal) | Groq Whisper | $0.004 | $0.001 | $0.001 | $0.013 | $0.019 | ~0.5s |
| C4 | GPT-4o-mini | Pocket TTS | Groq Whisper | $0.003 | $0.002 | $0.001 | $0.013 | $0.019 | ~14s |


2E. Future Self-Hosted Stacks (50K+ min/mo)


| Stack | LLM | TTS | STT | LLM/Min | TTS/Min | STT/Min | Twilio/Min | Total COGS/Min | Est. End-to-End TTFB |
|---|---|---|---|---|---|---|---|---|---|
| F1 | GPT-5.2 | Fish Speech S1 (Modal) | Groq Whisper | $0.031 | $0.005 | $0.001 | $0.013 | $0.050 | ~0.9s |
| F2 | GPT-5 | F5-TTS (Modal) | Groq Whisper | $0.025 | $0.001 | $0.001 | $0.013 | $0.040 | ~0.9s |
| F3 | Groq Maverick | Fish Speech S1 (Modal) | Groq Whisper | $0.004 | $0.005 | $0.001 | $0.013 | $0.023 | ~0.5s |
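Every stack total in section 2 is the sum of one LLM, one TTS, and one STT entry from section 1 plus the constant Twilio charge, so new combinations can be priced without recomputing anything. The sketch below covers a subset of components and stacks; the dictionary names and `stack_cogs` are hypothetical, and the per-minute values are the rounded figures from the component tables.

```python
# Per-minute component costs (USD), copied from the component tables
LLM = {"GPT-5.2": 0.031, "GPT-5": 0.025, "Groq Maverick": 0.004}
TTS = {"Cartesia Startup": 0.055, "Groq Orpheus": 0.031,
       "Fish Speech S1": 0.005, "F5-TTS": 0.001}
STT = {"EL Scribe": 0.005, "Groq Whisper": 0.001}
TWILIO = 0.013  # inbound + media streams, constant across stacks

STACKS = {
    "P1": ("GPT-5.2", "Cartesia Startup", "EL Scribe"),
    "Q1": ("GPT-5", "Cartesia Startup", "Groq Whisper"),
    "S3": ("Groq Maverick", "Groq Orpheus", "Groq Whisper"),
    "C3": ("Groq Maverick", "F5-TTS", "Groq Whisper"),
    "F1": ("GPT-5.2", "Fish Speech S1", "Groq Whisper"),
}


def stack_cogs(llm, tts, stt):
    """Total COGS per minute for one LLM/TTS/STT combination."""
    return round(LLM[llm] + TTS[tts] + STT[stt] + TWILIO, 3)


# Rank stacks cheapest first
ranked = sorted(STACKS, key=lambda name: stack_cogs(*STACKS[name]))
for name in ranked:
    print(name, stack_cogs(*STACKS[name]))
```

Cost is only one axis; latency and quality would need their own columns before a ranking like this could drive a decision.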


3. MASTER COMPARISON (All Stacks Ranked)

By Total COGS (Cheapest First)


| Rank | Stack | Config | COGS/Min | TTFB | Quality | Trade-off |
|---|---|---|---|---|---|---|
| 1 | C3 | Groq Maverick + F5-TTS + Groq Whisper | $0.019 | ~0.5s | Lower LLM, strong TTS | Open model quality ceiling, Modal cold starts |
| 2 | C4 | GPT-4o-mini + Pocket TTS + Groq Whisper | $0.019 | ~14s | Low LLM, bad TTS | Unusable latency; free tier only |
| 3 | F3 | Groq Maverick + Fish Speech + Groq Whisper | $0.023 | ~0.5s | Strong TTS, lower LLM | Needs Modal GPU fleet ops, no mulaw |
| 4 | C2 | GPT-4o-mini + Aura-1 + Groq Whisper | $0.038 | ~0.6s | Mid quality across board | Noticeable conversation quality drop |
| 5 | F2 | GPT-5 + F5-TTS + Groq Whisper | $0.040 | ~0.9s | Near-frontier LLM, strong TTS | Modal ops, audio conversion needed |
| 6 | S3 | All-Groq (Maverick + Orpheus + Whisper) | $0.049 | ~0.15s | Good, not great | Lower LLM quality, no voice cloning |
| 7 | F1 | GPT-5.2 + Fish Speech + Groq Whisper | $0.050 | ~0.9s | Frontier LLM, great TTS | Modal ops, higher TTS latency |
| 8 | C1 | GPT-5-mini + Aura-2 + Groq Whisper | $0.061 | ~0.6s | Mid LLM, good TTS | Conversation quality noticeably worse |
| 9 | S4 | Groq Maverick + Cartesia + Groq Whisper | $0.073 | ~0.2s | Near-frontier, excellent TTS | LLM quality gap vs GPT-5.2 |
| 10 | S1 | Groq Maverick + Cartesia + EL Scribe | $0.077 | ~0.25s | Same as above, better STT | Paying more for EL STT |
| 11 | S2 | Groq Llama 70B + Cartesia + EL Scribe | $0.083 | ~0.25s | GPT-4o class LLM | Noticeable quality gap |
| 12 | Q3 | GPT-5 + Aura-2 + EL Scribe | $0.085 | ~0.8s | Near-frontier, good TTS | Aura voice quality < Cartesia |
| 13 | Q4 | GPT-5 + Aura-2 + Deepgram Nova-3 | $0.088 | ~0.8s | Near-frontier, good TTS | All-Deepgram audio stack |
| 14 | Q2 | Haiku 4.5 + Cartesia + EL Scribe | $0.092 | ~0.6s | Strong LLM, excellent TTS | Haiku weaker on long conversations |
| 15 | Q1 | GPT-5 + Cartesia + Groq Whisper | $0.094 | ~0.7s | Near-frontier everything | Cheapest STT saves $0.004 |
| 16 | P3 | GPT-5 + Cartesia + EL Scribe | $0.098 | ~0.7s | Near-frontier everything | Almost identical to current, $6 less per 1,000 min |
| 17 | P2 | GPT-5.2 + Cartesia Scale + EL Scribe | $0.101 | ~0.7s | Best everything | Need 5,700+ min/mo for Scale plan |
| 18 | P1 (CURRENT) | GPT-5.2 + Cartesia Startup + EL Scribe | $0.104 | ~0.7s | Best everything | Your stack today |
| 19 | P4 | Sonnet 4.6 + Cartesia + EL Scribe | $0.129 | ~0.9s | Frontier, great for nuance | Most expensive, slower |


By TTFB (Fastest First)


| Rank | Stack | TTFB | COGS/Min | Quality | Viable for Premium? |
|---|---|---|---|---|---|
| 1 | S3 (All-Groq) | ~0.15s | $0.049 | Good | No: LLM quality gap, no voice clone |
| 2 | S4 (Groq + Cartesia) | ~0.2s | $0.073 | Near-frontier TTS, good LLM | Maybe: test Maverick quality |
| 3 | S1 (Groq + Cartesia + EL) | ~0.25s | $0.077 | Same | Maybe |
| 4 | S2 (Groq 70B + Cartesia) | ~0.25s | $0.083 | GPT-4o class | No: quality gap |
| 5 | C3 (Groq + F5-TTS) | ~0.5s | $0.019 | Strong TTS, good LLM | No: TTS latency, ops burden |
| 6 | F3 (Groq + Fish Speech) | ~0.5s | $0.023 | Great TTS, good LLM | Aspirational at scale |
| 7 | Q2 (Haiku + Cartesia) | ~0.6s | $0.092 | Strong | Yes: near current quality |
| 8 | C1 (GPT-5-mini + Aura-2) | ~0.6s | $0.061 | Mid | No |
| 9 | P1 (GPT-5.2 + Cartesia) | ~0.7s | $0.104 | Best | Yes: current production |
| 10 | Q1 (GPT-5 + Cartesia) | ~0.7s | $0.094 | Near-best | Yes |
| 11 | Q3/Q4 (GPT-5 + Aura-2) | ~0.8s | $0.085-0.088 | Strong | Yes, if Aura quality acceptable |
| 12 | F1 (GPT-5.2 + Fish Speech) | ~0.9s | $0.050 | Frontier LLM, great TTS | Future scale play |
| 13 | P4 (Sonnet 4.6 + Cartesia) | ~0.9s | $0.129 | Frontier | Yes, but slower and pricier |
| 14 | C4 (Mini + Pocket TTS) | ~14s | $0.019 | Low | No: free tier demo only |


By Quality (Best Conversation First)


| Rank | Stack | Quality Rating | COGS/Min | TTFB | Why This Quality Rank |
|---|---|---|---|---|---|
| 1 | P1/P2 (GPT-5.2 + Cartesia) | A+ | $0.101-0.104 | 0.7s | #1 LLM + best low-latency TTS + voice cloning |
| 2 | P4 (Sonnet 4.6 + Cartesia) | A+ | $0.129 | 0.9s | Frontier LLM (best at nuance/empathy) + Cartesia |
| 3 | P3/Q1 (GPT-5 + Cartesia) | A | $0.094-0.098 | 0.7s | Near-frontier LLM, same TTS |
| 4 | F1 (GPT-5.2 + Fish Speech) | A | $0.050 | 0.9s | Best LLM + #1 TTS-Arena model |
| 5 | Q2 (Haiku 4.5 + Cartesia) | A- | $0.092 | 0.6s | Strong LLM, weaker on complex reasoning |
| 6 | Q3 (GPT-5 + Aura-2) | B+ | $0.085 | 0.8s | Near-frontier LLM, TTS quality step down |
| 7 | S1/S4 (Groq Maverick + Cartesia) | B+ | $0.073-0.077 | 0.2-0.25s | Great TTS, LLM misses subtlety sometimes |
| 8 | F2 (GPT-5 + F5-TTS) | B+ | $0.040 | 0.9s | Good LLM, strong TTS, higher latency |
| 9 | S2 (Groq 70B + Cartesia) | B | $0.083 | 0.25s | GPT-4o class LLM, adequate not great |
| 10 | C1 (GPT-5-mini + Aura-2) | B- | $0.061 | 0.6s | Mid LLM + good TTS |
| 11 | S3 (All-Groq) | B- | $0.049 | 0.15s | Fast but no voice cloning, weaker LLM |
| 12 | C3 (Groq + F5-TTS Modal) | B- | $0.019 | 0.5s | Good TTS, weaker LLM, ops complexity |
| 13 | C2 (Mini + Aura-1) | C+ | $0.038 | 0.6s | Noticeable quality drop everywhere |
| 14 | C4 (Mini + Pocket TTS) | D | $0.019 | 14s | Free tier filler only |


4. MONTHLY COST PROJECTIONS (At Volume)

Fixed overhead: ~$50/mo Railway + $49/mo Cartesia Startup + $8 Twilio numbers = ~$107/mo base


| Monthly Minutes | P1 (Current) | Q1 (GPT-5 + Cartesia) | S1 (Groq + Cartesia) | F1 (GPT-5.2 + Fish@Scale) | C2 (Budget) |
|---|---|---|---|---|---|
| COGS/min | $0.104/min | $0.094/min | $0.077/min | $0.050/min | $0.038/min |
| 100 | $107 + $10 = $117 | $107 + $9 = $116 | $107 + $8 = $115 | $207* + $5 = $212 | $107 + $4 = $111 |
| 500 | $107 + $52 = $159 | $107 + $47 = $154 | $107 + $39 = $146 | $207 + $25 = $232 | $107 + $19 = $126 |
| 1,000 | $107 + $104 = $211 | $107 + $94 = $201 | $107 + $77 = $184 | $207 + $50 = $257 | $107 + $38 = $145 |
| 5,000 | $107 + $520 = $627 | $107 + $470 = $577 | $107 + $385 = $492 | $207 + $250 = $457 | $107 + $190 = $297 |
| 10,000 | $357** + $1,040 = $1,397 | $357 + $940 = $1,297 | $357 + $770 = $1,127 | $457 + $500 = $957 | $357 + $380 = $737 |
| 50,000 | $357 + $5,200 = $5,557 | $357 + $4,700 = $5,057 | $357 + $3,850 = $4,207 | $457 + $2,500 = $2,957 | $357 + $1,900 = $2,257 |

*F1 adds ~$100/mo Modal GPU keep-warm

**At 10K+ min, upgrade Cartesia to Scale plan ($299 vs $49), Railway scales to ~$200

Break-Even Price Points (for 60% gross margin target)


| Stack | COGS/Min | Price for 60% Margin | Price for 70% Margin | Price for 50% Margin |
|---|---|---|---|---|
| P1 (Current) | $0.104 | $0.26/min | $0.35/min | $0.21/min |
| Q1 (GPT-5 swap) | $0.094 | $0.24/min | $0.31/min | $0.19/min |
| S1 (Groq fast) | $0.077 | $0.19/min | $0.26/min | $0.15/min |
| F1 (Self-host TTS) | $0.050 | $0.13/min | $0.17/min | $0.10/min |
| C2 (Budget) | $0.038 | $0.10/min | $0.13/min | $0.08/min |


5. RECOMMENDED STRATEGY

Now (0-2,000 min/mo): Stay on P1, Price Premium

Stack: GPT-5.2 + Cartesia Startup + ElevenLabs Scribe


| Metric | Value |
|---|---|
| COGS | $0.104/min |
| TTFB | ~0.7s |
| Quality | A+ (best available) |
| Monthly fixed | ~$107 |
| Target price | $0.25-0.35/min |
| Gross margin | 60-70% |


Why: You have the best stack in the market. Competitors on Llama 70B + Groq charge $0.07-0.12 for a B-tier product. You can charge $0.25-0.35 for an A+ tier product. Don't race to the bottom.

Quick win: Switch STT from ElevenLabs Scribe ($0.005/min) to Groq Whisper Turbo ($0.001/min). This saves $0.004/min with the same quality for call transcription, and COGS drops to $0.100/min.

Growth (2,000-10,000 min/mo): Optimize Provider Plans


| Action | Trigger | Savings |
|---|---|---|
| Upgrade Cartesia Startup → Scale | ~5,700 min/mo | $0.003/min TTS savings |
| Enable OpenAI prompt caching | Immediate | ~$0.008/min on input tokens |
| Negotiate OpenAI volume pricing | ~5,000 min/mo | 20-40% off LLM |
| Negotiate Cartesia volume pricing | ~10,000 min/mo | Custom rate below $0.037/1K |


Scale (50,000+ min/mo): Add Self-Hosted TTS Tier


| Tier | Stack | COGS | Price | Margin |
|---|---|---|---|---|
| Ultra | GPT-5.2 + Cartesia (40ms TTFB) | $0.10 | $0.35 | 71% |
| Premium | GPT-5.2 + Fish Speech S1 (200ms TTFB) | $0.05 | $0.20 | 75% |
| Standard | GPT-5 + F5-TTS (250ms TTFB) | $0.04 | $0.12 | 67% |


6. WHAT NOT TO DO


| Temptation | Why Not |
|---|---|
| Switch to GPT-4o-mini / GPT-5-mini to save $0.025/min | Core product IS the conversation quality. Saves $25/1K min but ruins the product. |
| Switch to Pocket TTS to eliminate TTS cost | 13-30s TTFB. Product is dead on arrival. Only for free tier demo. |
| Go all-Groq to minimize COGS | Maverick/Llama quality noticeably worse for nuanced creator conversations. Fine for simple chatbots, wrong for this product. |
| Self-host TTS now (pre-scale) | Modal cold starts + ops burden + no mulaw = worse product for more effort. Wait for 50K min/mo. |
| Race to the bottom on pricing | You have the best stack. Bland/Retell sell $0.07-0.12 with worse models. Own the premium tier. |