@matthew-gerstman
Created February 19, 2026 21:57
Agent streaming performance research & plan (local dev regression)

Agent Streaming Performance Research

Problem Statement

Agent responses in the Dashboard UI arrive one word at a time with a long delay before the first word. This is a regression over the past few weeks, exclusively in local dev — production/staging are fine.


Architecture Overview

Execution Flow

User sends message
  → API POST /chat
  → Inngest event: obvious/agent.execute
  → step.run('prepare-execution')   — tracing, DB record, emit started event
  → step.run('check-credits')       — 5+ async ops: workspace resolution, feature flags, credit balance
  → step.run('check-queue-N')       — feature flag + queue check
  → step.run('agent-step-N')        — context building + streamText() + iterate fullStream

Streaming Path (inside agent-step)

// step-executor.ts:272-313
const streamResult = streamText({
  model: provider(model),  // Routes through Vercel AI Gateway
  experimental_transform: ai.smoothStream({ delayInMs: 50 }),
  // ...
})

for await (const message of streamResult.fullStream) {
  await this.handleChunk(config.threadId, message)  // AWAITS every chunk
}

Per-Token Event Emission

// step-executor.ts:637-648
if (chunk.type === 'text-delta') {
  this.fullText += chunk.text ?? ''
  return this.eventEmitter?.emitMessageUpdated({
    id: this.currentMessageId,
    fullText: this.fullText,      // FULL accumulated text sent every token
    toolCalls: this.toolCalls,
  }, { ephemeral: true })
}

Event Emission → Redis → SSE → Dashboard

handleChunk(text-delta)
  → AgentEventEmitter.emitMessageUpdated()
  → emitEvent() builds event object with makeId('evt', 16)
  → eventsService.publish(CHANNELS.USER(userId), event)
  → JSON.stringify(eventWithTimestamp)    // serializes FULL accumulated text
  → await redisPublisher.publish(topic, message)  // AWAITED per-token
  → SSE endpoint streams to dashboard
  → EventStreamService.onmessage → React state update

For ephemeral events (text-delta): no DB storage, just Redis PUBLISH. But every token still awaits the publish round-trip.


Root Cause Analysis

Primary: AI Gateway Internet Round-Trip (LOCAL DEV ONLY)

The AI Gateway was introduced ~Feb 8 (PR #3441). All AI calls now route through Vercel's infrastructure:

Local dev path:

Local machine → Internet → Vercel AI Gateway → Anthropic API → Gateway → Internet → Local machine

Production path:

Cloud server → Vercel AI Gateway → Anthropic API → Gateway → Cloud server
(all within the same cloud region; sub-20ms overhead)

In local dev, every request and every SSE chunk traverses a full internet round-trip. This explains:

  • Long delay before first word: TTFT includes gateway routing + internet latency
  • Slow per-chunk delivery: each SSE event proxied through gateway

The direct provider fallback was removed on Feb 18 (PR #7075), but the gateway was already the default path since Feb 8.

Secondary: Per-Token Redis PUBLISH Backpressure

The for await loop awaits handleChunk which awaits Redis PUBLISH for every text-delta. This creates backpressure:

  • Each token waits for Redis to confirm the previous publish before reading the next
  • Even with ~1ms local Redis latency, 2000 tokens = 2 seconds of pure blocking
  • Combined with gateway latency, this compounds
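The blocking arithmetic above can be sketched with a toy model. Everything here (PUBLISH_LATENCY_MS, fakePublish, both loop functions) is invented for illustration and is not the real step-executor code; the token count is scaled down from 2000 to keep the demo quick:

```typescript
// Toy model of the per-token publish backpressure. Each simulated token
// awaits a ~1ms "publish" before the next one is read, so total blocking
// scales linearly with token count.
const PUBLISH_LATENCY_MS = 1
const TOKEN_COUNT = 200 // scaled down from 2000 for a quick demo

const fakePublish = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, PUBLISH_LATENCY_MS))

// Current behavior: await every publish inside the read loop.
async function awaitedLoop(): Promise<number> {
  const start = Date.now()
  for (let i = 0; i < TOKEN_COUNT; i++) {
    await fakePublish() // blocks the loop for one round-trip per token
  }
  return Date.now() - start
}

// Proposed behavior for ephemeral deltas: fire-and-forget, flush at the end.
async function fireAndForgetLoop(): Promise<number> {
  const start = Date.now()
  const pending: Promise<void>[] = []
  for (let i = 0; i < TOKEN_COUNT; i++) {
    pending.push(fakePublish().catch(() => {})) // not awaited in the loop
  }
  await Promise.all(pending)
  return Date.now() - start
}
```

At 2000 tokens the awaited loop's floor is token count × latency, which is where the "2 seconds of pure blocking" figure comes from.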

Tertiary: Growing Payload Size (O(n²) Serialization)

Each text-delta sends fullText (entire accumulated response), not just the delta. As the response grows:

  • Token 1: serialize ~10 bytes
  • Token 1000: serialize ~5KB
  • Token 2000: serialize ~10KB

Total serialization for a 2000-token response: ~10MB aggregate. Not the primary bottleneck, but it adds up.
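The aggregate figure is easy to sanity-check; the ~5 bytes/token estimate comes from the bullets above:

```typescript
// Back-of-envelope check of the O(n²) serialization cost: every text-delta
// re-serializes the full accumulated text, at roughly 5 bytes per token.
const BYTES_PER_TOKEN = 5
const TOKENS = 2000

let totalBytes = 0
for (let i = 1; i <= TOKENS; i++) {
  totalBytes += i * BYTES_PER_TOKEN // token i re-serializes i tokens' worth of text
}
// Closed form: 5 * 2000 * 2001 / 2 = 10,005,000 bytes ≈ 10 MB aggregate
```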

Minor: smoothStream 50ms Delay

experimental_transform: ai.smoothStream({ delayInMs: 50 }) adds artificial per-chunk buffering. Small but additive.

Minor: Inngest Dev Server Step Overhead

Each step.run() in the Inngest dev server involves HTTP round-trips for memoization. The three pre-streaming steps (prepare-execution, check-credits, check-queue) add startup latency that is worse against the local dev server than against production Inngest.


Key Files

  • apps/api/src/agents/obvious-v2/execution/step-executor.ts — streaming loop, handleChunk, smoothStream config
  • apps/api/src/agents/obvious-v2/state/events.ts — AgentEventEmitter; builds and publishes events
  • apps/api/src/redis/events.service.ts — Redis publish implementation, ephemeral handling
  • apps/api/src/inngest/obvious-agent-execution.ts — Inngest function with pre-streaming step orchestration
  • apps/api/src/agents/obvious-v2/state/provider.ts — provider initialization (gateway vs direct)
  • apps/api/src/agents/lib/gateway.ts — AI Gateway singleton configuration

Timeline of Relevant Changes

  • Feb 8, PR #3441 — AI Gateway integration with BYOK. Primary cause: adds the internet hop in local dev.
  • Feb 8, PR #6305 — Braintrust SDK upgrade to v2.2.0. Adds tracing wrapping overhead.
  • Feb 8, PR #6530 — Drizzle ORM v1→v2 migration. Potential query perf changes.
  • Feb 10, PR #6684 — Move gateway fallback models into mode registry. Minor impact.
  • Feb 18, PR #7075 — Remove direct-provider fallback, gateway required. Removed the escape hatch.

Proposed Fix

1. Restore Direct Provider Path for Local Dev (Highest Impact)

Re-introduce initializeDirectProvider (removed in PR #7075) as a local-dev-only path. When USE_LOCALSTACK or USE_DIRECT_PROVIDER env var is set, bypass the AI Gateway and call Anthropic directly.

The deleted code is preserved in commit ffd341e648.

2. Fire-and-Forget Ephemeral Text-Delta Publishes

Stop awaiting the Redis publish for text-delta and text-start events. These events are ephemeral and self-correcting (each carries the full accumulated fullText). Add a .catch() for error logging. Keep the await for text-end, tool-input-start, and tool-call.

3. Throttle Text-Delta Event Emission

Batch text-deltas into one Redis publish every ~32ms (~30fps). This cuts publishes from one per token (~2000 per response) to at most ~31 per second of streaming. Remove smoothStream({ delayInMs: 50 }), which becomes redundant.


Expected Impact

  • Direct provider for local dev — eliminates the internet round-trip (~50-200ms off TTFT, plus per-chunk latency). Local dev only.
  • Fire-and-forget deltas — saves ~2-4 seconds on a 2000-token response. All environments.
  • Throttle to 32ms — additional reduction from far fewer publishes. All environments.

Fix Slow Agent Streaming Performance (Local Dev)

Context

Agent responses in the Dashboard UI are arriving one word at a time with a long delay before the first word. This is a regression over the past few weeks, exclusively in local dev. Production/staging are fine.

Primary root cause: The AI Gateway introduction (~Feb 8, PR #3441) routes all AI calls through Vercel's infrastructure. In production this adds negligible latency (same cloud). In local dev, every request and every SSE chunk traverses an internet round-trip (local machine → Vercel Gateway → Anthropic → Gateway → local machine), significantly increasing TTFT and per-chunk delivery time.

Secondary cause: Every text-delta token awaits a Redis PUBLISH before processing the next token, creating backpressure. This compounds the gateway latency.

Commits

  1. perf(agent): restore direct provider path for local dev
  2. perf(agent): fire-and-forget ephemeral text-delta publishes
  3. perf(agent): throttle text-delta emission to 32ms intervals

Changes

1. Restore Direct Provider Path for Local Dev (Highest Impact)

File: apps/api/src/agents/obvious-v2/state/provider.ts

Re-introduce initializeDirectProvider (removed in PR #7075) as a local-dev-only path. When USE_LOCALSTACK or a new USE_DIRECT_PROVIDER env var is set, bypass the AI Gateway and call Anthropic/OpenAI directly using @ai-sdk/anthropic etc.

The deleted code is in commit ffd341e648 — restore the initializeDirectProvider function and the conditional in initializeProvider:

// Use gateway in production, direct provider locally
if (process.env.USE_LOCALSTACK || process.env.USE_DIRECT_PROVIDER) {
  return initializeDirectProvider(providerName, modelName, betas)
}
return initializeWithGateway(providerName, modelName, betas)

File: apps/api/src/agents/obvious-v2/state/provider.ts (types)

Restore the union ProviderFactory type to include direct SDK providers.

2. Fire-and-Forget Ephemeral Text-Delta Publishes

File: apps/api/src/agents/obvious-v2/execution/step-executor.ts

In handleChunk (~lines 637-648), stop awaiting the Redis publish for text-delta and text-start events. Add a .catch() for error logging. These events are ephemeral and self-correcting (each carries the full accumulated fullText).

Keep await for text-end, tool-input-start, tool-call — these trigger DB persistence.
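A minimal sketch of the intended split. SketchEmitter and its publish method are stand-ins for AgentEventEmitter and the Redis publish; the real API may differ:

```typescript
// Sketch: fire-and-forget for ephemeral events, awaited publish for events
// that trigger DB persistence. Names are illustrative, not the real code.
type AgentEvent = { type: string; fullText?: string }

class SketchEmitter {
  published: string[] = []

  // Stand-in for the Redis PUBLISH round-trip.
  private publish(event: AgentEvent): Promise<void> {
    this.published.push(event.type)
    return Promise.resolve()
  }

  handleChunk(event: AgentEvent): Promise<void> | void {
    if (event.type === 'text-delta' || event.type === 'text-start') {
      // Fire-and-forget: self-correcting, each delta carries the full text.
      this.publish(event).catch((err) => console.warn('publish failed', err))
      return
    }
    // text-end / tool-input-start / tool-call still return the promise,
    // so the caller awaits them before DB persistence.
    return this.publish(event)
  }
}
```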

3. Throttle Text-Delta Event Emission

File: apps/api/src/agents/obvious-v2/execution/step-executor.ts

Add a throttled emit that batches text-deltas into one Redis publish every ~32ms (~30fps), cutting publishes from one per token (~2000 per response) to at most ~31 per second of streaming.

  • Add emitThrottledDelta() / flushDelta() methods
  • text-end handler flushes pending delta before persisting final state
  • Remove smoothStream({ delayInMs: 50 }) — redundant with throttled emit
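A sketch of the throttle: the method names (emitThrottledDelta, flushDelta) follow the bullets above, but the implementation details and the in-memory publish stand-in are assumptions:

```typescript
// Sketch of the 32ms trailing-edge throttle. Deltas accumulate locally and
// at most one publish fires per interval; flushDelta() is called from the
// text-end handler so the final state is never dropped.
const THROTTLE_MS = 32 // ~30fps

class ThrottledDeltaEmitter {
  private fullText = ''
  private timer: ReturnType<typeof setTimeout> | null = null
  publishes: string[] = []

  private publish(): void {
    this.publishes.push(this.fullText) // stand-in for the Redis PUBLISH
  }

  emitThrottledDelta(delta: string): void {
    this.fullText += delta
    if (this.timer) return // a publish is already scheduled for this window
    this.timer = setTimeout(() => {
      this.timer = null
      this.publish()
    }, THROTTLE_MS)
  }

  // The text-end handler calls this before persisting the final message.
  flushDelta(): void {
    if (this.timer) {
      clearTimeout(this.timer)
      this.timer = null
    }
    this.publish()
  }
}
```

Because each publish carries the full accumulated text, collapsing many deltas into one window loses nothing; the dashboard just receives fewer, larger updates.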

Files to Modify

  • apps/api/src/agents/obvious-v2/state/provider.ts — restore direct provider path
  • apps/api/src/agents/obvious-v2/execution/step-executor.ts — fire-and-forget deltas, throttling, remove smoothStream

Verification

  1. Run bun obvious up locally
  2. Open Dashboard, start a fresh conversation
  3. Confirm streaming is fast and smooth (not word-by-word)
  4. Verify text-end still persists final message correctly (check DB)
  5. Run bun obvious test --changed
  6. Compare TTFT with and without USE_DIRECT_PROVIDER to confirm gateway is the bottleneck