Agent responses in the Dashboard UI arrive one word at a time with a long delay before the first word. This is a regression over the past few weeks, exclusively in local dev — production/staging are fine.
User sends message
→ API POST /chat
→ Inngest event: obvious/agent.execute
→ step.run('prepare-execution') — tracing, DB record, emit started event
→ step.run('check-credits') — 5+ async ops: workspace resolution, feature flags, credit balance
→ step.run('check-queue-N') — feature flag + queue check
→ step.run('agent-step-N') — context building + streamText() + iterate fullStream
```ts
// step-executor.ts:272-313
const streamResult = streamText({
  model: provider(model), // Routes through Vercel AI Gateway
  experimental_transform: ai.smoothStream({ delayInMs: 50 }),
  // ...
})

for await (const message of streamResult.fullStream) {
  await this.handleChunk(config.threadId, message) // AWAITS every chunk
}
```

```ts
// step-executor.ts:637-648
if (chunk.type === 'text-delta') {
  this.fullText += chunk.text ?? ''
  return this.eventEmitter?.emitMessageUpdated({
    id: this.currentMessageId,
    fullText: this.fullText, // FULL accumulated text sent every token
    toolCalls: this.toolCalls,
  }, { ephemeral: true })
}
```

handleChunk(text-delta)
→ AgentEventEmitter.emitMessageUpdated()
→ emitEvent() builds event object with makeId('evt', 16)
→ eventsService.publish(CHANNELS.USER(userId), event)
→ JSON.stringify(eventWithTimestamp) // serializes FULL accumulated text
→ await redisPublisher.publish(topic, message) // AWAITED per-token
→ SSE endpoint streams to dashboard
→ EventStreamService.onmessage → React state update
For ephemeral events (text-delta): no DB storage, just Redis PUBLISH. But every token still awaits the publish round-trip.
The AI Gateway was introduced ~Feb 8 (PR #3441). All AI calls now route through Vercel's infrastructure:
Local dev path:
Local machine → Internet → Vercel AI Gateway → Anthropic API → Gateway → Internet → Local machine
Production path:
Cloud server → Vercel AI Gateway → Anthropic API → Gateway → Cloud server
(all within same cloud region, ~sub-20ms overhead)
In local dev, every request and every SSE chunk traverses a full internet round-trip. This explains:
- Long delay before first word: TTFT includes gateway routing + internet latency
- Slow per-chunk delivery: each SSE event proxied through gateway
The direct provider fallback was removed on Feb 18 (PR #7075), but the gateway was already the default path since Feb 8.
The for await loop awaits handleChunk which awaits Redis PUBLISH for every text-delta. This creates backpressure:
- Each token waits for Redis to confirm the previous publish before reading the next
- Even with ~1ms local Redis latency, 2000 tokens = 2 seconds of pure blocking
- Combined with gateway latency, this compounds: the loop cannot read the next gateway chunk until the previous publish resolves
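The effect is easy to reproduce in isolation. A toy model (names here are stand-ins, not the actual step-executor API): the stream can yield tokens instantly, but awaiting a slow publish per token serializes everything.

```typescript
// Toy model of the streaming loop: tokens are available immediately,
// but an awaited per-token publish blocks the next read.
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms))

async function* tokenStream(n: number) {
  for (let i = 0; i < n; i++) yield `tok-${i}` // tokens ready immediately
}

async function consume(tokens: number, publishMs: number): Promise<number> {
  const start = Date.now()
  for await (const _tok of tokenStream(tokens)) {
    await sleep(publishMs) // stands in for `await redisPublisher.publish(...)`
  }
  return Date.now() - start
}

consume(50, 5).then((elapsed) => {
  // ~50 x 5ms = ~250ms of pure blocking, with zero "model" latency
  console.log(`${elapsed}ms`)
})
```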
Each text-delta sends fullText (entire accumulated response), not just the delta. As the response grows:
- Token 1: serialize ~10 bytes
- Token 1000: serialize ~5KB
- Token 2000: serialize ~10KB
Total serialization for a 2000-token response: ~10MB aggregate. Not the primary bottleneck but adds up.
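The aggregate is just an arithmetic series: if each token adds ~5 bytes (the rough average assumed above) and every delta re-serializes the full accumulated text, total bytes serialized is 5 + 10 + ... + 5n.

```typescript
// Aggregate bytes serialized when every delta re-sends the accumulated
// fullText. Assumes ~5 bytes/token, matching the estimates above.
function totalSerializedBytes(tokens: number, bytesPerToken: number): number {
  // sum_{i=1..n} (i * bytesPerToken) = bytesPerToken * n * (n + 1) / 2
  return (bytesPerToken * tokens * (tokens + 1)) / 2
}

console.log(totalSerializedBytes(2000, 5)) // 10005000 bytes (~10 MB)
```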
experimental_transform: ai.smoothStream({ delayInMs: 50 }) adds artificial per-chunk buffering. Small but additive.
Each step.run() in the Inngest dev server involves HTTP round-trips for memoization. The 3 pre-streaming steps (prepare, credits, queue-check) add startup latency that's worse locally than in production Inngest.
| File | Purpose |
|---|---|
| apps/api/src/agents/obvious-v2/execution/step-executor.ts | Streaming loop, handleChunk, smoothStream config |
| apps/api/src/agents/obvious-v2/state/events.ts | AgentEventEmitter — builds and publishes events |
| apps/api/src/redis/events.service.ts | Redis publish implementation, ephemeral handling |
| apps/api/src/inngest/obvious-agent-execution.ts | Inngest function with pre-streaming step orchestration |
| apps/api/src/agents/obvious-v2/state/provider.ts | Provider initialization (gateway vs direct) |
| apps/api/src/agents/lib/gateway.ts | AI Gateway singleton configuration |
| Date | PR | Change | Impact |
|---|---|---|---|
| Feb 8 | #3441 | AI Gateway integration with BYOK | Primary cause — adds internet hop in local dev |
| Feb 8 | #6305 | Braintrust SDK upgrade to v2.2.0 | Adds tracing wrapping overhead |
| Feb 8 | #6530 | Drizzle ORM v1→v2 migration | Potential query perf changes |
| Feb 10 | #6684 | Move gateway fallback models into mode registry | Minor |
| Feb 18 | #7075 | Remove direct-provider fallback, gateway required | Removed escape hatch |
Re-introduce initializeDirectProvider (removed in PR #7075) as a local-dev-only path. When the USE_LOCALSTACK or USE_DIRECT_PROVIDER env var is set, bypass the AI Gateway and call Anthropic directly.
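A minimal sketch of the env-gated selection (function shape and guard are assumptions; the actual restored code lives in commit ffd341e648):

```typescript
// Hypothetical env-gated provider selection: bypass the AI Gateway only
// in local dev, when USE_DIRECT_PROVIDER or USE_LOCALSTACK is set.
type ProviderKind = 'direct' | 'gateway'

function resolveProviderKind(env: Record<string, string | undefined>): ProviderKind {
  const wantsDirect =
    env.USE_DIRECT_PROVIDER === 'true' || env.USE_LOCALSTACK === 'true'
  // Guard: the direct path must never activate outside local dev.
  if (wantsDirect && env.NODE_ENV !== 'production') return 'direct'
  return 'gateway'
}

console.log(resolveProviderKind({ USE_DIRECT_PROVIDER: 'true', NODE_ENV: 'development' })) // direct
console.log(resolveProviderKind({ USE_DIRECT_PROVIDER: 'true', NODE_ENV: 'production' })) // gateway
console.log(resolveProviderKind({ NODE_ENV: 'development' })) // gateway
```

The NODE_ENV guard matters: an env var copied into a production deploy should not silently disable the gateway.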
The deleted code is preserved in commit ffd341e648.
Stop awaiting Redis publish for text-delta and text-start events. These are ephemeral and self-correcting (each carries the full accumulated fullText, so a dropped publish is repaired by the next one). Add .catch() for error logging. Keep await for text-end, tool-input-start, tool-call.
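The loop-level effect can be sketched with a stub publisher (event names follow the chunk types above; the publisher itself is a stand-in): ephemeral deltas are fired without awaiting, so the loop reads the next chunk immediately, while terminal events still await to preserve ordering.

```typescript
// Sketch: fire-and-forget for ephemeral deltas, awaited publish for
// terminal events. `publish` stands in for the Redis PUBLISH round-trip.
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms))

async function run(): Promise<string[]> {
  const order: string[] = []
  const publish = async (event: string): Promise<void> => {
    await sleep(10) // simulated Redis round-trip
    order.push(`published:${event}`)
  }

  const inflight: Promise<void>[] = []
  for (const event of ['delta-1', 'delta-2', 'text-end']) {
    if (event === 'text-end') {
      await publish(event) // terminal events keep their ordering guarantee
    } else {
      // Ephemeral deltas: no await — just log failures, never throw.
      inflight.push(publish(event).catch((err) => console.warn('publish failed', err)))
      order.push(`read-next-after:${event}`)
    }
  }
  await Promise.all(inflight) // drain before the step completes
  return order
}

run().then((order) => console.log(order.join('\n')))
```

Note the loop proceeds past both deltas before any publish settles; draining `inflight` at the end keeps the step from finishing with publishes still in flight.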
Batch text-deltas into one Redis publish every ~32ms (~30fps). This cuts publishes from ~2000 per response to at most ~30 per second (~60-100 for a typical multi-second response). Remove smoothStream({ delayInMs: 50 }), which becomes redundant.
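A trailing-edge throttle is enough, since each publish carries the full accumulated text (the 32ms interval follows the proposal; the emitter shape is a hypothetical sketch, not the actual implementation):

```typescript
// Coalesce text-deltas into at most one publish per interval. Because
// every publish carries the full accumulated text, dropping intermediate
// deltas loses nothing; close() flushes so the final text always lands.
type Publish = (fullText: string) => Promise<void>

function makeThrottledEmitter(publish: Publish, intervalMs = 32) {
  let fullText = ''
  let timer: ReturnType<typeof setTimeout> | null = null

  const flush = () => {
    timer = null
    publish(fullText).catch((err) => console.warn('publish failed', err))
  }

  return {
    onDelta(delta: string) {
      fullText += delta
      // Trailing edge: schedule one flush; further deltas ride along.
      if (timer === null) timer = setTimeout(flush, intervalMs)
    },
    close() {
      // Final flush so the client ends with the complete text.
      if (timer !== null) { clearTimeout(timer); flush() }
    },
  }
}

// Demo: 100 synchronous deltas coalesce into a single publish.
const seen: string[] = []
const emitter = makeThrottledEmitter(async (t) => { seen.push(t) })
for (let i = 0; i < 100; i++) emitter.onDelta('x')
emitter.close()
console.log(seen.length) // 1
```

At a 32ms interval a 2000-token response streaming over a few seconds produces on the order of 100 publishes instead of 2000, each carrying the latest fullText.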
| Fix | Latency Reduction | Scope |
|---|---|---|
| Direct provider for local dev | Eliminates internet round-trip (~50-200ms TTFT, per-chunk latency) | Local dev only |
| Fire-and-forget deltas | -2-4 seconds on 2000-token response | All environments |
| Throttle to 32ms | Additional reduction from fewer publishes | All environments |