@jmanhype
Last active January 15, 2026 22:33
Letta MAS Context Overflow Fix - Troubleshooting Guide for custom router endpoints
# Dockerfile for Letta with claude-max-router support
# This image patches Letta to accept custom model endpoint types
ARG LETTA_VERSION=latest
FROM letta/letta:${LETTA_VERSION}
LABEL maintainer="Letta MAS Admin"
LABEL description="Letta with claude-max-router endpoint support"
# Patch llm_config.py to add claude-max-router to the allowed endpoint types.
# Note: sed exits 0 even when the pattern is absent, so success is checked
# explicitly below rather than with an "|| echo" fallback here.
RUN sed -i 's/"xai",/"xai",\n    "claude-max-router",/' /app/letta/schemas/llm_config.py

# Patch model.py the same way
RUN sed -i 's/"xai",/"xai",\n    "claude-max-router",/' /app/letta/schemas/model.py

# Verify the patches were applied; fail the build if either is missing
RUN grep -q "claude-max-router" /app/letta/schemas/llm_config.py && \
    echo "✓ llm_config.py patched successfully" || \
    { echo "✗ llm_config.py patch failed"; exit 1; }
RUN grep -q "claude-max-router" /app/letta/schemas/model.py && \
    echo "✓ model.py patched successfully" || \
    { echo "✗ model.py patch failed"; exit 1; }
# Keep the original entrypoint
ENTRYPOINT ["./letta/server/startup.sh"]

Letta MAS Bulletproof Setup - Complete Guide

Overview

This guide covers making a Letta MAS (Multi-Agent System) video production pipeline bulletproof with:

  1. Persistent patches - Custom Docker image with fixes baked in
  2. Auto-healing - Maintenance script that monitors and fixes issues
  3. Proactive protection - Cron jobs to prevent problems before they occur

Problem: Context Overflow + Validation Errors

Symptoms

Context window exceeded (current: 176539, threshold: 32000)
model_endpoint_type Input should be 'openai', 'anthropic'... input_value='claude-max-router'

Root Cause

  1. Agents accumulate messages over time
  2. When messages exceed threshold, Letta triggers summarization
  3. Summarization requires LLM call to custom endpoint
  4. Custom endpoint type not in Pydantic Literal validation
  5. Agent gets stuck in a loop, stops working

Solution 1: Patched Docker Image

Dockerfile

ARG LETTA_VERSION=latest
FROM letta/letta:${LETTA_VERSION}

LABEL maintainer="Letta MAS Admin"
LABEL description="Letta with claude-max-router endpoint support"

# Patch llm_config.py
RUN sed -i 's/"xai",/"xai",\n    "claude-max-router",/' /app/letta/schemas/llm_config.py

# Patch model.py
RUN sed -i 's/"xai",/"xai",\n    "claude-max-router",/' /app/letta/schemas/model.py

# Verify patches; fail the build if either is missing
RUN grep -q "claude-max-router" /app/letta/schemas/llm_config.py && \
    echo "✓ llm_config.py patched" || { echo "✗ PATCH FAILED"; exit 1; }
RUN grep -q "claude-max-router" /app/letta/schemas/model.py && \
    echo "✓ model.py patched" || { echo "✗ PATCH FAILED"; exit 1; }

ENTRYPOINT ["./letta/server/startup.sh"]

Build & Deploy

# Build image
docker build -t letta-patched:bulletproof .

# Deploy (replace existing container)
docker stop letta-server
docker rm letta-server
docker run -d \
  --name letta-server \
  --restart unless-stopped \
  -p 8283:8283 \
  -e LETTA_PG_URI=postgresql+pg8000://letta:letta@host.docker.internal:5432/letta \
  --add-host=host.docker.internal:host-gateway \
  letta-patched:bulletproof

Solution 2: Maintenance Script

Auto-monitors and heals agents before they hit critical levels.

Key Features

  • Monitors message count for all agents
  • Auto-summarizes at 300+ messages
  • Force-resets at 500+ messages (if summarization fails)
  • Emergency reset at 600+ messages (no questions asked)
  • Health checks for Letta, Router, ComfyUI
  • Video generation staleness detection
  • Comprehensive logging

Configuration

MESSAGE_SUMMARIZE_THRESHOLD = 300  # Try to summarize
MESSAGE_RESET_THRESHOLD = 500      # Force reset if summarization failing
MESSAGE_CRITICAL_THRESHOLD = 600   # Emergency reset
VIDEO_STALE_MINUTES = 60           # Alert if no new video
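
The thresholds form a three-stage escalation. A minimal sketch of that decision logic (hypothetical; the actual letta_maintenance.py may differ):

```python
# Hypothetical sketch of the escalation policy driven by the thresholds above;
# not the actual letta_maintenance.py.
MESSAGE_SUMMARIZE_THRESHOLD = 300  # Try to summarize
MESSAGE_RESET_THRESHOLD = 500      # Force reset if summarization is failing
MESSAGE_CRITICAL_THRESHOLD = 600   # Emergency reset


def plan_action(message_count: int) -> str:
    """Pick the maintenance action for an agent based on its message count."""
    if message_count >= MESSAGE_CRITICAL_THRESHOLD:
        return "emergency_reset"   # no questions asked
    if message_count >= MESSAGE_RESET_THRESHOLD:
        return "force_reset"       # summarization has presumably been failing
    if message_count >= MESSAGE_SUMMARIZE_THRESHOLD:
        return "summarize"
    return "ok"


print(plan_action(171))  # ok          (writer in the sample output)
print(plan_action(865))  # emergency_reset  (cameraman in the sample output)
```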

Cron Setup (every 5 minutes)

*/5 * * * * /usr/bin/python3 /home/straughter/letta_maintenance.py >> /tmp/letta_maintenance.log 2>&1

Sample Output

[2026-01-15 16:31:18] [INFO] LETTA MAINTENANCE RUN STARTING
[2026-01-15 16:31:18] [INFO] Health: Letta=OK, Router=OK, ComfyUI=OK
[2026-01-15 16:31:18] [INFO] director has 37 messages
[2026-01-15 16:31:18] [INFO] writer has 171 messages
[2026-01-15 16:31:18] [WARN] cameraman at CRITICAL level (865 >= 600)
[2026-01-15 16:31:18] [SUCCESS] Message reset successful

Solution 3: Router Fixes

If using a custom router (e.g., claude-max-router):

Add Route Alias

Letta calls /chat/completions without the /v1 prefix:

app.post('/v1/chat/completions', handleChatCompletionsRequest);
app.post('/chat/completions', handleChatCompletionsRequest);  // ADD THIS

Disable Bearer Passthrough

Force router to use its own OAuth instead of passing through Letta's token:

npx tsx src/router/server.ts --disable-bearer-passthrough

Complete Cron Schedule

# Maintenance - every 5 minutes (monitors & auto-heals)
*/5 * * * * /usr/bin/python3 /home/straughter/letta_maintenance.py >> /tmp/letta_maintenance.log 2>&1

# Scheduler - every 15 minutes (processes production queue)
*/15 * * * * /home/straughter/letta_scheduler.py --mode queue >> /tmp/letta_cron.log 2>&1

# Daily batch - 2 AM (generates 5 new videos)
0 2 * * * /home/straughter/letta_scheduler.py --mode daily --count 5 >> /tmp/letta_cron.log 2>&1

# Prompt refill - every 30 minutes (keeps queue populated)
*/30 * * * * /usr/bin/python3 /home/straughter/prompt_generator.py --refill --count 20 >> /tmp/prompt_refill.log 2>&1

Verification Commands

# Check agent message counts
curl -sL "http://localhost:8283/v1/agents/" | \
  jq -r '.[] | "\(.name): \(.message_ids | length) messages"'

# Check maintenance log
tail -50 /tmp/letta_maintenance.log

# Verify Docker patches
docker exec letta-server grep "claude-max-router" /app/letta/schemas/llm_config.py

# Check video count
ls ~/ComfyUI/output/video/*.mp4 | wc -l

# Check ComfyUI queue
curl -s http://localhost:8188/queue | jq '{running: .queue_running | length, pending: .queue_pending | length}'
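
On hosts without jq, the agent message-count check above can be done in Python. A sketch assuming the /v1/agents/ response is a JSON array of objects with name and message_ids fields, as the jq filter implies:

```python
# Sketch: Python equivalent of the curl | jq one-liner above.
# Assumes /v1/agents/ returns a JSON array of objects carrying
# "name" and "message_ids" fields, as the jq filter implies.
import json


def message_counts(agents: list[dict]) -> dict[str, int]:
    """Map agent name -> number of accumulated messages."""
    return {a["name"]: len(a.get("message_ids") or []) for a in agents}


# Against a live server, fetch with e.g.
#   agents = json.load(urllib.request.urlopen("http://localhost:8283/v1/agents/"))
SAMPLE = json.loads("""[
  {"name": "director", "message_ids": ["m1", "m2"]},
  {"name": "writer", "message_ids": []}
]""")
print(message_counts(SAMPLE))  # {'director': 2, 'writer': 0}
```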

Protection Level Summary

Protection                   Status
Persistent patches (Docker)  ✅ Survives rebuilds
Auto-summarization           ✅ Every 5 min check
Emergency reset              ✅ At 600+ messages
Health monitoring            ✅ Letta/Router/ComfyUI
Video staleness detection    ✅ 60 min threshold
Logging                      ✅ /tmp/letta_maintenance.log
Docker restart policy        ✅ unless-stopped

Protection Level: ~90%

The remaining 10% covers:

  • Unknown edge cases
  • Hardware failures
  • Network issues
  • OAuth token expiry (needs manual refresh)

Files Reference

File                               Purpose
~/letta-patched/Dockerfile         Patched Letta image
~/letta_maintenance.py             Auto-healing script
~/letta_scheduler.py               Production queue processor
~/prompt_generator.py              Auto-generates prompts
/tmp/letta_maintenance.log         Maintenance logs
/tmp/letta_maintenance_state.json  Persistent state

Generated 2026-01-15

Letta MAS Context Overflow Fix - Troubleshooting Guide

Problem

Letta agent (Director) stopped generating videos due to context window overflow and validation errors.

Symptoms

  • Video generation stopped
  • ComfyUI queue empty
  • Agent unresponsive to messages

Root Cause (Docker logs)

Context window exceeded (current: 176539, threshold: 32000)
Sliding window summarization failed with exception: Assistant message index 626 is at the end of the message buffer
1 validation error for LLMConfig model_endpoint_type
Input should be 'openai', 'anthropic'... [type=literal_error, input_value='claude-max-router', input_type=str]

The Problem Chain

┌─────────────────────────────────────────────────────────────────┐
│  1. Context Window Overflow (176k tokens > 32k threshold)       │
│     ↓                                                           │
│  2. Letta triggers automatic summarization                      │
│     ↓                                                           │
│  3. Summarization needs LLM call to custom endpoint             │
│     ↓                                                           │
│  4. Validation fails: "claude-max-router" not in allowed types  │
│     ↓                                                           │
│  5. Agent stuck in summarization loop, can't process messages   │
└─────────────────────────────────────────────────────────────────┘

Fix 1: Patch Letta to Accept Custom Endpoint Type

Letta uses Pydantic Literal types to validate model_endpoint_type. The custom router endpoint wasn't in the allowed list.
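
The failure is easy to reproduce outside Letta. A minimal sketch assuming pydantic v2 is installed; the class and the shortened Literal below are illustrative, not Letta's actual schema:

```python
# Minimal reproduction of the Literal validation failure (assumes pydantic v2).
# Field name mirrors Letta's schema; the Literal here is shortened for brevity.
from typing import Literal

from pydantic import BaseModel, ConfigDict, ValidationError


class LLMConfigDemo(BaseModel):
    # Allow a field name starting with "model_" without a namespace warning.
    model_config = ConfigDict(protected_namespaces=())

    model_endpoint_type: Literal["openai", "anthropic", "xai"]


# An allowed value passes validation.
ok = LLMConfigDemo(model_endpoint_type="anthropic")

# A value outside the Literal raises the same literal_error seen in the logs.
try:
    LLMConfigDemo(model_endpoint_type="claude-max-router")
except ValidationError as e:
    print(e.errors()[0]["type"])  # literal_error
```

Adding "claude-max-router" to the Literal (as in the patch below) is what makes the value pass.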

Files to Patch (inside Docker container)

1. /app/letta/schemas/llm_config.py (line ~29)

model_endpoint_type: Literal[
    "openai",
    "anthropic",
    "google_ai",
    "google_vertex",
    "cohere",
    "azure",
    "groq",
    "ollama",
    "webui",
    "webui-legacy",
    "lmstudio",
    "lmstudio-legacy",
    "llamacpp",
    "koboldcpp",
    "vllm",
    "bedrock",
    "deepseek",
    "xai",
    "claude-max-router",  # <-- ADD THIS
] = Field(..., description="The endpoint type for the model.")

2. /app/letta/schemas/model.py (line ~28) - Same Literal definition, add the same entry.

How to Apply (Docker)

# Enter container
docker exec -it letta-server bash

# Edit files
sed -i 's/"xai",/"xai",\n    "claude-max-router",/' /app/letta/schemas/llm_config.py
sed -i 's/"xai",/"xai",\n    "claude-max-router",/' /app/letta/schemas/model.py

# Exit and restart
exit
docker restart letta-server

Note: edits made with docker exec live only in the container's writable layer and are lost if the container is recreated. For a persistent fix, bake the patch into a custom image as in the Dockerfile above.

Fix 2: Router Route Alias

Letta calls /chat/completions (without /v1 prefix) for summarization, but the router only had /v1/chat/completions.

Router Fix (TypeScript/Express)

// In server.ts - add route alias
app.post('/v1/chat/completions', handleChatCompletionsRequest);
app.post('/chat/completions', handleChatCompletionsRequest);  // ADD THIS

Fix 3: Disable Bearer Passthrough

If using a custom router with OAuth, ensure bearer passthrough is disabled so the router uses its own OAuth token instead of passing through Letta's internal token.

# Start router with passthrough disabled
npx tsx src/router/server.ts --disable-bearer-passthrough

Fix 4: Reset Agent Messages

If the agent has accumulated too many messages (even after fixes), use the reset API:

# Reset messages (preserves memory blocks)
curl -X PATCH "http://localhost:8283/v1/agents/{agent_id}/reset-messages" \
  -H "Content-Type: application/json"

Important: This clears conversation history but preserves:

  • Core memory blocks (persona, human, custom blocks)
  • Archival memory
  • Agent configuration
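
For scripting, the curl call above can be wrapped in a small helper. A hypothetical sketch using only the standard library; build_reset_request and its defaults are illustrative:

```python
# Hypothetical helper around the reset endpoint shown above. It only builds
# the PATCH request, so it can be inspected before being sent with urlopen().
from urllib.request import Request


def build_reset_request(base_url: str, agent_id: str) -> Request:
    """PATCH /v1/agents/{agent_id}/reset-messages (clears history; memory survives)."""
    return Request(
        f"{base_url}/v1/agents/{agent_id}/reset-messages",
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


req = build_reset_request("http://localhost:8283", "agent-123")
print(req.get_method(), req.full_url)
# To send it against a live server: urllib.request.urlopen(req)
```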

Verification Commands

# Check agent message count
curl -sL "http://localhost:8283/v1/agents/{agent_id}" | jq '.message_ids | length'

# Check Letta server logs
docker logs letta-server 2>&1 | tail -50

# Check router logs
tail -20 /tmp/router.log

# Test agent response
curl -sL -X POST "http://localhost:8283/v1/agents/{agent_id}/messages" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'

Prevention

  1. Monitor message count - Set up alerts when message_ids > 500
  2. Regular summarization - Manually trigger /v1/agents/{id}/summarize periodically
  3. Use appropriate context threshold - Adjust context_window_limit in agent config if needed

Key Learnings

Issue                 Root Cause                                 Fix
Validation error      Custom endpoint not in Literal type        Patch Letta source
404 on summarization  Missing route alias                        Add /chat/completions route
401 auth error        Bearer passthrough forwarding wrong token  Disable passthrough
400 prompt too long   Too many messages even for summarization   Reset messages

Environment

  • Letta Server: Docker container
  • Custom Router: claude-max-router (Anthropic MAX Plan OAuth)
  • LLM: claude-sonnet-4-20250514
  • Video Generation: ComfyUI with LTX-2 model

Generated from a troubleshooting session on 2026-01-15
