Version 1.0 | January 2026
This paper presents the design, implementation, and optimization of a stateful multi-agent system (MAS) for autonomous video production. Built on the Letta framework with persistent memory capabilities, the system demonstrates how coordinated AI agents can learn user preferences, maintain cross-session continuity, and execute complex creative workflows without human intervention.
The architecture addresses a fundamental limitation in traditional LLM applications: the inability to learn and adapt across sessions. By implementing shared memory blocks, archival storage, and specialized "sleeptime" agents for background memory consolidation, the system supports eight distinct operational use cases, including batch production, style learning, quality refinement loops, and A/B testing with preference capture.
A critical discovery during optimization revealed that background agents can develop behavioral patterns from accumulated message history that override explicit system prompt instructions. The solution—enabling message buffer autoclear—restored instruction-following behavior and represents a significant finding for practitioners deploying stateful agent systems.
Contemporary large language model deployments face an inherent limitation: each conversation exists in isolation. Users must repeatedly re-establish context, preferences, and project state. For creative production workflows requiring iterative refinement and personalization, this creates friction that limits practical utility.
Complex creative tasks benefit from role specialization. A video production pipeline requires distinct competencies: creative direction, prompt engineering, quality evaluation, and technical execution. Coordinating these roles while maintaining shared state introduces architectural complexity that monolithic agent designs cannot address.
This implementation explores three primary questions:
- Can stateful agents effectively learn and apply user preferences across sessions?
- How should memory be architected for multi-agent creative workflows?
- What failure modes emerge in persistent agent systems, and how can they be remediated?
AutoGen (Microsoft): Provides multi-agent conversation frameworks but lacks native persistent memory. Agents reset between sessions, requiring external state management.
CrewAI: Offers role-based agent orchestration with task delegation. Memory is session-scoped; cross-session learning requires custom implementation.
LangGraph: Enables stateful agent workflows via checkpointing. Focuses on workflow persistence rather than semantic memory evolution.
MemGPT/Letta: Implements hierarchical memory (core, archival, recall) with background consolidation via "sleeptime" agents. Native support for cross-session continuity and preference learning. This implementation builds on Letta's architecture.
Key Differentiator: This system extends Letta's memory model with domain-specific blocks (`production_queue`, `quality_standards`) and documents a message-buffer accumulation failure mode not described in prior work.
The system employs a hierarchical multi-agent structure:
┌─────────────┐
│ DIRECTOR │
│ (Primary) │
└──────┬──────┘
│
┌────────────┼────────────┐
│ │ │
┌──────▼──────┐ │ ┌───────▼───────┐
│ WRITER │ │ │ CAMERAMAN │
│ (Prompts) │ │ │ (Quality) │
└─────────────┘ │ └───────────────┘
│
┌───────────▼───────────┐
│ SLEEPTIME AGENTS │
│ (Memory Consolidation)│
└───────────────────────┘
Director: Orchestrates workflow, manages user interaction, delegates to specialists, maintains strategic memory.
Writer: Specializes in prompt engineering, applies learned style preferences, searches archival memory for successful patterns.
Cameraman: Handles video generation via ComfyUI integration, evaluates quality, grades outputs, triggers refinement loops.
Sleeptime Agents: Background processors triggered every N interactions to consolidate memory, update session state, and extract learnings.
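The "every N interactions" cadence can be pictured as a simple counter; a minimal sketch (class and field names are illustrative, not Letta's internal API):

```python
# Illustrative sketch of the sleeptime trigger cadence. In the real system,
# Letta manages this via multi_agent_group settings; names here are hypothetical.
SLEEPTIME_FREQUENCY = 5  # consolidate after every 5 user interactions


class SleeptimeScheduler:
    def __init__(self, frequency: int = SLEEPTIME_FREQUENCY):
        self.frequency = frequency
        self.turns = 0

    def record_interaction(self) -> bool:
        """Count one interaction; return True when consolidation should run."""
        self.turns += 1
        return self.turns % self.frequency == 0


scheduler = SleeptimeScheduler()
fired = [scheduler.record_interaction() for _ in range(12)]
# With frequency 5, consolidation fires on interactions 5 and 10.
```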
The system implements a three-tier memory hierarchy:
| Block | Purpose | Update Frequency |
|---|---|---|
| `session_state` | Cross-session continuity | Every sleeptime trigger |
| `user_style` | Learned preferences | On user feedback |
| `quality_standards` | Refinement thresholds | On quality failures |
| `production_queue` | Batch tracking | During production |
| `current_series` | Series continuity | During series work |
| `ab_testing` | Variation experiments | On A/B feedback |
Vector-indexed storage for:
- Successful prompt patterns with grades
- Failure patterns with root causes
- User preference history
- Quality assessment records
Per-agent message buffers providing recent interaction context. Critical discovery: unbounded accumulation degrades instruction-following.
┌─────────────────────────────────────────────────────────┐
│ LETTA SERVER │
│ (192.168.1.143:8283) │
└────────────────────────┬────────────────────────────────┘
│
┌───────────────┼───────────────┐
│ │ │
┌────▼────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ ComfyUI │ │ Frame │ │ Scheduler │
│ :8188 │ │ Server │ │ (cron) │
│ LTX-2 │ │ :8189 │ │ │
└─────────┘ └───────────┘ └───────────┘
Trigger: "Create 10 videos of mythical creatures"
Workflow:
- Director parses request, populates `production_queue.PENDING_VIDEOS`
- For each video:
  - Writer generates prompt using `user_style` preferences
  - Cameraman submits to ComfyUI, monitors completion
  - Quality evaluation triggers retry if below threshold
- Progress tracked in `BATCH_COMPLETE / BATCH_TOTAL`
Memory Updates: production_queue.COMPLETED_TODAY, archival entries for each video.
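A minimal sketch of this loop, assuming the `production_queue` fields from the appendix schema (the `generate_video` stand-in replaces the real Writer/Cameraman hand-off):

```python
# Sketch of the batch-production loop over the production_queue block.
# Field names follow the appendix schema; the pipeline call is a stand-in.
production_queue = {
    "PENDING_VIDEOS": ["dragon", "phoenix", "griffin"],
    "IN_PROGRESS": None,
    "COMPLETED_TODAY": [],
    "BATCH_TOTAL": 3,
    "BATCH_COMPLETE": 0,
}


def generate_video(concept: str) -> str:
    """Stand-in for the Writer + Cameraman pipeline; returns a grade."""
    return "A"


while production_queue["PENDING_VIDEOS"]:
    concept = production_queue["PENDING_VIDEOS"].pop(0)
    production_queue["IN_PROGRESS"] = concept
    grade = generate_video(concept)
    production_queue["COMPLETED_TODAY"].append((concept, grade))
    production_queue["BATCH_COMPLETE"] += 1
    production_queue["IN_PROGRESS"] = None
```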
Trigger: "I prefer dark moody aesthetics with purple and blue tones"
Workflow:
- Director extracts preference dimensions (color, mood, theme)
- Updates `user_style` block with structured preferences
- Adds inverses to `DISLIKED_ELEMENTS`
- All subsequent prompts automatically incorporate preferences
Persistence: Preferences survive session boundaries via sleeptime consolidation.
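The capture step can be sketched as a structured update to the `user_style` block (field names from the appendix schema; in the running system the Director's LLM performs the extraction):

```python
# Sketch of preference capture into the user_style block. Field names follow
# the appendix schema; the extraction itself is done by the Director agent.
user_style = {
    "COLOR_PREFERENCES": "",
    "MOOD_PREFERENCES": "",
    "THEME_PREFERENCES": "",
    "DISLIKED_ELEMENTS": [],
}


def capture_preference(colors: str, mood: str, disliked: list) -> None:
    """Record structured preferences; inverses go to DISLIKED_ELEMENTS."""
    user_style["COLOR_PREFERENCES"] = colors
    user_style["MOOD_PREFERENCES"] = mood
    user_style["DISLIKED_ELEMENTS"].extend(disliked)


# "I prefer dark moody aesthetics with purple and blue tones"
capture_preference(
    "purple and blue tones",
    "dark, moody",
    disliked=["bright saturated palettes", "cheerful lighting"],
)
```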
Trigger: Automatic when video grade < MIN_ACCEPTABLE_GRADE
Workflow:
- Cameraman evaluates generated video
- If grade insufficient:
  - Searches archival for similar failure patterns
  - Updates `quality_standards.FAILURE_PATTERNS_TO_AVOID`
  - Requests revised prompt from Writer
  - Regenerates video
- Loop continues until quality threshold met or `MAX_RETRIES` exceeded
Learning: Failure patterns persist, preventing repeated mistakes.
Trigger: "Create a 3-part Mythical Guardians series"
Workflow:
- Director initializes `current_series` with theme and episode count
- Defines `SERIES_STYLE` and `CONSISTENCY_RULES`
- Each episode references series parameters
- Progress tracked in `EPISODES_COMPLETED / EPISODES_PLANNED`
Continuity: Series state persists across sessions, enabling long-form projects.
Trigger: "Generate 3 variations of this concept"
Workflow:
- Writer creates distinct variations
- Records in `ab_testing.VARIATIONS` with unique IDs
- Presents options to user with clear labels
- On selection: captures in `USER_SELECTIONS`, extracts pattern to `WINNING_PATTERNS`
- Winning approach informs future generations
Feedback Loop: User preferences directly update production parameters.
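A sketch of the variation-tracking bookkeeping, using the `ab_testing` field names from the appendix (IDs and prompts are illustrative):

```python
# Sketch of A/B variation tracking against the ab_testing block schema.
# Variation IDs and prompts are illustrative.
import uuid

ab_testing = {"VARIATIONS": [], "USER_SELECTIONS": [], "WINNING_PATTERNS": []}


def record_variations(prompts: list) -> list:
    """Register each variation with a unique ID for unambiguous selection."""
    ids = []
    for prompt in prompts:
        vid = str(uuid.uuid4())[:8]
        ab_testing["VARIATIONS"].append({"id": vid, "prompt": prompt})
        ids.append(vid)
    return ids


def capture_selection(variation_id: str) -> None:
    """On user choice: log the selection and extract the winning pattern."""
    ab_testing["USER_SELECTIONS"].append(variation_id)
    chosen = next(v for v in ab_testing["VARIATIONS"] if v["id"] == variation_id)
    ab_testing["WINNING_PATTERNS"].append(chosen["prompt"])


ids = record_variations(["slow dolly shot", "fast whip pan", "static wide"])
capture_selection(ids[0])
```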
Trigger: Session start
Workflow:
- Director checks `session_state.PENDING_WORK`
- If incomplete work exists, offers to resume
- Restores full context from memory blocks
- User can continue exactly where they left off
Implementation: Sleeptime agents update `LAST_ACTIVITY` and `LAST_SESSION_SUMMARY` every trigger cycle.
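The resume check at session start can be sketched as follows (hypothetical helper; the real Director reads `session_state` through Letta's block API):

```python
# Sketch of the session-start resume check over the session_state block.
# The helper name and sample data are illustrative.
session_state = {
    "PENDING_WORK": ["episode 3 of Mythical Guardians"],
    "LAST_SESSION_SUMMARY": "Completed episodes 1-2; episode 3 queued.",
}


def resume_offer(state: dict):
    """Return a resume prompt if unfinished work exists, else None."""
    if state["PENDING_WORK"]:
        return (
            f"Last session: {state['LAST_SESSION_SUMMARY']} "
            f"Resume '{state['PENDING_WORK'][0]}'?"
        )
    return None


offer = resume_offer(session_state)
```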
Trigger: Implicit on every generation request
Workflow:
- Before prompt creation, Writer reads `user_style` block
- Applies `COLOR_PREFERENCES` to visual descriptions
- Applies `MOOD_PREFERENCES` to atmosphere/lighting
- Applies `THEME_PREFERENCES` to subject selection
- Filters against `DISLIKED_ELEMENTS` to exclude unwanted aesthetics
Automatic Application: No user action required after initial preference capture. All outputs reflect learned style.
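The application step can be sketched as deterministic prompt assembly plus a disliked-elements check (illustrative helper; the actual Writer applies preferences via LLM reasoning):

```python
# Sketch of automatic style application during prompt creation.
# The helper is illustrative; field names follow the user_style schema.
user_style = {
    "COLOR_PREFERENCES": "purple and blue tones",
    "MOOD_PREFERENCES": "dark, moody",
    "DISLIKED_ELEMENTS": ["bright saturated palettes"],
}


def apply_style(base_prompt: str, style: dict) -> str:
    """Fold color and mood preferences into the prompt, then screen it."""
    prompt = (
        f"{base_prompt}, {style['COLOR_PREFERENCES']}, "
        f"{style['MOOD_PREFERENCES']} atmosphere"
    )
    # Guard: no disliked element may appear in the final prompt text.
    assert not any(bad in prompt for bad in style["DISLIKED_ELEMENTS"])
    return prompt


prompt = apply_style("a dragon soaring over mountains", user_style)
```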
Trigger: Automatic when initial output quality is insufficient
Workflow:
- Cameraman grades video output (A through F scale)
- If grade < `quality_standards.MIN_ACCEPTABLE_GRADE`:
  - Cameraman identifies specific failure reasons
  - Director requests revised prompt from Writer
  - Writer searches archival for similar failures, adjusts approach
  - Cameraman regenerates video
- Loop repeats until:
  - Quality threshold met, OR
  - `MAX_RETRIES` (default: 3) exceeded
- Final result and iteration count logged to archival
Learning Persistence: Each iteration's success/failure patterns stored for future reference.
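The bounded loop can be sketched as follows, mapping letter grades to ranks for comparison (the callables stand in for agent calls, and the simulated grades are illustrative):

```python
# Sketch of the bounded quality-refinement loop. Letter grades are mapped to
# ranks so they can be compared; generate/grade_fn stand in for agent calls.
GRADE_ORDER = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}
MIN_ACCEPTABLE_GRADE = "B"
MAX_RETRIES = 3


def refine_until_acceptable(generate, grade_fn):
    """Regenerate until the grade meets threshold or MAX_RETRIES is reached."""
    attempts = 0
    while True:
        video = generate(attempts)
        grade = grade_fn(video)
        attempts += 1
        acceptable = GRADE_ORDER[grade] >= GRADE_ORDER[MIN_ACCEPTABLE_GRADE]
        if acceptable or attempts >= MAX_RETRIES:
            return video, grade, attempts


# Simulated run: first two attempts fail the threshold, third passes.
grades = iter(["D", "C", "A"])
video, grade, attempts = refine_until_acceptable(
    generate=lambda n: f"video_v{n}",
    grade_fn=lambda v: next(grades),
)
```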
During optimization, sleeptime agents consistently failed to update `session_state` despite explicit system prompt instructions. Investigation revealed:
Symptom: `session_state.LAST_ACTIVITY` remained null across multiple sleeptime triggers.
Agent Reasoning (captured from message history):
"This conversation contains no substantive content... I don't need to make any changes to the memory blocks."
The agent was reasoning its way around mandatory instructions.
Sleeptime agents had accumulated 119 messages of history. This history contained repeated patterns of:
- Evaluating conversation content
- Deciding "nothing meaningful" occurred
- Calling `memory_finish_edits` without updates
The model was pattern-matching against its own historical behavior rather than following current system prompt instructions.
Enabling `message_buffer_autoclear: true` for all sleeptime agents:

```bash
curl -X PATCH "http://server:8283/v1/agents/{agent_id}/" \
  -H "Content-Type: application/json" \
  -d '{"message_buffer_autoclear": true}'
```

With cleared history, agents correctly followed system prompts:
"I should: 1. Update session_state with current timestamp and activity summary (MANDATORY)"
This finding has broad implications for stateful agent deployments:
- Background agents processing routine tasks accumulate behavioral patterns
- Accumulated history can override explicit instructions
- Autoclear mechanisms restore instruction-following for repetitive tasks
- System prompts alone are insufficient when competing with historical patterns
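The remediation can be applied across every sleeptime agent by scripting the PATCH call shown above; a sketch that only constructs the requests (agent IDs are placeholders, and sending is omitted):

```python
# Sketch of batch-enabling autoclear via the REST endpoint from the text.
# Agent IDs are placeholders; this builds the requests without sending them.
import json

BASE_URL = "http://server:8283"
sleeptime_agent_ids = ["agent-aaa", "agent-bbb", "agent-ccc"]  # placeholders

requests_to_send = []
for agent_id in sleeptime_agent_ids:
    requests_to_send.append({
        "method": "PATCH",
        "url": f"{BASE_URL}/v1/agents/{agent_id}/",
        "body": json.dumps({"message_buffer_autoclear": True}),
    })
# Each request targets /v1/agents/{agent_id}/ with the autoclear flag set.
```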
| Metric | Value |
|---|---|
| Total videos generated | 103+ |
| Average quality grade | B+ |
| A/A- grade rate | 34% |
| Batch completion rate | 100% |
| Cross-session resume rate | 100% |
| Style preference application | 100% |
| Block | Status | Evidence |
|---|---|---|
| `session_state` | Active | `LAST_ACTIVITY: 2026-01-15T18:56:30Z` |
| `user_style` | Populated | 6 preference dimensions tracked |
| `quality_standards` | Populated | 6 failure patterns, 5 refinement strategies |
| `production_queue` | Active | Batch tracking operational |
| `current_series` | Active | 3-episode series tracked |
| `ab_testing` | Active | User selections captured |
All 8 use cases demonstrated with live data:
- Batch Production: 3 videos queued, tracked, completed
- Style Learning: Preferences captured and applied to subsequent generations
- Quality Loop: Failure patterns stored, retry logic verified
- Content Series: Multi-episode continuity maintained
- A/B Testing: User selection captured, winning pattern extracted
- Personalization: Preferences automatically applied
- Cross-Session: Work resumed from previous session state
- Refinement: Quality grades tracked, iteration supported
- Atomic updates: Each block serves single purpose
- Structured format: Consistent field naming enables reliable parsing
- Default values: Initialize all fields to prevent null pointer failures
- Update timestamps: Track when each block was last modified
- Enable autoclear: Prevent behavioral drift from accumulated history
- Mandatory operations first: Structure prompts with required actions before conditional logic
- Verification protocols: Include checklists in system prompts
- Frequency tuning: Balance consolidation needs against processing overhead
- Explicit thresholds: Define `MIN_ACCEPTABLE_GRADE` clearly
- Bounded retries: Set `MAX_RETRIES` to prevent infinite loops
- Pattern extraction: Store both success and failure patterns
- Archival search: Query relevant patterns before generation
Current implementation uses fixed trigger frequency (every 5 interactions). Adaptive frequency based on conversation complexity could optimize resource utilization.
Current architecture assumes single user. Multi-tenant deployments require preference isolation and potentially hierarchical style inheritance.
Current quality grading relies on heuristics. Fine-tuned evaluation models could provide more consistent and nuanced quality assessment.
Current topology runs all agents on single Letta server. Distributed execution could enable horizontal scaling for production workloads.
This implementation demonstrates that stateful multi-agent systems can effectively address the limitations of traditional LLM deployments for creative production workflows. The combination of shared memory blocks, archival storage, and background consolidation agents enables capabilities previously requiring human oversight: preference learning, cross-session continuity, and quality-driven iteration.
The critical discovery regarding message buffer accumulation provides actionable guidance for practitioners: background agents performing routine operations require memory management to prevent behavioral drift. This finding extends beyond creative production to any stateful agent deployment.
The architecture presented here—Director/Specialist/Sleeptime topology with three-tier memory—offers a replicable pattern for complex, long-running AI workflows requiring coordination, learning, and persistence.
- Letta Framework - Packer, C., et al. "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560, 2023. Documentation: https://docs.letta.com
- LTX-Video Model - Lightricks Ltd. "LTX-Video: Realtime Video Generation." https://github.com/Lightricks/LTX-Video, 2024.
- ComfyUI - comfyanonymous. "ComfyUI: A Powerful and Modular Stable Diffusion GUI." https://github.com/comfyanonymous/ComfyUI, 2023.
- AutoGen - Wu, Q., et al. "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation." arXiv:2308.08155, 2023.
- CrewAI - Moura, J. "CrewAI: Framework for orchestrating role-playing AI agents." https://github.com/joaomdmoura/crewAI, 2024.
- LangGraph - LangChain Inc. "LangGraph: Build stateful, multi-actor applications with LLMs." https://github.com/langchain-ai/langgraph, 2024.
- Model Context Protocol - Anthropic. "Model Context Protocol Specification." https://modelcontextprotocol.io, 2024.
GET /v1/agents/{id}/ # Agent details
PATCH /v1/agents/{id}/ # Update agent
POST /v1/agents/{id}/messages/ # Send message
GET /v1/agents/{id}/archival-memory/ # Query archival
GET /v1/blocks/{id}/ # Block value
PATCH /v1/blocks/{id}/ # Update block
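A sketch of driving the block endpoints above from Python, constructing the requests without sending them (the block ID and new value are placeholders):

```python
# Sketch of reading and updating a memory block through the endpoints above.
# Block ID and payload are placeholders; requests are built but not sent.
import json
import urllib.request

BASE_URL = "http://192.168.1.143:8283"
block_id = "block-0000"  # placeholder

# GET /v1/blocks/{id}/ - read the current block value
read_req = urllib.request.Request(
    f"{BASE_URL}/v1/blocks/{block_id}/", method="GET"
)

# PATCH /v1/blocks/{id}/ - update the block value
update_req = urllib.request.Request(
    f"{BASE_URL}/v1/blocks/{block_id}/",
    data=json.dumps({"value": "QUEUE_STATUS: paused"}).encode(),
    headers={"Content-Type": "application/json"},
    method="PATCH",
)
```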
```json
{
  "message_buffer_autoclear": true,
  "multi_agent_group": {
    "sleeptime_agent_frequency": 5,
    "turns_counter": 0
  }
}
```

LAST_ACTIVITY: ISO timestamp
PENDING_WORK: array
IN_PROGRESS_PROJECT: string | null
QUEUED_VIDEOS: array
LAST_SESSION_SUMMARY: string | null
COLOR_PREFERENCES: string
MOOD_PREFERENCES: string
THEME_PREFERENCES: string
PACING_PREFERENCES: string
FAVORITE_SUCCESSES: array
DISLIKED_ELEMENTS: array
MIN_ACCEPTABLE_GRADE: string (A-F)
MAX_RETRIES: integer
FAILURE_PATTERNS_TO_AVOID: array
REFINEMENT_STRATEGIES: array
SUCCESSFUL_PATTERNS: array
PROVEN_PROMPT_TEMPLATES: array
QUEUE_STATUS: string (active|paused|complete)
PENDING_VIDEOS: array
IN_PROGRESS: string | null
COMPLETED_TODAY: array
FAILED_RETRIES: array
BATCH_ID: string
BATCH_TOTAL: integer
BATCH_COMPLETE: integer
ACTIVE_SERIES: string | null
SERIES_THEME: string
SERIES_STYLE: string
EPISODES_COMPLETED: integer
EPISODES_PLANNED: integer
SERIES_ELEMENTS: array
CONSISTENCY_RULES: array
ACTIVE_TEST: string | null
VARIATIONS: array
VARIATION_A_ID: string
VARIATION_B_ID: string
USER_SELECTIONS: array
WINNING_PATTERNS: array
Director: agent-22069f59-7a79-4890-bf4f-1f2a69696267
Writer: agent-e565b3e8-4a59-440a-89ab-6c279d61cfb0
Cameraman: agent-f939736a-46fc-4115-a584-0a8cf896212a
Director-sleeptime: agent-10605497-bc9d-454e-8745-672efd399de4
Writer-sleeptime: agent-fa49deb9-f3c7-413b-a73a-ca851fb5b0b8
Cameraman-sleeptime: agent-1ea4b81c-34f0-45ad-9bd6-80d98574ef25
session_state: block-4def4024-45a6-4b27-a7b7-f156de3bf58f
user_style: block-6af75f2c-6cec-458d-ad2d-c1a220476bd1
quality_standards: block-425744c7-f9dd-4ef9-9057-ccd80e0481fd
production_queue: block-3adb1fce-3a68-4dde-b95d-f1e5f5369364
current_series: block-f67e7bea-80f0-4afe-86cc-181bce6bf36f
ab_testing: block-674f9951-f22d-4451-862d-dd52c11161a2
Letta Server: http://192.168.1.143:8283
ComfyUI: http://192.168.1.143:8188
Frame Server: http://192.168.1.143:8189
Document generated from production system analysis, January 2026.