TL;DR: We built a two-agent architecture where a "Puppeteer" agent manages memory (timeline checkpoints + learned knowledge) and directs a stateless "Executor" agent. The executor is a Boltzmann brain—spawned fresh each run with no concept of history. But to the user, the system appears to have perfect memory.
Every AI agent has a fundamental constraint: a limited context window.
When you run an autonomous agent on a schedule—say, monitoring news every 6 hours—each run starts fresh. The agent doesn't remember what it did yesterday, which articles it already sent you, or that it has already tried a failing approach three times.
┌─────────────────────────────────────────────────────────────┐
│ THE MEMORY PROBLEM │
├─────────────────────────────────────────────────────────────┤
│ │
│ Run 1: "Found 3 AI articles, sent email" │
│ ↓ │
│ Run 2: "Found 3 AI articles, sent email" ← Same articles!│
│ ↓ │
│ Run 3: "Found 3 AI articles, sent email" ← User annoyed │
│ ↓ │
│ Run 4: "Found 3 AI articles, sent email" ← Unsubscribed │
│ │
└─────────────────────────────────────────────────────────────┘
Current solutions are expensive:
- Stuff complete history into context → token costs explode
- Summarize history → lose critical details
- RAG over history → adds latency and complexity
We wanted something different.
What if instead of making one agent do everything, we split the responsibilities?
| Concern | Who Handles It |
|---|---|
| Remember everything | Puppeteer Agent |
| Execute current task | Executor Agent |
| Analyze history | Puppeteer Agent |
| Use tools | Executor Agent |
| Avoid duplicate work | Puppeteer Agent |
| Follow instructions | Executor Agent |
| Learn reusable facts | Puppeteer Agent |
| Self-critique prompts | Puppeteer Agent |
The executor becomes a focused, efficient worker. The puppeteer becomes the strategic brain with access to complete history.
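The split above can be sketched in a few lines of Python. All names here (`Puppeteer`, `Executor`, `RunRecord`) are illustrative stand-ins—the real system would make LLM calls where the comments indicate—not Alvix AI's actual API:

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    status: str                                    # "completed" or "failed"
    summary: str
    tools_used: list[str] = field(default_factory=list)


class Puppeteer:
    """Strategic agent: holds all history, emits a focused prompt."""

    def __init__(self, user_instructions: str):
        self.user_instructions = user_instructions
        self.history: list[RunRecord] = []

    def plan(self) -> str:
        # In the real system this is an LLM call over history,
        # checkpoints, and knowledge. Here: a trivial stand-in.
        done = [r.summary for r in self.history if r.status == "completed"]
        return (f"Task: {self.user_instructions}\n"
                f"Already done, do not repeat: {done}")


class Executor:
    """Stateless agent: sees only the prompt it is handed."""

    def run(self, system_prompt: str) -> RunRecord:
        # Stand-in for an LLM + tools loop.
        return RunRecord(status="completed",
                         summary=f"Executed: {system_prompt[:30]}...")


puppeteer = Puppeteer("Monitor HN for AI news, email me daily")
for _ in range(3):
    prompt = puppeteer.plan()            # puppeteer directs...
    record = Executor().run(prompt)      # ...a fresh executor each run
    puppeteer.history.append(record)     # only the puppeteer accumulates state
```

Note that `Executor()` is constructed fresh inside the loop—it carries no state between iterations, mirroring the stateless executor described above.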
┌──────────────────────────────────────────────────────────────────┐
│ USER │
│ │ │
│ "Monitor HN for AI news, email me daily" │
│ │ │
└──────────────────────────┼───────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────┐
│ PUPPETEER AGENT │
│ │
│ "I can see everything. I remember everything. │
│ But I don't execute—I direct." │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Run │ │Checkpoints│ │ Knowledge │ │ User │ │
│ │ History │ │ (timeline │ │ (learned │ │Instructions│ │
│ │ (50 runs) │ │ markers) │ │ facts) │ │ │ │
│ └───────────┘ └───────────┘ └───────────┘ └───────────┘ │
│ │ │ │ │ │
│ └─────────────┴──────────────┴─────────────┘ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ ANALYZE & DECIDE │ │
│ │ + SELF-REFLECT │ │
│ └───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ GENERATE FOCUSED │ │
│ │ SYSTEM PROMPT │ │
│ └───────────────────────┘ │
│ │
└──────────────────────────┼───────────────────────────────────────┘
│
│ "Search HN for AI posts from
│ last 6 hours. Don't re-send
│ these 3 articles..."
▼
┌──────────────────────────────────────────────────────────────────┐
│ EXECUTOR AGENT │
│ │
│ "I only see my current instructions. │
│ I have no concept of past or future—just this task." │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ SYSTEM PROMPT │ │
│ │ (Generated by Puppeteer—executor doesn't know this) │ │
│ │ │ │
│ │ "You are an AI news monitoring agent. │ │
│ │ Search for posts from last 6 hours. │ │
│ │ Don't re-send: GPT-5 article, Anthropic funding..." │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ EXECUTE WITH TOOLS │ │
│ │ hacker_news search │ │
│ │ send_email │ │
│ └───────────────────────┘ │
│ │
└──────────────────────────┼───────────────────────────────────────┘
│
│ Results stored
▼
┌──────────────────────────────────────────────────────────────────┐
│ RUN HISTORY │
│ │
│ Run 47: { status: "completed", summary: "Found 2 new │
│ articles, sent digest", tools: ["hacker_news", │
│ "send_email"], messages: [...] } │
│ │
└──────────────────────────────────────────────────────────────────┘
This is crucial. The executor agent receives a system prompt and simply executes. It has no awareness of:
- That a puppeteer exists
- That its "memory" is actually injected context
- That its instructions were carefully crafted based on 50 previous runs
Why? Clean separation. The executor can be any standard agent framework. No special integration needed.
The puppeteer doesn't remember its own previous decisions either; each invocation is a fresh analysis. But it has two persistent memory mechanisms:
Checkpoints — Timeline markers ("what happened when") with actionables:
┌─────────────────────────────────────────────────────────────┐
│ CHECKPOINTS │
│ (Puppeteer's notes to future self) │
├─────────────────────────────────────────────────────────────┤
│ │
│ Run #10 [milestone] │
│ "Monitoring pattern established. HN search + email │
│ digest working reliably. Threshold: 2+ articles." │
│ │
│ Run #34 [issue_resolved] │
│ "Email delivery fixed. Root cause: wrong recipient. │
│ ACTIONABLE: If email fails again, check user │
│ instructions first—may be another address change." │
│ │
│ Run #45 [strategy_change] │
│ "User added Reddit. Now monitoring HN + r/ML. │
│ ACTIONABLE: Use different rate limits for Reddit API." │
│ │
└─────────────────────────────────────────────────────────────┘
Knowledge — Reusable facts ("what we learned") that apply across runs:
┌─────────────────────────────────────────────────────────────┐
│ KNOWLEDGE │
│ (Distilled, reusable learnings) │
├─────────────────────────────────────────────────────────────┤
│ │
│ Topic: kite_api_auth │
│ "Kite sessions expire at 6 AM IST daily. Auth failures │
│ in morning runs are expected. User must re-authenticate │
│ via OAuth—don't retry repeatedly." │
│ │
│ Topic: competitor_c_scraping │
│ "Main website blocks scraping. Use LinkedIn company │
│ page instead. Pricing info is in 'About' section." │
│ │
└─────────────────────────────────────────────────────────────┘
The difference:
- Checkpoint: "Run #45 failed due to Kite auth" (timeline event)
- Knowledge: "Kite sessions expire at 6 AM IST" (reusable fact)
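As a rough sketch, the two memory types might be modeled like this. The field names are assumptions for illustration, not the platform's actual schema:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkpoint:
    """Timeline marker: tied to a specific run ('what happened when')."""
    run_number: int
    kind: str                         # "milestone" | "issue_resolved" | "strategy_change"
    note: str
    actionable: Optional[str] = None  # note to the puppeteer's future self


@dataclass
class Knowledge:
    """Reusable fact: not tied to any run ('what we learned')."""
    topic: str
    fact: str


cp = Checkpoint(
    run_number=45,
    kind="strategy_change",
    note="User added Reddit. Now monitoring HN + r/ML.",
    actionable="Use different rate limits for Reddit API.",
)
kn = Knowledge(
    topic="kite_api_auth",
    fact="Kite sessions expire at 6 AM IST daily; don't retry repeatedly.",
)
```

The structural difference captures the semantic one: a `Checkpoint` carries a `run_number` because it is an event on the timeline; a `Knowledge` entry deliberately has none, because the fact holds regardless of when it was learned.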
Running the puppeteer before every executor run is wasteful. Instead, the puppeteer outputs a reuse policy:
```json
{
  "executor_system_prompt": "You are an AI news agent...",
  "reuse_policy": {
    "valid_for_runs": 5,
    "invalidate_on": ["executor_failed", "user_instructions_changed"],
    "dynamic_fields": {
      "{{hours_since_last_run}}": "Calculate from timestamps",
      "{{recent_articles}}": "Extract from last 3 runs"
    }
  }
}
```

How it works:
┌─────────────────────────────────────────────────────────────┐
│ EXECUTION FLOW │
├─────────────────────────────────────────────────────────────┤
│ │
│ Cron triggers executor run │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Check reuse policy │ │
│ └─────────────────────┘ │
│ │ │
│ ┌──────┴──────┐ │
│ │ │ │
│ ▼ ▼ │
│ [Valid] [Invalid] │
│ │ │ │
│ ▼ ▼ │
│ Fill in Invoke │
│ dynamic Puppeteer │
│ fields for fresh │
│ │ analysis │
│ │ │ │
│ └──────┬──────┘ │
│ │ │
│ ▼ │
│ Run Executor │
│ │
└─────────────────────────────────────────────────────────────┘
Result: Puppeteer might run once every 5-10 executor runs, not every time.
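The reuse check itself is cheap. A minimal sketch, assuming the JSON field names above (`prompt_is_reusable` is a hypothetical helper, not a real API):

```python
def prompt_is_reusable(policy: dict, runs_since_refresh: int,
                       events_since_refresh: set[str]) -> bool:
    """Return True if the cached executor prompt can still be used."""
    if runs_since_refresh >= policy["valid_for_runs"]:
        return False                  # run budget exhausted
    if events_since_refresh & set(policy["invalidate_on"]):
        return False                  # an invalidating event occurred
    return True


policy = {
    "valid_for_runs": 5,
    "invalidate_on": ["executor_failed", "user_instructions_changed"],
}

# 2 runs on the cached prompt, nothing went wrong: reuse it.
print(prompt_is_reusable(policy, 2, set()))                 # True
# An executor failure invalidates the cache: invoke the puppeteer.
print(prompt_is_reusable(policy, 2, {"executor_failed"}))   # False
```

Only when this returns `False` does the system pay for a full puppeteer invocation; otherwise it just fills the dynamic fields and runs the executor.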
Before finalizing output, the puppeteer critiques its own work:
┌─────────────────────────────────────────────────────────────┐
│ REFLECTION │
│ (Puppeteer's self-critique) │
├─────────────────────────────────────────────────────────────┤
│ │
│ "Being conservative with reuse policy (3 runs) since │
│ we just resolved a failure. Added tool_error:send_email │
│ to invalidation conditions as extra safety. Prompt │
│ explicitly mentions resolution to give executor │
│ confidence. Concern: may be including too much context │
│ but erring on the side of clarity after the failure." │
│ │
└─────────────────────────────────────────────────────────────┘
The reflection documents tradeoffs, concerns, and reasoning—helping debug prompt quality over time.
┌─────────────────────────────────┬─────────────────────────────────┐
│ PUPPETEER │ EXECUTOR │
├─────────────────────────────────┼─────────────────────────────────┤
│ │ │
│ Complete run history (50+) │ Nothing about history │
│ │ │
│ All checkpoints (timeline) │ Nothing about checkpoints │
│ │ │
│ All knowledge (learned facts) │ Facts baked into prompt │
│ │ │
│ User's original instructions │ Distilled current task │
│ │ │
│ Failed approaches │ "Don't do X" (without why) │
│ │ │
│ Tool usage patterns │ Available tools list │
│ │ │
│ Previous puppeteer outputs │ Nothing about puppeteer │
│ │ │
│ Strategic timeline │ Just this prompt, nothing else │
│ │ │
└─────────────────────────────────┴─────────────────────────────────┘
User instruction: "Monitor Hacker News for AI news and email me daily digests"
After 47 runs, the puppeteer is invoked:
- Checkpoints: "Pattern established at run #10"
- Last 5 runs: All successful, 2-4 articles each
- No failures, stable operation
```json
{
  "analysis_summary": "Stable monitoring. 5 successful runs. No changes needed.",
  "strategy": "Continue cyclical monitoring. Maintain current approach.",
  "reflection": "High reuse (8 runs) appropriate for stable monitoring. Dynamic fields handle time-sensitive data. No concerns—pattern proven reliable.",
  "executor_system_prompt": "You are an AI news monitoring agent.\n\nSearch for AI posts from the past {{hours}} hours.\n\nStories already sent (do not repeat):\n{{recent_articles}}\n\nIf 2+ new stories found, send email digest.\nIf <2, skip and wait for next run.",
  "reuse_policy": {
    "valid_for_runs": 8,
    "invalidate_on": ["executor_failed"],
    "dynamic_fields": {
      "{{hours}}": "Hours since last run",
      "{{recent_articles}}": "Titles from last 5 email sends"
    }
  },
  "checkpoints_to_create": [],
  "knowledge_to_create": []
}
```

With the dynamic fields filled in, the executor receives:

```
You are an AI news monitoring agent.

Search for AI posts from the past 6 hours.

Stories already sent (do not repeat):
- "GPT-5 Released with 1M Context Window"
- "Anthropic Raises $2B Series C"
- "Open Source LLM Beats GPT-4 on Benchmarks"

If 2+ new stories found, send email digest.
If <2, skip and wait for next run.
```
The executor has no idea this prompt was carefully constructed by analyzing 47 previous runs. It just executes.
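Filling dynamic fields into a cached prompt can be as simple as string substitution. This sketch assumes the `{{...}}` placeholder style from the reuse policy; the helper name `fill_dynamic_fields` is invented for illustration:

```python
from datetime import datetime, timedelta


def fill_dynamic_fields(template: str, values: dict[str, str]) -> str:
    """Substitute {{placeholder}} markers with concrete values."""
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


cached_prompt = (
    "Search for AI posts from the past {{hours}} hours.\n"
    "Stories already sent (do not repeat):\n{{recent_articles}}"
)

# Resolve each field from stored run data (timestamps, past sends).
last_run = datetime.now() - timedelta(hours=6)
values = {
    "{{hours}}": str(round((datetime.now() - last_run).total_seconds() / 3600)),
    "{{recent_articles}}": '- "GPT-5 Released with 1M Context Window"',
}

print(fill_dynamic_fields(cached_prompt, values))
```

Because this resolution is pure string work over stored run data, the time-sensitive parts of the prompt stay fresh without any LLM call.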
| Benefit | How |
|---|---|
| No duplicate work | Puppeteer analyzes history, tells executor what's already done |
| Learns from failures | Puppeteer sees failed approaches, routes around them |
| Retains knowledge | Reusable facts (API quirks, workarounds) persist in knowledge base |
| Efficient token usage | Executor gets minimal context; puppeteer runs infrequently |
| Clean separation | Executor is standard agent; no special memory integration |
| Strategic continuity | Checkpoints preserve high-level timeline with actionables |
| Self-improving | Reflections help identify and fix prompt quality issues |
| Graceful degradation | If puppeteer fails, last cached prompt still works |
Good fit:
- Long-running autonomous agents (cron-scheduled)
- Agents that build on previous work (research, monitoring)
- Multi-phase projects spanning many runs
- Agents where duplicate work is costly or annoying
Not needed:
- Single-shot tasks
- Conversational agents (already have session context)
- Agents where every run is independent
Think of it like a film production:
| Role | Agent |
|---|---|
| Director | Puppeteer — sees the whole story, plans each scene |
| Actor | Executor — performs the current scene brilliantly |
| Script | System prompt — actor's instructions for this scene |
| Dailies | Run history — footage director reviews |
| Continuity notes | Checkpoints — timeline markers ("day 3 of shoot") |
| Production bible | Knowledge — reusable facts ("actor allergic to peanuts") |
The actor doesn't need to remember every previous scene. The director handles continuity. The audience (user) sees seamless performance.
We're building this into Alvix AI, our open-source autonomous agent platform. The architecture is generic—any agent with any tools can use this memory pattern.
Key implementation pieces:
- Puppeteer prompt engineering (the hard part)
- Checkpoint storage and retrieval (timeline markers)
- Knowledge base with semantic search (reusable facts)
- Dynamic field resolution for cached prompts
- Reuse policy evaluation logic
- Reflection logging for prompt quality debugging
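As one illustration of the checkpoint storage and retrieval piece, a toy in-memory store (a real implementation would persist to a database; all names here are hypothetical):

```python
class CheckpointStore:
    """Toy store for timeline markers, ordered by run number."""

    def __init__(self):
        self._items: list[dict] = []

    def add(self, run_number: int, kind: str, note: str) -> None:
        self._items.append({"run": run_number, "kind": kind, "note": note})

    def timeline(self, since_run: int = 0) -> list[dict]:
        """Checkpoints at or after `since_run`, oldest first."""
        return sorted(
            (c for c in self._items if c["run"] >= since_run),
            key=lambda c: c["run"],
        )


store = CheckpointStore()
store.add(34, "issue_resolved", "Email delivery fixed.")
store.add(10, "milestone", "Monitoring pattern established.")
print([c["run"] for c in store.timeline()])   # [10, 34]
```

The puppeteer reads the full `timeline()` when planning; the executor never touches this store at all.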
The insight is simple: Don't make one agent do everything. Let a strategic agent manage memory—both episodic (checkpoints) and semantic (knowledge). Let an execution agent do work. Connect them with carefully crafted, self-critiqued prompts.
The executor is a Boltzmann brain—no past, no future, just now. The puppeteer sees everything. The user gets an agent that actually remembers—and learns.