Puppeteer Memory: Giving AI Agents Long-Term Memory Without the Context Window Tax

TL;DR: We built a two-agent architecture where a "Puppeteer" agent manages memory (timeline checkpoints + learned knowledge) and directs a stateless "Executor" agent. The executor is a Boltzmann brain—spawned fresh each run with no concept of history. But to the user, the system appears to have perfect memory.


The Problem: AI Agents Forget

Every AI agent has a fundamental constraint: limited context windows.

When you run an autonomous agent on a schedule—say, monitoring news every 6 hours—each run starts fresh. The agent doesn't remember what it did yesterday, what articles it already sent you, or that it tried a failing approach three times.

┌─────────────────────────────────────────────────────────────┐
│                    THE MEMORY PROBLEM                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Run 1: "Found 3 AI articles, sent email"                  │
│                      ↓                                      │
│   Run 2: "Found 3 AI articles, sent email"  ← Same articles!│
│                      ↓                                      │
│   Run 3: "Found 3 AI articles, sent email"  ← User annoyed  │
│                      ↓                                      │
│   Run 4: "Found 3 AI articles, sent email"  ← Unsubscribed  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Current solutions are expensive:

  • Stuff complete history into context → token costs explode
  • Summarize history → lose critical details
  • RAG over history → adds latency and complexity

We wanted something different.


The Insight: Separate Memory from Execution

What if instead of making one agent do everything, we split the responsibilities?

Concern                  Who Handles It
-----------------------  ---------------
Remember everything      Puppeteer Agent
Execute current task     Executor Agent
Analyze history          Puppeteer Agent
Use tools                Executor Agent
Avoid duplicate work     Puppeteer Agent
Follow instructions      Executor Agent
Learn reusable facts     Puppeteer Agent
Self-critique prompts    Puppeteer Agent

The executor becomes a focused, efficient worker. The puppeteer becomes the strategic brain with access to complete history.
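
In code, the split is just two call signatures. Here is a rough TypeScript sketch (the interface and field names are ours, not from any particular framework): the puppeteer's input is everything, the executor's input is only the prompt it was handed.

// Hypothetical shapes for the two roles (illustrative only).

interface RunRecord {
  status: "completed" | "failed";
  summary: string;
  tools: string[];
}

// The puppeteer sees the full picture and produces instructions.
interface Puppeteer {
  analyze(input: {
    runHistory: RunRecord[];    // complete history
    checkpoints: string[];      // timeline markers
    knowledge: string[];        // learned facts
    userInstructions: string;   // the original ask
  }): Promise<{ executorSystemPrompt: string }>;
}

// The executor sees only its current instructions and tools.
interface Executor {
  run(systemPrompt: string, tools: string[]): Promise<RunRecord>;
}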


The Architecture

┌──────────────────────────────────────────────────────────────────┐
│                        USER                                      │
│                          │                                       │
│            "Monitor HN for AI news, email me daily"              │
│                          │                                       │
└──────────────────────────┼───────────────────────────────────────┘
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│                   PUPPETEER AGENT                                │
│                                                                  │
│  "I can see everything. I remember everything.                   │
│   But I don't execute—I direct."                                 │
│                                                                  │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐        │
│  │   Run     │ │Checkpoints│ │ Knowledge │ │   User    │        │
│  │  History  │ │ (timeline │ │ (learned  │ │Instructions│       │
│  │ (50 runs) │ │  markers) │ │  facts)   │ │           │        │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘        │
│        │             │              │             │              │
│        └─────────────┴──────────────┴─────────────┘              │
│                          ▼                                       │
│              ┌───────────────────────┐                           │
│              │   ANALYZE & DECIDE    │                           │
│              │   + SELF-REFLECT      │                           │
│              └───────────────────────┘                           │
│                          │                                       │
│                          ▼                                       │
│              ┌───────────────────────┐                           │
│              │  GENERATE FOCUSED     │                           │
│              │  SYSTEM PROMPT        │                           │
│              └───────────────────────┘                           │
│                                                                  │
└──────────────────────────┼───────────────────────────────────────┘
                           │
                           │ "Search HN for AI posts from
                           │  last 6 hours. Don't re-send
                           │  these 3 articles..."
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│                    EXECUTOR AGENT                                │
│                                                                  │
│  "I only see my current instructions.                            │
│   I have no concept of past or future—just this task."           │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                   SYSTEM PROMPT                          │    │
│  │  (Generated by Puppeteer—executor doesn't know this)     │    │
│  │                                                          │    │
│  │  "You are an AI news monitoring agent.                   │    │
│  │   Search for posts from last 6 hours.                    │    │
│  │   Don't re-send: GPT-5 article, Anthropic funding..."    │    │
│  └─────────────────────────────────────────────────────────┘    │
│                          │                                       │
│                          ▼                                       │
│              ┌───────────────────────┐                           │
│              │   EXECUTE WITH TOOLS  │                           │
│              │   hacker_news search  │                           │
│              │   send_email          │                           │
│              └───────────────────────┘                           │
│                                                                  │
└──────────────────────────┼───────────────────────────────────────┘
                           │
                           │ Results stored
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│                      RUN HISTORY                                 │
│                                                                  │
│  Run 47: { status: "completed", summary: "Found 2 new           │
│            articles, sent digest", tools: ["hacker_news",       │
│            "send_email"], messages: [...] }                      │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
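
A minimal sketch of how each executor result might be persisted for the puppeteer to read later. The field names mirror the record above; the in-memory array and helper are illustrative stand-ins for whatever storage an implementation actually uses.

// Run record as stored after each executor run (sketch).
interface RunRecord {
  runNumber: number;
  status: "completed" | "failed";
  summary: string;
  tools: string[];
  messages: unknown[];  // full transcript, read only by the puppeteer
}

const runHistory: RunRecord[] = [];

function recordRun(run: RunRecord): void {
  runHistory.push(run);
  // Keep only the recent window the puppeteer analyzes (e.g. 50 runs).
  if (runHistory.length > 50) runHistory.shift();
}

recordRun({
  runNumber: 47,
  status: "completed",
  summary: "Found 2 new articles, sent digest",
  tools: ["hacker_news", "send_email"],
  messages: [],
});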

Key Design Decisions

1. The Executor Doesn't Know About the Puppeteer

This is crucial. The executor agent receives a system prompt and simply executes. It is not aware that:

  • A puppeteer exists
  • Its "memory" is actually injected context
  • Its instructions were carefully crafted based on 50 previous runs

Why? Clean separation. The executor can be any standard agent framework. No special integration needed.

2. The Puppeteer is Stateless (But Has Checkpoints + Knowledge)

The puppeteer doesn't remember its own previous decisions. Each invocation is fresh analysis.

But it has two memory mechanisms:

Checkpoints — Timeline markers ("what happened when") with actionables:

┌─────────────────────────────────────────────────────────────┐
│                      CHECKPOINTS                            │
│              (Puppeteer's notes to future self)             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Run #10 [milestone]                                        │
│  "Monitoring pattern established. HN search + email         │
│   digest working reliably. Threshold: 2+ articles."         │
│                                                             │
│  Run #34 [issue_resolved]                                   │
│  "Email delivery fixed. Root cause: wrong recipient.        │
│   ACTIONABLE: If email fails again, check user              │
│   instructions first—may be another address change."        │
│                                                             │
│  Run #45 [strategy_change]                                  │
│  "User added Reddit. Now monitoring HN + r/ML.              │
│   ACTIONABLE: Use different rate limits for Reddit API."    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Knowledge — Reusable facts ("what we learned") that apply across runs:

┌─────────────────────────────────────────────────────────────┐
│                       KNOWLEDGE                             │
│              (Distilled, reusable learnings)                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Topic: kite_api_auth                                       │
│  "Kite sessions expire at 6 AM IST daily. Auth failures     │
│   in morning runs are expected. User must re-authenticate   │
│   via OAuth—don't retry repeatedly."                        │
│                                                             │
│  Topic: competitor_c_scraping                               │
│  "Main website blocks scraping. Use LinkedIn company        │
│   page instead. Pricing info is in 'About' section."        │
│                                                             │
└─────────────────────────────────────────────────────────────┘

The difference:

  • Checkpoint: "Run #45 failed due to Kite auth" (timeline event)
  • Knowledge: "Kite sessions expire at 6 AM IST" (reusable fact)
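
One possible shape for the two stores, directly mirroring the examples above (a sketch; the field names are assumptions, not a fixed schema):

// Checkpoint: a timeline event, tied to a specific run.
interface Checkpoint {
  run: number;
  kind: "milestone" | "issue_resolved" | "strategy_change";
  note: string;
  actionable?: string;  // advice for the puppeteer's future self
}

// Knowledge: a reusable fact, not tied to any single run.
interface Knowledge {
  topic: string;
  fact: string;
}

const checkpoint: Checkpoint = {
  run: 45,
  kind: "strategy_change",
  note: "User added Reddit. Now monitoring HN + r/ML.",
  actionable: "Use different rate limits for the Reddit API.",
};

const knowledge: Knowledge = {
  topic: "kite_api_auth",
  fact: "Kite sessions expire at 6 AM IST daily; don't retry repeatedly.",
};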

3. Reuse Policies Minimize Puppeteer Invocations

Running the puppeteer before every executor run is wasteful. Instead, the puppeteer outputs a reuse policy:

{
  "executor_system_prompt": "You are an AI news agent...",

  "reuse_policy": {
    "valid_for_runs": 5,
    "invalidate_on": ["executor_failed", "user_instructions_changed"],
    "dynamic_fields": {
      "{{hours_since_last_run}}": "Calculate from timestamps",
      "{{recent_articles}}": "Extract from last 3 runs"
    }
  }
}

How it works:

┌─────────────────────────────────────────────────────────────┐
│                    EXECUTION FLOW                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Cron triggers executor run                                 │
│            │                                                │
│            ▼                                                │
│  ┌─────────────────────┐                                    │
│  │ Check reuse policy  │                                    │
│  └─────────────────────┘                                    │
│            │                                                │
│     ┌──────┴──────┐                                         │
│     │             │                                         │
│     ▼             ▼                                         │
│  [Valid]      [Invalid]                                     │
│     │             │                                         │
│     ▼             ▼                                         │
│  Fill in      Invoke                                        │
│  dynamic      Puppeteer                                     │
│  fields       for fresh                                     │
│     │         analysis                                      │
│     │             │                                         │
│     └──────┬──────┘                                         │
│            │                                                │
│            ▼                                                │
│     Run Executor                                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Result: Puppeteer might run once every 5-10 executor runs, not every time.
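
A sketch of the reuse check that runs on each trigger. The snake_case fields follow the JSON above; the run counter and event list are assumptions about how a scheduler might track invalidation conditions.

interface ReusePolicy {
  valid_for_runs: number;
  invalidate_on: string[];                 // e.g. "executor_failed"
  dynamic_fields: Record<string, string>;  // placeholder -> how to compute it
}

interface CachedPlan {
  executor_system_prompt: string;
  reuse_policy: ReusePolicy;
  runsSinceGenerated: number;              // incremented by the scheduler
}

// Decide whether the cached prompt is still usable, or whether the
// puppeteer must be invoked for a fresh analysis.
function shouldReuse(plan: CachedPlan, eventsSinceGenerated: string[]): boolean {
  if (plan.runsSinceGenerated >= plan.reuse_policy.valid_for_runs) return false;
  return !plan.reuse_policy.invalidate_on.some((e) => eventsSinceGenerated.includes(e));
}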

4. Self-Reflection Improves Prompt Quality

Before finalizing output, the puppeteer critiques its own work:

┌─────────────────────────────────────────────────────────────┐
│                    REFLECTION                               │
│              (Puppeteer's self-critique)                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  "Being conservative with reuse policy (3 runs) since       │
│   we just resolved a failure. Added tool_error:send_email   │
│   to invalidation conditions as extra safety. Prompt        │
│   explicitly mentions resolution to give executor           │
│   confidence. Concern: may be including too much context    │
│   but erring on the side of clarity after the failure."     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

The reflection documents tradeoffs, concerns, and reasoning—helping debug prompt quality over time.


What the Puppeteer Sees vs. What the Executor Sees

┌─────────────────────────────────┬─────────────────────────────────┐
│         PUPPETEER               │           EXECUTOR              │
├─────────────────────────────────┼─────────────────────────────────┤
│                                 │                                 │
│  Complete run history (50+)     │  Nothing about history          │
│                                 │                                 │
│  All checkpoints (timeline)     │  Nothing about checkpoints      │
│                                 │                                 │
│  All knowledge (learned facts)  │  Facts baked into prompt        │
│                                 │                                 │
│  User's original instructions   │  Distilled current task         │
│                                 │                                 │
│  Failed approaches              │  "Don't do X" (without why)     │
│                                 │                                 │
│  Tool usage patterns            │  Available tools list           │
│                                 │                                 │
│  Previous puppeteer outputs     │  Nothing about puppeteer        │
│                                 │                                 │
│  Strategic timeline             │  Just this prompt, nothing else │
│                                 │                                 │
└─────────────────────────────────┴─────────────────────────────────┘

A Concrete Example

User instruction: "Monitor Hacker News for AI news and email me daily digests"

After 47 runs, the puppeteer is invoked:

What Puppeteer Analyzes:

  • Checkpoints: "Pattern established at run #10"
  • Last 5 runs: All successful, 2-4 articles each
  • No failures, stable operation

What Puppeteer Outputs:

{
  "analysis_summary": "Stable monitoring. 5 successful runs. No changes needed.",

  "strategy": "Continue cyclical monitoring. Maintain current approach.",

  "reflection": "High reuse (8 runs) appropriate for stable monitoring. Dynamic fields handle time-sensitive data. No concerns—pattern proven reliable.",

  "executor_system_prompt": "You are an AI news monitoring agent.\n\nSearch for AI posts from the past {{hours}} hours.\n\nStories already sent (do not repeat):\n{{recent_articles}}\n\nIf 2+ new stories found, send email digest.\nIf <2, skip and wait for next run.",

  "reuse_policy": {
    "valid_for_runs": 8,
    "invalidate_on": ["executor_failed"],
    "dynamic_fields": {
      "{{hours}}": "Hours since last run",
      "{{recent_articles}}": "Titles from last 5 email sends"
    }
  },

  "checkpoints_to_create": [],
  "knowledge_to_create": []
}

What Executor Sees:

You are an AI news monitoring agent.

Search for AI posts from the past 6 hours.

Stories already sent (do not repeat):
- "GPT-5 Released with 1M Context Window"
- "Anthropic Raises $2B Series C"
- "Open Source LLM Beats GPT-4 on Benchmarks"

If 2+ new stories found, send email digest.
If <2, skip and wait for next run.

The executor has no idea this prompt was carefully constructed by analyzing 47 previous runs. It just executes.
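
Filling in the dynamic fields is plain string substitution over the cached template. A minimal sketch, assuming the {{...}} placeholders from the reuse policy above and example values drawn from the run history:

// Resolve {{placeholders}} in the cached prompt just before an executor run.
function renderPrompt(template: string, values: Record<string, string>): string {
  return Object.entries(values).reduce(
    (prompt, [placeholder, value]) => prompt.split(placeholder).join(value),
    template,
  );
}

const cachedTemplate =
  "Search for AI posts from the past {{hours}} hours.\n" +
  "Stories already sent (do not repeat):\n{{recent_articles}}";

const prompt = renderPrompt(cachedTemplate, {
  "{{hours}}": "6",  // computed from the timestamp of the last run
  "{{recent_articles}}": '- "GPT-5 Released with 1M Context Window"',  // from recent sends
});

console.log(prompt);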


Why This Works

Benefit                  How
-----------------------  ----------------------------------------------------------------
No duplicate work        Puppeteer analyzes history, tells executor what's already done
Learns from failures     Puppeteer sees failed approaches, routes around them
Retains knowledge        Reusable facts (API quirks, workarounds) persist in knowledge base
Efficient token usage    Executor gets minimal context; puppeteer runs infrequently
Clean separation         Executor is standard agent; no special memory integration
Strategic continuity     Checkpoints preserve high-level timeline with actionables
Self-improving           Reflections help identify and fix prompt quality issues
Graceful degradation     If puppeteer fails, last cached prompt still works

When to Use This Pattern

Good fit:

  • Long-running autonomous agents (cron-scheduled)
  • Agents that build on previous work (research, monitoring)
  • Multi-phase projects spanning many runs
  • Agents where duplicate work is costly or annoying

Not needed:

  • Single-shot tasks
  • Conversational agents (already have session context)
  • Agents where every run is independent

The Mental Model

Think of it like a film production:

Role               Agent
-----------------  ------------------------------------------------------------
Director           Puppeteer: sees the whole story, plans each scene
Actor              Executor: performs the current scene brilliantly
Script             System prompt: the actor's instructions for this scene
Dailies            Run history: the footage the director reviews
Continuity notes   Checkpoints: timeline markers ("day 3 of shoot")
Production bible   Knowledge: reusable facts ("actor allergic to peanuts")

The actor doesn't need to remember every previous scene. The director handles continuity. The audience (user) sees seamless performance.


What's Next

We're building this into Alvix AI, our open-source autonomous agent platform. The architecture is generic—any agent with any tools can use this memory pattern.

Key implementation pieces:

  • Puppeteer prompt engineering (the hard part)
  • Checkpoint storage and retrieval (timeline markers)
  • Knowledge base with semantic search (reusable facts)
  • Dynamic field resolution for cached prompts
  • Reuse policy evaluation logic
  • Reflection logging for prompt quality debugging
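
A rough sketch of how those pieces could fit together on each scheduled trigger. Everything here is a hypothetical stand-in (declared stubs, invented names), not the Alvix AI API:

// Illustrative stubs for storage and the two agents.
interface Plan {
  executor_system_prompt: string;
  valid_for_runs: number;
  runs_used: number;
}

declare function loadCachedPlan(): Plan | null;
declare function saveCachedPlan(plan: Plan): void;
declare function invokePuppeteer(fullContext: object): Promise<Plan>;
declare function gatherFullContext(): object;   // run history, checkpoints, knowledge, instructions
declare function fillDynamicFields(prompt: string): string;
declare function runExecutor(prompt: string): Promise<{ status: string; summary: string }>;
declare function recordRun(result: { status: string; summary: string }): void;

// One cron tick: reuse the cached plan while the policy allows it,
// otherwise ask the puppeteer for a fresh one, then run the executor.
async function onCronTick(): Promise<void> {
  let plan = loadCachedPlan();
  if (!plan || plan.runs_used >= plan.valid_for_runs) {
    plan = await invokePuppeteer(gatherFullContext());  // fresh analysis
    plan.runs_used = 0;
  }

  const prompt = fillDynamicFields(plan.executor_system_prompt);
  const result = await runExecutor(prompt);

  recordRun(result);       // becomes input to the next puppeteer analysis
  plan.runs_used += 1;
  saveCachedPlan(plan);
}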

The insight is simple: Don't make one agent do everything. Let a strategic agent manage memory—both episodic (checkpoints) and semantic (knowledge). Let an execution agent do work. Connect them with carefully crafted, self-critiqued prompts.

The executor is a Boltzmann brain—no past, no future, just now. The puppeteer sees everything. The user gets an agent that actually remembers—and learns.

