Created
March 10, 2026 03:33
How observational memory and Tapes work together in the Pokemon agent
# Observational Memory + Tapes in the Pokemon Agent

## The Problem

Long agent runs hit context compaction. When Claude's context window fills up, older messages get summarized and the cache prefix breaks. The agent loses continuity: it forgets what it tried, what failed, and what worked. Each new session starts from scratch.

## How Tapes Solves Storage

[Tapes](https://tapes.dev) proxies all LLM API calls and records every conversation turn in a content-addressable SQLite database at `.tapes/tapes.sqlite`. Each node in the database has:

- A content hash (its primary key)
- A `parent_hash` linking it to the previous turn
- The full message content (role, text, tool calls, tool results)
- Token counts (input, output, cache creation, cache read)
- Timestamps, model name, agent name

Sessions form chains. A root node (`parent_hash IS NULL`) starts a session. Each subsequent turn points back to its parent. To reconstruct a session, you walk the chain with a recursive CTE:

```sql
-- Walk a session from its root node to every descendant turn.
WITH RECURSIVE chain(h) AS (
    SELECT ?                      -- bind the root hash here
    UNION ALL
    SELECT n.hash FROM nodes n
    JOIN chain ON n.parent_hash = chain.h
)
SELECT n.* FROM chain JOIN nodes n ON n.hash = chain.h
ORDER BY n.created_at;
```

This gives you the full conversation in order: every user message, every assistant response, every tool call and result.
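
To make the chain walk concrete, here is a minimal sketch that runs the same recursive CTE from Python against an in-memory database. Only `nodes`, `hash`, `parent_hash`, and `created_at` come from the query above; the `role` and `content` columns, the sample rows, and the `read_chain` helper are assumptions for illustration, not the real Tapes schema or the actual `tape_reader.py` code.

```python
import sqlite3

# Hypothetical minimal schema: the real Tapes table has more columns.
SCHEMA = """
CREATE TABLE nodes (
    hash TEXT PRIMARY KEY,
    parent_hash TEXT,
    role TEXT,
    content TEXT,
    created_at TEXT
)
"""

CHAIN_QUERY = """
WITH RECURSIVE chain(h) AS (
    SELECT ?
    UNION ALL
    SELECT n.hash FROM nodes n
    JOIN chain ON n.parent_hash = chain.h
)
SELECT n.hash, n.role, n.content
FROM chain JOIN nodes n ON n.hash = chain.h
ORDER BY n.created_at
"""


def read_chain(conn, root_hash):
    """Return (hash, role, content) rows for one session, oldest first."""
    return conn.execute(CHAIN_QUERY, (root_hash,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?, ?, ?)",
        [
            ("aaa", None, "user", "start the game", "2026-03-09T10:00:00"),
            ("bbb", "aaa", "assistant", "pressing START", "2026-03-09T10:00:05"),
            ("ccc", "bbb", "user", "tool result: ok", "2026-03-09T10:00:09"),
            # A second root: it must not show up in the first session.
            ("zzz", None, "user", "unrelated session", "2026-03-09T11:00:00"),
        ],
    )
    return read_chain(conn, "aaa")
```

Calling `demo()` returns the three turns that descend from `aaa` and leaves out the unrelated root. If a node ever had two children, this query would return both branches, so a reader that needs a single linear path would have to pick one.
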

## How Observational Memory Distills It

The observer sits between Tapes (raw data) and the agent's next session (needs context). It runs after a session ends and extracts what matters.

### The pipeline

```
.tapes/tapes.sqlite
    │
    ▼
TapeReader.read_session(root_hash)
    │  parses nodes into TapeEntry objects
    │  extracts tool uses, tool results, token counts
    ▼
Observer.observe_session(session)
    │  runs 4 heuristic extractors:
    │    1. Session goal (first user message, skipping system noise)
    │    2. Tool errors and exception tracebacks
    │    3. Files created (Write tool invocations)
    │    4. Token usage summary
    │
    │  classifies each observation by priority:
    │    [important]      bug, error, crash, security, etc.
    │    [possible]       test, refactor, update, etc.
    │    [informational]  everything else
    ▼
Observer.write_observations()
    │  appends to .tapes/memory/observations.md
    │  grouped by date, deduplicates headers
    ▼
.tapes/memory/observations.md       ← next session reads this
.tapes/memory/observer_state.json   ← tracks processed sessions
```
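
The priority step in the diagram above can be sketched as plain keyword matching. This is an illustration, not the actual `observer.py` code: the keyword tuples contain only the words named in the diagram (the real lists are longer, per the "etc."), and the `classify` function name is hypothetical.

```python
# Only the keywords named in the diagram; the real lists are longer.
IMPORTANT_KEYWORDS = ("bug", "error", "crash", "security")
POSSIBLE_KEYWORDS = ("test", "refactor", "update")


def classify(text):
    """Return 'important', 'possible', or 'informational' for an observation."""
    lowered = text.lower()
    # Check the higher priority first so it wins when both sets match.
    if any(keyword in lowered for keyword in IMPORTANT_KEYWORDS):
        return "important"
    if any(keyword in lowered for keyword in POSSIBLE_KEYWORDS):
        return "possible"
    return "informational"
```

Plain substring matching is cheap but coarse: a word like "latest" would match `test`. A stricter version could match on word boundaries at the cost of missing compound names such as `ModuleNotFoundError`.
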

### What the output looks like

```markdown
## 2026-03-09
- [important] Session goal: fix the crash in battle strategy (session: a3f8c012)
- [important] Tool error: ModuleNotFoundError: No module named 'numpy' (session: a3f8c012)
- [possible] File created: scripts/pathfinding.py (session: a3f8c012)
- [informational] Token usage: 45000 input, 12000 output, 38000 cache read (session: a3f8c012)
```
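
One way to produce a file in this shape is to append bullets under a `## <date>` header and write the header only when it is not already the most recent one. The sketch below assumes sessions are written in chronological order; `append_observations` is a hypothetical helper, not the real `Observer.write_observations()`.

```python
from pathlib import Path


def append_observations(path, date, lines):
    """Append bullet lines under '## <date>', writing the header at most once."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    header = f"## {date}"
    headers = [line for line in existing.splitlines() if line.startswith("## ")]

    out = ""
    if existing and not existing.endswith("\n"):
        out += "\n"  # make sure new text starts on its own line
    if not headers or headers[-1] != header:
        if existing:
            out += "\n"  # blank line between date groups
        out += header + "\n"
    out += "".join(f"- {line}\n" for line in lines)

    with path.open("a", encoding="utf-8") as fh:
        fh.write(out)
```

Comparing against the last header, instead of any header in the file, keeps new bullets from landing under the wrong date if an older date is ever written again.
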

### What it skips

The observer filters noise that would pollute memory:

- `<system-reminder>` tags that Tapes stores as user-role nodes
- Casual mentions of "error" in assistant text (the matcher only fires on exception names such as `ValueError:` or `ModuleNotFoundError:` at the start of a line)
- Sessions that have already been processed (watermark in `observer_state.json`)
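
The first two filters can be approximated with two regular expressions. The patterns below are assumptions written for illustration: the source only says that exception names are matched at line start and that `<system-reminder>` content is skipped, so the exact expressions, the closing `</system-reminder>` tag, and both function names are hypothetical.

```python
import re

# An exception name ending in Error/Exception, anchored to the start of a line.
EXCEPTION_LINE = re.compile(
    r"^([A-Za-z_][\w.]*(?:Error|Exception)): (.+)$", re.MULTILINE
)
# Assumes reminders are wrapped in a matching closing tag.
SYSTEM_REMINDER = re.compile(r"<system-reminder>.*?</system-reminder>", re.DOTALL)


def strip_system_noise(text):
    """Remove <system-reminder> blocks and surrounding whitespace."""
    return SYSTEM_REMINDER.sub("", text).strip()


def find_exceptions(text):
    """Return 'Name: message' strings for exception lines in the text."""
    return [f"{m.group(1)}: {m.group(2)}" for m in EXCEPTION_LINE.finditer(text)]
```

Because the pattern is case-sensitive and anchored with `re.MULTILINE`, a sentence like "I hit an error: the menu was closed" is ignored, while the final line of a traceback is captured.
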

## The Data Flow Across Sessions

```
Session 1: Agent plays Pokemon Red
    │
    ├── Tapes records every LLM turn to .tapes/tapes.sqlite
    │
    ▼
After session: python3 scripts/observe_cli.py
    │
    ├── Reads new sessions from SQLite
    ├── Extracts observations
    ├── Writes to .tapes/memory/observations.md
    ├── Updates watermark so these sessions aren't reprocessed
    │
    ▼
Session 2: Agent loads observations.md at startup
    │
    ├── Knows what Session 1 tried
    ├── Knows what errors occurred
    ├── Knows what files were created
    └── Picks up where Session 1 left off
```

## Three Scripts, One Job

| Script | Role |
|--------|------|
| `tape_reader.py` | Pure stdlib SQLite reader. Parses nodes into `TapeEntry` / `TapeSession` dataclasses. No third-party dependencies. |
| `observer.py` | Heuristic pattern matcher. No LLM calls. Extracts observations via keyword matching and structural patterns. |
| `observe_cli.py` | CLI wrapper. Auto-detects `.tapes/tapes.sqlite`, supports `--dry-run`, `--reset`, `--session <hash>`. |

## Design Decisions

**No LLM in the observer.** The extraction is pure heuristics: regex for tracebacks, keyword matching for priority, structural checks for tool errors. This keeps the observer fast, free, and deterministic. An LLM-based summarizer could be layered on top later.

**Pure stdlib for the reader.** `tape_reader.py` uses only `sqlite3`, `json`, and `dataclasses`. No pandas, no ORM, no external dependencies. It runs anywhere Python runs.

**Watermark-based idempotency.** `observer_state.json` tracks which session hashes have been processed. Running the observer twice produces no duplicates. `--reset` clears the watermark to reprocess everything.

**Observations live alongside the data.** The output goes to `.tapes/memory/`, right next to `tapes.sqlite`. Both are gitignored. The observations are a derived view of the raw data, not a separate source of truth.
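
The watermark-based idempotency described above can be sketched as a set of processed root hashes persisted to JSON. The file layout (a single `"processed"` list) and all function names here are assumptions for illustration; the real `observer_state.json` format may differ.

```python
import json
from pathlib import Path


def load_processed(state_path):
    """Return the set of session hashes already handled (empty if no state file)."""
    path = Path(state_path)
    if not path.exists():
        return set()
    data = json.loads(path.read_text(encoding="utf-8"))
    return set(data.get("processed", []))


def save_processed(state_path, processed):
    """Persist the processed hashes, creating the directory if needed."""
    path = Path(state_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"processed": sorted(processed)}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def process_new_sessions(state_path, root_hashes, handle, reset=False):
    """Call handle() once per unseen hash and return the hashes handled."""
    # reset=True mirrors the effect of --reset: forget the watermark first.
    processed = set() if reset else load_processed(state_path)
    new = [h for h in root_hashes if h not in processed]
    for h in new:
        handle(h)
        processed.add(h)
    save_processed(state_path, processed)
    return new
```

Running `process_new_sessions` twice with the same hashes calls `handle` only on the first run, which is the property that keeps `observations.md` free of duplicates.
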

## Inspired By

- [Mastra's observational memory](https://mastra.ai/blog/observational-memory): the concept of distilling agent sessions into prioritized observations
- [Factorio Learning Environment](https://github.com/JackHopkins/factorio-learning-environment): incremental report distillation and error catalog patterns