A lightweight handoff protocol for multi-agent workflows.
Authors: Hamza Mounir & Sanawar Syed Azor Ali | pixldev.be
In a multi-stage workflow where different AI agents handle each step (planning, implementation, testing, review), there's a fundamental tension:
- Agents need context from previous stages to do their job well
- But passing full artifacts (code, plans, docs) between every stage blows up the token budget
- And even when you do pass artifacts, agents still miss the why — the decisions, constraints, and intent behind the work
Without a structured handoff mechanism, later stages either repeat work, contradict earlier decisions, or miss critical constraints.
The baton is a small, structured JSON object that travels with the workflow session. After each stage, the agent patches it with what happened. Before each stage, the next agent reads it to understand where things stand.
Think of it like a relay race baton — except this one carries notes.
```
┌──────────┐   baton    ┌───────────┐   baton    ┌──────────┐
│ Stage 1  │ ─────────► │  Stage 2  │ ─────────► │ Stage 3  │
│ (planner)│  + patch   │(implement)│  + patch   │ (review) │
└──────────┘            └───────────┘            └──────────┘
```
The baton has 8 fields, all optional except goal:
| Field | Purpose | Example |
|---|---|---|
| `goal` | One-sentence objective | "Add OAuth2 login flow" |
| `current_state` | What's true right now (3-8 bullets) | `["Auth routes created", "Tests passing"]` |
| `decision_log` | Key decisions made (append-only) | `["Chose JWT over sessions for statelessness"]` |
| `open_questions` | Unresolved issues for later stages | `["Should refresh tokens expire?"]` |
| `constraints` | Hard rules discovered during execution | `["Must support OAuth2", "No breaking changes"]` |
| `artifacts` | Lightweight refs to produced outputs | `[{id: "art-001", type: "plan", hash: "a3f8..."}]` |
| `work_scope` | Files/modules touched or planned | `["src/auth/login.ts", "tests/auth.test.ts"]` |
| `acceptance` | Tests/checks that must pass | `["All auth tests green", "No regressions"]` |
Typical size: 500–2000 tokens. Small enough to always fit in context.
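As a rough sketch, the eight fields could be modeled as a dataclass. The field names come from the table above; the concrete types and defaults are assumptions, not the authors' implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Baton:
    """Structured handoff context passed between workflow stages."""
    goal: str  # required: one-sentence objective
    current_state: list[str] = field(default_factory=list)   # 3-8 bullets of current truth
    decision_log: list[str] = field(default_factory=list)    # append-only decisions
    open_questions: list[str] = field(default_factory=list)  # unresolved issues
    constraints: list[str] = field(default_factory=list)     # hard rules discovered so far
    artifacts: list[dict] = field(default_factory=list)      # refs like {id, type, hash}
    work_scope: list[str] = field(default_factory=list)      # files/modules touched
    acceptance: list[str] = field(default_factory=list)      # checks that must pass
```

Making every field except `goal` default to an empty list keeps seeding cheap: a new workflow can start with nothing but the objective.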
When a workflow starts, the baton is seeded from the feature description:
```python
baton = Baton(
    goal="Initialize deenup",
    current_state=["Workflow starting"],
)
```

Before each stage, the baton is rendered as markdown and injected into the agent's prompt:
```markdown
## Baton (Handoff Context)

**Goal:** Initialize deenup

**Current State:**
- Project directory is uninitialized
- No source code or configuration present
- Analysis complete: awaiting tech stack selection

**Recent Decisions:**
- Confirmed project is empty — not a codebase issue
- Restricted analysis to project boundary

**Constraints:**
- Solo developer with zero budget
- Supabase free tier limitations apply
```

After executing, the agent returns a `baton_patch` in its structured output:
```json
{
  "baton_patch": {
    "current_state": [
      "Project documented with CLAUDE.md and README.md",
      "Feature roadmap created with 13 prioritized items",
      "Tech stack: React Native + Node.js + Supabase"
    ],
    "decision_log": [
      "Chose React Native over native development",
      "Selected Supabase for zero-cost backend"
    ],
    "work_scope": ["CLAUDE.md", "README.md"]
  }
}
```

Patches use JSON merge-patch semantics: only fields present in the patch are updated, and list fields are replaced entirely rather than appended (the append-only `decision_log` is the one exception). The full baton snapshot after each patch is saved to `baton_history` for auditing.
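The patch application could be sketched as follows. This is a minimal dict-based version, not the actual implementation; it treats `decision_log` as append-only per the protocol's rule, while all other patched fields replace the existing value wholesale:

```python
def apply_patch(baton: dict, patch: dict) -> dict:
    """Apply a baton_patch with merge-patch semantics.

    Fields absent from the patch are left untouched. Patched list fields
    replace the old value entirely, except decision_log, which is appended
    to so earlier decisions can never be erased.
    """
    updated = dict(baton)  # shallow copy; the input baton stays intact
    for key, value in patch.items():
        if key == "decision_log":
            updated["decision_log"] = updated.get("decision_log", []) + value
        else:
            updated[key] = value
    return updated
```

Returning a fresh dict (rather than mutating in place) makes it trivial to snapshot each version into `baton_history`.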
The updated baton feeds into the next stage. The cycle continues until the workflow completes.
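The full cycle (render, execute, patch, repeat) could be driven by a loop like this sketch, where each stage is a hypothetical `(name, agent_fn)` pair standing in for a real agent call:

```python
def run_workflow(stages, baton):
    """Drive the baton through each stage in turn.

    `stages` is a list of (name, agent_fn) pairs; agent_fn receives the
    current baton (rendered into its prompt in a real system) and returns
    a baton_patch dict. A simplified merge is used here for brevity.
    """
    history = []
    for name, agent_fn in stages:
        patch = agent_fn(baton)            # agent executes with baton context
        baton = {**baton, **patch}         # merge-patch: patched fields win
        history.append({"stage": name, "baton": dict(baton)})  # audit snapshot
    return baton, history
```

A toy run with two stub stages shows the handoff:

```python
stages = [
    ("plan", lambda b: {"current_state": ["Plan drafted"]}),
    ("implement", lambda b: {"current_state": ["Code written"]}),
]
final, history = run_workflow(stages, {"goal": "demo", "current_state": []})
```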
Here's the baton history from a project initialization workflow:
```
Stage 1: detect-tech-stack
  → current_state: "Project directory is uninitialized, no source files found"
  → decision_log:  "Confirmed project is empty"
  → constraints:   "Solo developer, zero budget"

Stage 2: generate-docs
  → current_state: "Project fully documented, 13 features in backlog"
  → decision_log:  "Chose React Native, selected Supabase"
  → work_scope:    "CLAUDE.md, README.md"

Stage 3: build-knowledge
  → current_state: "Knowledge index built with 53 chunks"
  → decision_log:  "Indexed documentation for semantic search"

Stage 4: verify-ready
  → current_state: "Project fully initialized and verified"
  → decision_log:  "All verification checks passed"
  → constraints:   "Source validation non-negotiable for Islamic content"
```
Each stage gets the full picture from its predecessors without needing to re-read every artifact.
| | Baton | Artifacts |
|---|---|---|
| Size | ~500-2000 tokens | 1k-100k+ tokens |
| Content | Decisions, state, constraints | Actual code, plans, docs |
| Included | Always (every stage) | Selectively (budget-dependent) |
| Mutability | Patched after each stage | Immutable snapshots |
| Purpose | Why and what matters | What was produced |
The baton tells the next agent what happened and why it matters. Artifacts are the actual work product. The baton's work_scope field even guides which artifacts get included in context — it's metadata that drives artifact selection.
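That selection step could be sketched as a simple budget-aware filter. The index shape (artifact id mapped to paths and a token estimate) and the greedy budgeting strategy are assumptions for illustration:

```python
def select_artifacts(artifact_index: dict, work_scope: list, budget_tokens: int) -> list:
    """Pick artifacts relevant to the baton's work_scope within a token budget.

    `artifact_index` maps artifact id -> {"paths": [...], "tokens": int}.
    An artifact qualifies if any of its paths appears in work_scope; selection
    stops adding artifacts that would exceed the budget.
    """
    selected, spent = [], 0
    for art_id, meta in artifact_index.items():
        relevant = any(path in work_scope for path in meta["paths"])
        if relevant and spent + meta["tokens"] <= budget_tokens:
            selected.append(art_id)
            spent += meta["tokens"]
    return selected
```

Because the baton itself is always included, only the artifacts compete for the remaining budget.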
**Append-only decision log.** Decisions are never overwritten. Later stages can add new decisions but can't erase earlier ones. This prevents context drift.
**Replace semantics for lists.** When a stage patches current_state, it replaces the entire list rather than appending. This keeps the state concise — reflecting current truth, not accumulated history. The full history lives in baton_history.
**Budget-friendly.** At ~1000 tokens, the baton always fits in context. Artifacts compete for token budget; the baton doesn't.
**Audit trail.** Every patch is recorded with timestamp and stage ID in baton_history. You can replay exactly how understanding evolved across the workflow.
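A recorder for that trail might look like this sketch; the entry shape (stage id, UTC timestamp, patch, resulting snapshot) is an assumption consistent with the fields the text describes:

```python
from datetime import datetime, timezone


def record_patch(history: list, stage_id: str, patch: dict, baton: dict) -> None:
    """Append one audit entry: which stage patched what, and the result."""
    history.append({
        "stage": stage_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "patch": patch,                 # what the stage changed
        "baton_snapshot": dict(baton),  # full baton after the patch
    })
```

Storing the post-patch snapshot alongside the patch means each entry can be inspected on its own, without replaying earlier entries.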
**Lightweight artifact refs.** The baton carries artifact references (ID + hash), not content. This lets agents know what exists without bloating the handoff.
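Building such a reference could be as simple as hashing the artifact's content. The truncated SHA-256 digest and the exact ref shape are assumptions modeled on the `{id, type, hash}` example shown earlier:

```python
import hashlib


def artifact_ref(artifact_id: str, artifact_type: str, content: str) -> dict:
    """Build a lightweight artifact reference: identity plus a content hash.

    Agents can compare hashes to detect stale artifacts without the baton
    ever carrying the artifact body itself.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:8]
    return {"id": artifact_id, "type": artifact_type, "hash": digest}
```

A ~60-byte ref replaces what might be thousands of tokens of artifact content in the handoff.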