AGI SDK Binary Architecture - Claude Code Pattern
This document describes the AGI driver binary and SDK architecture. The driver is a self-contained agent that captures screenshots, reasons with Claude, and executes actions autonomously. The SDKs (Python, Node.js, C#) are thin event wrappers: they spawn the binary, send it commands, and surface its events.
┌─────────────────────────────────────────────────────────────────────────┐
│ agi-api-driver                                                          │
│ (locally: ~/Code/agi-api-driver)                                        │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │  SELF-CONTAINED AGENT DRIVER BINARY                               │  │
│  │                                                                   │  │
│  │  ┌─────────────┐   ┌────────────┐   ┌────────────────────────┐    │  │
│  │  │ Executor    │   │ Agent/LLM  │   │ Environment            │    │  │
│  │  │ • State     │   │ • Claude   │   │ • Screenshot capture   │    │  │
│  │  │   machine   │   │   API      │   │ • Action execution     │    │  │
│  │  │ • Event     │   │ • Tools    │   │ • Screen size detect   │    │  │
│  │  │   emission  │   │ • Prompts  │   │ • DPI/scale factor     │    │  │
│  │  └─────────────┘   └────────────┘   └────────────────────────┘    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    ↓ CI compiles & publishes            │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │  GitHub Releases: agi-driver-v1.0.0-{platform}                    │  │
│  │  darwin-arm64 | darwin-x64 | linux-x64 | windows-x64              │  │
│  └───────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────┬───────────────────────────────────┘
                                      │
           ┌──────────────────────────┼──────────────────────────┐
           ↓                          ↓                          ↓
 ┌──────────────────┐       ┌──────────────────┐       ┌──────────────────┐
 │ agi-python       │       │ agi-node         │       │ agi-csharp       │
 │                  │       │                  │       │                  │
 │ THIN WRAPPER     │       │ THIN WRAPPER     │       │ THIN WRAPPER     │
 │ • Spawn binary   │       │ • Spawn binary   │       │ • Spawn binary   │
 │ • Event hooks    │       │ • Event hooks    │       │ • Event hooks    │
 │ • Send commands  │       │ • Send commands  │       │ • Send commands  │
 │   (no platform   │       │   (no platform   │       │   (no platform   │
 │   code needed)   │       │   code needed)   │       │   code needed)   │
 └──────────────────┘       └──────────────────┘       └──────────────────┘
Local mode ("mode":"local"): the driver runs autonomously on the local machine.
SDK sends: {"command":"start","goal":"Open calculator","mode":"local"}
Driver:    1. Captures a screenshot via Pillow/scrot
           2. Calls the Claude API with the screenshot + goal
           3. Emits thinking/action events (informational)
           4. Executes actions locally (JXA/PowerShell/xdotool)
           5. Waits 0.5s for the screen to settle
           6. Captures the next screenshot
           7. Repeats from step 2 until finished/error
SDK:       Just listens to events. No platform code needed.
Legacy mode (mode unset or ""): the SDK manages screenshots and action execution (the pre-existing behavior).
SDK sends: {"command":"start","goal":"...","screenshot":"base64...","screen_width":1920,"screen_height":1080}
Driver:    1. Receives the screenshot from the SDK
           2. Calls the Claude API
           3. Emits action events
SDK:       4. Executes the actions
           5. Captures a screenshot
           6. Sends a screenshot command
           7. Repeat
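The SDK's half of the legacy loop (steps 4 to 6) can be sketched as one function. This is a hedged illustration, not removed SDK code: legacy_round_trip is a hypothetical name, the execute and capture callbacks stand in for platform code, and it assumes the SDK has already collected the current step's action payloads (how it decides a step's actions are complete is not specified here).

```python
import base64
from typing import Any, Callable, Iterable

Action = dict[str, Any]


def legacy_round_trip(
    actions: Iterable[Action],
    execute: Callable[[Action], None],
    capture: Callable[[], tuple[bytes, int, int]],
) -> dict[str, Any]:
    """One SDK-side iteration of legacy mode; returns the "screenshot" command to send."""
    for action in actions:
        execute(action)                          # 4. SDK executes the actions
    image_bytes, width, height = capture()       # 5. SDK captures a screenshot
    return {                                     # 6. SDK replies with a screenshot command
        "command": "screenshot",
        "data": base64.b64encode(image_bytes).decode("ascii"),
        "screen_width": width,
        "screen_height": height,
    }
```

Every SDK had to carry both callbacks for each platform, which is the duplication local mode removes.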
| Aspect | Before | After |
|---|---|---|
| Screenshot capture | SDK responsibility | Driver captures locally |
| Action execution | SDK responsibility | Driver executes locally |
| Screen detection | SDK responsibility | Driver detects automatically |
| SDK executor code | Required (600+ lines per SDK) | Deprecated (kept for backward compat) |
| SDK role | I/O adapter + executor | Pure event wrapper |
| Platform code | Duplicated in 3 SDKs | Single implementation in driver |
The driver includes a platform-aware environment module:
agi_driver/environment/
├── __init__.py # Factory: create_environment("local")
├── base.py # Abstract: BaseEnvironment
└── local.py # LocalEnvironment - controls local machine
BaseEnvironment interface:
from abc import ABC, abstractmethod
class BaseEnvironment(ABC):
    @abstractmethod
    async def initialize(self) -> None: ...
    @abstractmethod
    async def capture_screenshot(self) -> tuple[str, int, int]: ...  # (base64, width, height)
    @abstractmethod
    async def execute_action(self, action: dict) -> bool: ...
    @abstractmethod
    async def get_screen_size(self) -> tuple[int, int]: ...
    @abstractmethod
    async def cleanup(self) -> None: ...
LocalEnvironment handles:
- Screenshot: PIL.ImageGrab.grab() (macOS/Windows), scrot (Linux)
- Clicks: JXA/CGEvent (macOS), PowerShell/user32.dll (Windows), xdotool (Linux)
- Typing: JXA with JSON escaping (macOS), Base64+SendKeys (Windows), xdotool (Linux)
- Keys: AppleScript key codes (macOS), SendKeys format (Windows), xdotool (Linux)
- Scroll/Drag: platform-specific implementations
- DPI/Scale: NSScreen (macOS), Registry (Windows), GDK_SCALE (Linux)
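For the Linux branch, each action can map to a single xdotool invocation. A hedged sketch: xdotool_argv is a hypothetical helper, the click shape {"type":"click","x":..,"y":..} follows the action event example in this document, and the other field names (text, keys, direction, amount, end_x, end_y) are assumptions.

```python
from typing import Any


def xdotool_argv(action: dict[str, Any]) -> list[str]:
    """Translate one action dict into an xdotool argument list (Linux branch only).

    Field names other than "type", "x" and "y" are assumptions for illustration.
    """
    kind = action["type"]
    if kind == "click":
        # Chained xdotool commands: move the pointer, then left-click (button 1).
        return ["xdotool", "mousemove", str(action["x"]), str(action["y"]), "click", "1"]
    if kind == "type":
        return ["xdotool", "type", action["text"]]
    if kind == "key":
        # xdotool key names combine with "+", e.g. "ctrl+c".
        return ["xdotool", "key", action["keys"]]
    if kind == "scroll":
        # X11 maps the wheel to buttons 4 (up) and 5 (down).
        button = "4" if action.get("direction", "down") == "up" else "5"
        return ["xdotool", "click", "--repeat", str(action.get("amount", 3)), button]
    if kind == "drag":
        return [
            "xdotool",
            "mousemove", str(action["x"]), str(action["y"]), "mousedown", "1",
            "mousemove", str(action["end_x"]), str(action["end_y"]), "mouseup", "1",
        ]
    raise ValueError(f"unsupported action type: {kind!r}")
```

Passing the resulting list to subprocess.run(argv, check=True) avoids shell quoting, which matters for typed text.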
Example events emitted by the driver (JSON Lines, one object per line):
{"event":"ready","version":"0.1.0","protocol":"jsonl","step":0}
{"event":"state_change","state":"running","step":0}
{"event":"screenshot_captured","width":3024,"height":1964,"step":0}
{"event":"thinking","text":"I see the desktop with a dock at the bottom...","step":1}
{"event":"action","action":{"type":"click","x":150,"y":200},"step":1}
{"event":"screenshot_captured","width":3024,"height":1964,"step":1}
{"event":"confirm","action":{},"reason":"Delete this file?","step":2}
{"event":"ask_question","question":"What email should I use?","question_id":"q1","step":3}
{"event":"finished","reason":"completed","summary":"Opened calculator and computed 2+2=4","success":true,"step":10}
{"event":"error","message":"Model inference failed","code":"step_error","recoverable":true,"step":5}
New event: screenshot_captured is emitted in local mode each time the driver captures a screenshot. It is a lightweight notification (no image data) that tells SDKs a step boundary occurred.
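Because screenshot_captured carries no image data, an SDK can treat it purely as a step counter. A minimal sketch of folding the event stream into a summary; RunSummary and summarize are hypothetical names, and prompt tracking is simplified to "latest confirm or ask_question not yet followed by finished".

```python
import json
from dataclasses import dataclass, field
from typing import Any, Iterable, Optional


@dataclass
class RunSummary:
    screenshots: int = 0                               # screenshot_captured count (step boundaries)
    actions: list[str] = field(default_factory=list)   # action types, informational only
    pending: Optional[dict[str, Any]] = None           # latest confirm/ask_question prompt
    final: Optional[dict[str, Any]] = None             # the finished event, once received


def summarize(lines: Iterable[str]) -> RunSummary:
    """Fold a JSON Lines event stream into a small summary an SDK could expose."""
    summary = RunSummary()
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        kind = event["event"]
        if kind == "screenshot_captured":
            summary.screenshots += 1
        elif kind == "action":
            summary.actions.append(event["action"]["type"])
        elif kind in ("confirm", "ask_question"):
            summary.pending = event
        elif kind == "finished":
            summary.final = event
            summary.pending = None
    return summary
```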
Example commands sent by the SDK on the driver's stdin (JSON Lines):
{"command":"start","session_id":"sess_abc","goal":"Open calculator","mode":"local"}
{"command":"start","session_id":"sess_def","goal":"Click login","screenshot":"base64...","screen_width":1920,"screen_height":1080}
{"command":"screenshot","data":"base64...","screen_width":1920,"screen_height":1080}
{"command":"pause"}
{"command":"resume"}
{"command":"stop","reason":"User cancelled"}
{"command":"confirm","approved":true,"message":""}
{"command":"answer","text":"user@example.com","question_id":"q1"}
StartCommand changes:
- mode field added: "local" for autonomous mode, "" for legacy mode
- screenshot, screen_width, and screen_height are ignored in local mode
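A small builder can enforce the StartCommand rules above on the SDK side. This is a hypothetical helper, not part of the documented SDKs; raising an error when a legacy start lacks a screenshot is an assumption, while the compact output is chosen to match the example command lines exactly.

```python
import json
from typing import Any, Optional


def start_command(
    goal: str,
    session_id: Optional[str] = None,
    mode: str = "local",
    screenshot_b64: Optional[str] = None,
    screen_width: Optional[int] = None,
    screen_height: Optional[int] = None,
) -> str:
    """Serialize one "start" command as a compact JSON line (hypothetical helper)."""
    cmd: dict[str, Any] = {"command": "start"}
    if session_id is not None:
        cmd["session_id"] = session_id
    cmd["goal"] = goal
    if mode == "local":
        # Local mode: the driver captures screens itself, so no screenshot fields are sent.
        cmd["mode"] = "local"
    elif mode == "":
        # Legacy mode (assumption: the SDK must supply the first screenshot and its size).
        if screenshot_b64 is None or screen_width is None or screen_height is None:
            raise ValueError("legacy mode needs screenshot_b64, screen_width and screen_height")
        cmd["screenshot"] = screenshot_b64
        cmd["screen_width"] = screen_width
        cmd["screen_height"] = screen_height
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return json.dumps(cmd, separators=(",", ":"))
```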
  ┌──────┐                ┌─────────┐
  │ IDLE │ ─── start ───> │ RUNNING │<──────────────────────────────┐
  └──────┘                └────┬────┘                               │
                               │                                    │
       ┌───────────────────────┼───────────────────────┐            │
       ▼                       ▼                       ▼            │
┌──────────────┐     ┌───────────────────┐     ┌────────────────┐   │
│    PAUSED    │     │  WAITING_CONFIRM  │     │ WAITING_ANSWER │   │
│   resume()   │     │   confirm(bool)   │     │  answer(str)   │   │
└──────┬───────┘     └─────────┬─────────┘     └───────┬────────┘   │
       └───────────────────────┴───────────────────────┴────────────┘
ANY STATE ──── stop() ────> STOPPED
ANY STATE ──── error ─────> ERROR
RUNNING ────── finish ───> FINISHED
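The state diagram can be transcribed into a transition table. A hedged sketch: the trigger names (confirm_requested, question_asked) and the lowercase enum values other than "running" (which appears in the state_change event) are assumptions, and stop/error are honored from every state exactly as the diagram says.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RUNNING = "running"
    PAUSED = "paused"
    WAITING_CONFIRM = "waiting_confirm"
    WAITING_ANSWER = "waiting_answer"
    STOPPED = "stopped"
    ERROR = "error"
    FINISHED = "finished"


# (current state, trigger) -> next state, transcribed from the diagram.
TRANSITIONS = {
    (State.IDLE, "start"): State.RUNNING,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.RUNNING, "confirm_requested"): State.WAITING_CONFIRM,
    (State.RUNNING, "question_asked"): State.WAITING_ANSWER,
    (State.RUNNING, "finish"): State.FINISHED,
    (State.PAUSED, "resume"): State.RUNNING,
    (State.WAITING_CONFIRM, "confirm"): State.RUNNING,
    (State.WAITING_ANSWER, "answer"): State.RUNNING,
}


def next_state(state: State, trigger: str) -> State:
    # The diagram allows stop() and error from ANY state, so check those first.
    if trigger == "stop":
        return State.STOPPED
    if trigger == "error":
        return State.ERROR
    try:
        return TRANSITIONS[(state, trigger)]
    except KeyError:
        raise ValueError(f"invalid transition: {state.name} --{trigger}-->") from None
```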
agi-api-driver/
├── src/agi_driver/
│ ├── __init__.py # Package exports, version
│ ├── __main__.py # CLI entry point
│ ├── executor.py # Main execution loop (local + legacy modes)
│ ├── state_machine.py # State enum and transitions
│ ├── agent/
│ │ ├── base.py # BaseDriverAgent, AgentAction, StepResult
│ │ ├── desktop_agent.py # DesktopAgent (Claude API integration)
│ │ ├── prompt.py # System prompts
│ │ └── tools.py # Desktop automation tool definitions
│ ├── environment/ # NEW: Self-contained environment
│ │ ├── __init__.py # Factory: create_environment()
│ │ ├── base.py # Abstract BaseEnvironment
│ │ └── local.py # LocalEnvironment (screenshot + actions)
│ ├── llm/
│ │ └── anthropic.py # Claude API client with retry
│ └── protocol/
│ ├── commands.py # 7 command types (start now has mode)
│ ├── events.py # 9 event types (+ screenshot_captured)
│ └── jsonl.py # JSON Lines I/O
├── .github/workflows/
│ └── build-agi-driver.yml # Cross-platform Nuitka build
└── pyproject.toml
Python:
import asyncio
from agi import AgentDriver, DriverOptions
async def main() -> None:
    driver = AgentDriver(DriverOptions(mode="local"))
    driver.on_thinking(lambda t: print(f"Thinking: {t}"))
    driver.on_action(lambda a: print(f"Action: {a.type}"))  # Informational only
    result = await driver.start(goal="Open calculator and compute 2+2")
    print(f"Done: {result.summary}")
asyncio.run(main())
Node.js (TypeScript):
import { AgentDriver } from '@agi/sdk';
const driver = new AgentDriver({ mode: 'local' });
driver.on('thinking', (text) => console.log('Thinking:', text));
driver.on('action', (action) => console.log('Action:', action.type));
const result = await driver.start('Open calculator and compute 2+2');
console.log('Done:', result.summary);
C#:
using Agi.Driver;
var driver = new AgentDriver(new DriverOptions { Mode = "local" });
driver.OnThinking += async (text) => Console.WriteLine($"Thinking: {text}");
driver.OnAction += async (action) => Console.WriteLine($"Action: {action.Type}");
var result = await driver.StartAsync(goal: "Open calculator and compute 2+2");
Console.WriteLine($"Done: {result.Summary}");
Build the driver binary with Nuitka (from the repository root):
cd src
python -m nuitka \
--standalone --onefile \
--output-filename=agi-driver \
--include-package=agi_driver \
--include-package=agi_driver.environment \
--include-package=anthropic \
--include-package=PIL \
--include-package=pydantic \
--lto=yes \
--python-flag=no_site \
-m agi_driver
CI build matrix (GitHub Actions runners):
| Runner OS | Target | Binary |
|---|---|---|
| macOS 14 | darwin-arm64 | agi-driver-darwin-arm64 |
| macOS 13 | darwin-x64 | agi-driver-darwin-x64 |
| Ubuntu 22.04 | linux-x64 | agi-driver-linux-x64 |
| Windows latest | windows-x64 | agi-driver-windows-x64.exe |
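An SDK that fetches the prebuilt driver has to map the host to one of the four published targets. A hedged sketch, assuming SDKs pick the asset by name; driver_binary_name is hypothetical, and the download URL layout is not covered. The platform.system() and platform.machine() values shown are the ones CPython reports on these platforms.

```python
import platform
from typing import Optional

# Targets published by CI; any other OS/arch pair has no prebuilt binary.
_OS = {"Darwin": "darwin", "Linux": "linux", "Windows": "windows"}
_ARCH = {"arm64": "arm64", "aarch64": "arm64", "x86_64": "x64", "AMD64": "x64"}
_PUBLISHED = {"darwin-arm64", "darwin-x64", "linux-x64", "windows-x64"}


def driver_binary_name(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Pick the release asset name for this machine, e.g. "agi-driver-linux-x64"."""
    system = system or platform.system()
    machine = machine or platform.machine()
    target = f"{_OS.get(system, '?')}-{_ARCH.get(machine, '?')}"
    if target not in _PUBLISHED:
        raise RuntimeError(f"no prebuilt agi-driver for {system}/{machine}")
    suffix = ".exe" if target.startswith("windows") else ""
    return f"agi-driver-{target}{suffix}"
```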
Bundled dependencies (included via --include-package):
- anthropic - Claude API client
- Pillow - Screenshot capture
- pydantic - Data validation
┌────────────────────────────────────────────────────────────────┐
│  AUTONOMOUS LOOP                                               │
│                                                                │
│  1. Initialize LocalEnvironment                                │
│     - Detect screen size (system_profiler / powershell / xdpy) │
│     - Cache DPI scale factor                                   │
│                                                                │
│  2. Capture initial screenshot (PIL.ImageGrab / scrot)         │
│     └── Emit screenshot_captured event                         │
│                                                                │
│  3. Call agent.step(screenshot, goal)                          │
│     ├── Prepare image (resize to 1366x768 canvas, JPEG 85)     │
│     ├── Build messages (goal + history + screenshot)           │
│     ├── Call Claude API with desktop tools                     │
│     └── Process response (thinking, tool uses)                 │
│                                                                │
│  4. Emit thinking event                                        │
│                                                                │
│  5. Check control flow:                                        │
│     ├── finish → Emit finished, exit loop                      │
│     ├── confirm → Emit confirm, wait for stdin response        │
│     ├── ask_question → Emit ask_question, wait for stdin       │
│     └── actions → Continue to step 6                           │
│                                                                │
│  6. Execute actions on environment                             │
│     ├── Emit action event (informational)                      │
│     └── Call environment.execute_action()                      │
│                                                                │
│  7. Wait 0.5s settle delay                                     │
│                                                                │
│  8. Capture next screenshot                                    │
│     └── Emit screenshot_captured event                         │
│                                                                │
│  9. Go to step 3                                               │
│                                                                │
│  Background: stdin reader queues pause/stop/confirm/answer     │
│              commands for processing between steps             │
└────────────────────────────────────────────────────────────────┘
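The loop above can be sketched with asyncio, assuming the environment is already initialized (step 1). This is an illustration, not executor.py: the Environment/Agent protocols, the agent's result dict shape, wait_for, and the max_steps guard with its "max_steps" error code are hypothetical, and the confirm/ask_question waits from step 5 are omitted for brevity. A stdin reader task would fill the commands queue.

```python
import asyncio
from typing import Any, Callable, Protocol

Event = dict[str, Any]


class Environment(Protocol):
    async def capture_screenshot(self) -> tuple[str, int, int]: ...  # (base64, width, height)
    async def execute_action(self, action: dict) -> bool: ...


class Agent(Protocol):
    # Assumed result shape: {"thinking": str, "actions": [...], "finished": bool, "summary": str}
    async def step(self, screenshot: str, goal: str) -> dict[str, Any]: ...


async def wait_for(commands: "asyncio.Queue[Event]", name: str) -> Event:
    """Block until the named command (or stop) arrives; other commands are dropped here."""
    while True:
        cmd = await commands.get()
        if cmd.get("command") in (name, "stop"):
            return cmd


async def autonomous_loop(
    env: Environment,
    agent: Agent,
    goal: str,
    emit: Callable[[Event], None],
    commands: "asyncio.Queue[Event]",
    settle_s: float = 0.5,
    max_steps: int = 50,
) -> Event:
    step = 0

    async def capture() -> str:
        # Steps 2 and 8: grab the screen and announce a step boundary (no image data).
        shot, width, height = await env.capture_screenshot()
        emit({"event": "screenshot_captured", "width": width, "height": height, "step": step})
        return shot

    screenshot = await capture()
    while step < max_steps:
        # Queued background commands are applied between steps.
        while not commands.empty():
            cmd = commands.get_nowait()
            if cmd.get("command") == "pause":
                cmd = await wait_for(commands, "resume")
            if cmd.get("command") == "stop":
                stopped = {"event": "state_change", "state": "stopped", "step": step}
                emit(stopped)
                return stopped
        step += 1
        result = await agent.step(screenshot, goal)                       # step 3
        emit({"event": "thinking", "text": result.get("thinking", ""), "step": step})  # step 4
        if result.get("finished"):                                        # step 5: finish
            done = {"event": "finished", "reason": "completed",
                    "summary": result.get("summary", ""), "success": True, "step": step}
            emit(done)
            return done
        for action in result.get("actions", []):                          # step 6
            emit({"event": "action", "action": action, "step": step})
            await env.execute_action(action)
        await asyncio.sleep(settle_s)                                     # step 7
        screenshot = await capture()                                      # step 8, then loop
    # Hypothetical guard: give up after max_steps (the code name is illustrative).
    failed = {"event": "error", "message": "step limit reached", "code": "max_steps",
              "recoverable": False, "step": step}
    emit(failed)
    return failed
```

Keeping the settle delay and the screenshot capture inside the driver is what lets every SDK drop its executor module.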
The following SDK-side executor modules have been removed. The driver binary handles all screenshot capture and action execution in local mode:
| SDK | Removed Module |
|---|---|
| Python | agi.executor (execute_action, execute_actions, get_scale_factor, get_screen_size) |
| Node.js | src/executor.ts (executeAction, executeActions, getScaleFactor, getScreenSize) |
| C# | Agi.Executor (ExecuteAction, ExecuteActions, GetScaleFactor, GetScreenSize) |
All platform-specific code now lives exclusively in the driver binary's environment/local.py.