E2E test harness for Claude Code. Drives real Claude sessions in tmux panes and asserts on their behavior.
claudetests/
├── e2e.toml                # Config: claude binary path + config dir
├── .gitignore              # Ignores .test-config/
├── hook-inventory.md       # Reference doc: yolo mode & permission hooks
│
├── lib/
│   └── e2e_harness.py      # Core harness library (476 lines)
│
├── hooks/
│   └── session-state.py    # Notification hook → JSONL state files
│
├── scripts/
│   ├── setup-auth.py       # Manual auth: tmux pane for human OAuth
│   └── claude-auth-login-agent-browser.py  # Automated auth: pexpect + agent-browser + local callback replay
│
├── tests/
│   └── e2e/
│       └── test_noop.py    # Smoke test: launch → idle → teardown
│
└── .test-config/           # Isolated Claude config (gitignored)
Public API:
| Function | What it does |
|---|---|
| `create_pane(name)` | Splits tmux, creates a temp workdir with `git init` + `settings.local.json` hook config, seeds workspace trust |
| `launch_claude(pane_id)` | Sends `claude --dangerously-skip-permissions` with `CLAUDE_CONFIG_DIR`, waits for first idle |
| `send_prompt(pane_id, text)` | Types prompt into pane, waits for idle, returns captured output |
| `wait_for_idle(pane_id)` | Primary: polls JSONL session-state files for `idle_prompt`. Fallback: tmux regex (looks for `% ctx` + `❯` prompt, no spinner). 2-poll debounce at 0.5s |
| `check_auth()` | Checks `claude auth status`; if not logged in, runs tmux-based OAuth flow with agent-browser |
| `capture_pane(pane_id)` | Reads tmux pane content |
| `kill_pane(pane_id)` | Cleanup |
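
The 2-poll debounce described for `wait_for_idle` can be illustrated with a small sketch. This is not the harness's code: `wait_until_stable` and its parameters are hypothetical names, and the `is_idle` callable stands in for whichever check (JSONL state or tmux regex) is in use. The idea is that a single idle reading is not trusted; the pane counts as idle only after consecutive idle polls.

```python
import time
from typing import Callable


def wait_until_stable(
    is_idle: Callable[[], bool],
    required_polls: int = 2,   # consecutive idle readings needed (2-poll debounce)
    interval: float = 0.5,     # seconds between polls
    timeout: float = 60.0,     # hypothetical overall limit, not from the source
) -> bool:
    """Return True once is_idle() holds for required_polls polls in a row."""
    deadline = time.monotonic() + timeout
    streak = 0
    while time.monotonic() < deadline:
        # Any non-idle reading resets the streak, so a brief flicker
        # between two busy states is not mistaken for idle.
        streak = streak + 1 if is_idle() else 0
        if streak >= required_polls:
            return True
        time.sleep(interval)
    return False
```

Resetting the streak on every busy reading is what makes this a debounce rather than a simple "seen idle twice" counter.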
Config bootstrapping force-merges `hasCompletedOnboarding`, `theme: dark`, `autoUpdates: false`, and `isTrusted` into `.test-config/.claude.json` on every launch so Claude never shows onboarding dialogs.
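
A minimal sketch of such a force-merge follows. It makes several assumptions: `force_merge_config` is a hypothetical name, the `True` values for `hasCompletedOnboarding` and `isTrusted` are guesses (the source gives values only for `theme` and `autoUpdates`), and all four keys are treated as top-level, which may not match the real file layout.

```python
import json
from pathlib import Path

# Assumed values and assumed flat layout; only theme and autoUpdates
# have values stated in the source.
FORCED_SETTINGS = {
    "hasCompletedOnboarding": True,
    "theme": "dark",
    "autoUpdates": False,
    "isTrusted": True,
}


def force_merge_config(config_path: Path) -> dict:
    """Overwrite the forced keys while preserving every other key."""
    try:
        current = json.loads(config_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        current = {}
    if not isinstance(current, dict):
        current = {}
    # Forced keys win over whatever Claude wrote during the last run.
    current.update(FORCED_SETTINGS)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(current, indent=2), encoding="utf-8")
    return current
```

Merging rather than overwriting the whole file keeps any state Claude stored there, such as auth-related fields, intact between runs.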
`hooks/session-state.py` is registered per-test as a project-local Notification hook. It appends JSONL records (`{ts, type, session_id, message, tmux_pane}`) to `~/.local/state/claude/session-state/{session_id}`. This is how the harness detects idle without scraping the terminal.
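
The record format above suggests a simple append-and-read pattern, sketched below. The function names are hypothetical, the epoch-seconds `ts` is an assumption (the source does not state the timestamp format), and the real hook receives its input from Claude rather than from function arguments.

```python
import json
import time
from pathlib import Path
from typing import Optional


def append_record(state_file: Path, event_type: str, session_id: str,
                  message: str, tmux_pane: str) -> None:
    """Append one JSONL record with the fields named in the text."""
    record = {
        "ts": time.time(),  # assumed format: epoch seconds
        "type": event_type,
        "session_id": session_id,
        "message": message,
        "tmux_pane": tmux_pane,
    }
    state_file.parent.mkdir(parents=True, exist_ok=True)
    with state_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def last_event_type(state_file: Path) -> Optional[str]:
    """Return the type of the newest record, or None if there is none."""
    try:
        lines = state_file.read_text(encoding="utf-8").splitlines()
    except FileNotFoundError:
        return None
    for line in reversed(lines):
        if line.strip():
            return json.loads(line).get("type")
    return None
```

A poller could then treat `last_event_type(path) == "idle_prompt"` as the idle signal.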
`scripts/claude-auth-login-agent-browser.py` is the automated auth script (308 lines, committed). It uses pexpect to run `claude auth login` in a PTY and captures the OAuth URL. It overrides the `BROWSER` env var with a temp shell helper to grab the auto-auth URL Claude tries to open, opens that URL in agent-browser, and clicks consent. It then polls browser navigation entries for the `localhost/callback` redirect, extracts the `code`/`state` params, uses `lsof` to find Claude's local listener port, and replays the callback via `curl`. The result is a complete OAuth handshake with no human involved.
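
The parameter extraction and replay steps can be sketched with the standard library alone. This is an illustration, not the script's code: both function names are hypothetical, and the `/callback` path on the replay URL is an assumption based on the redirect described above. The port would come from the `lsof` lookup, and the resulting URL would be handed to `curl`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def extract_callback_params(redirect_url: str) -> dict:
    """Pull the code and state query parameters out of a redirect URL."""
    query = parse_qs(urlsplit(redirect_url).query)
    missing = [key for key in ("code", "state") if key not in query]
    if missing:
        raise ValueError(f"redirect is missing: {', '.join(missing)}")
    # parse_qs returns a list per key; take the first value of each.
    return {"code": query["code"][0], "state": query["state"][0]}


def build_replay_url(port: int, params: dict, path: str = "/callback") -> str:
    """Build the URL to replay against the local listener (path assumed)."""
    return f"http://localhost:{port}{path}?{urlencode(params)}"
```

Using `urlencode` for the replay keeps any reserved characters in the authorization code correctly escaped.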
`scripts/setup-auth.py` opens a tmux pane running `claude auth login` so a human can complete the flow manually.
`tests/e2e/test_noop.py` runs `check_auth()` → `create_pane("noop")` → `launch_claude()` → assert `"% ctx"` in output → `kill_pane()`. A 180s SIGALRM hard timeout guards the whole test.
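
A SIGALRM-based hard timeout of this kind can be sketched as a context manager. The names `HardTimeout` and `hard_timeout` are hypothetical, and the actual test may simply call `signal.alarm(180)` directly. This approach is Unix-only and must run in the main thread.

```python
import signal
from contextlib import contextmanager


class HardTimeout(Exception):
    """Raised when the guarded block exceeds its time budget."""


@contextmanager
def hard_timeout(seconds: float):
    def _on_alarm(signum, frame):
        raise HardTimeout(f"test exceeded {seconds}s")

    # Install the handler, arm the timer, and restore both on exit.
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, previous)
```

Wrapping the test body in `with hard_timeout(180):` would interrupt even a blocking wait, which a polling-loop timeout alone cannot guarantee.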
`hook-inventory.md` catalogs the yolo-mode permission hook stack from the main arthack setup (6 layers: settings foundation → yolo auto-allow → protection hooks → ExitPlanMode auto-approval → side effects → command templates). It is context documentation, not test infrastructure.
- 4 commits on `main` (1 unpushed: `f6ca958`, the pexpect auth script)
- `docs/e2e-test-harness.md` deleted (design spec superseded by implementation)
- Most code is untracked: `.gitignore`, `e2e.toml`, `hooks/`, `lib/`, `scripts/setup-auth.py`, `tests/`
- Only committed code beyond docs: `scripts/claude-auth-login-agent-browser.py`
- No `pyproject.toml`; all scripts use inline `uv run --script` dependency metadata
- One test exists. No test runner config.
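
For readers unfamiliar with inline script metadata, the general shape is shown below. The Python version and the dependency list are assumptions for illustration (pexpect is named only because the auth script is described as using it); the repo's actual headers may differ.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["pexpect"]
# ///
# The comment block above is inline script metadata: `uv run --script`
# reads it and provisions an environment before executing the file.


def main() -> int:
    print("dependencies are resolved by uv at launch")
    return 0


if __name__ == "__main__":
    main()
```

Because each script declares its own dependencies, no shared `pyproject.toml` is needed to run any of them.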