@possibilities
Created March 2, 2026 20:14
claudetests repo inventory

claudetests — Repo Inventory

E2E test harness for Claude Code. Drives real Claude sessions in tmux panes and asserts on their behavior.

File Tree

claudetests/
├── e2e.toml                                    # Config: claude binary path + config dir
├── .gitignore                                  # Ignores .test-config/
├── hook-inventory.md                           # Reference doc: yolo mode & permission hooks
│
├── lib/
│   └── e2e_harness.py                          # Core harness library (476 lines)
│
├── hooks/
│   └── session-state.py                        # Notification hook → JSONL state files
│
├── scripts/
│   ├── setup-auth.py                           # Manual auth: tmux pane for human OAuth
│   └── claude-auth-login-agent-browser.py      # Automated auth: pexpect + agent-browser + local callback replay
│
├── tests/
│   └── e2e/
│       └── test_noop.py                        # Smoke test: launch → idle → teardown
│
└── .test-config/                               # Isolated Claude config (gitignored)

Components

lib/e2e_harness.py — The Engine

Public API:

| Function | What it does |
|---|---|
| create_pane(name) | Splits tmux, creates temp workdir with git init + settings.local.json hook config, seeds workspace trust |
| launch_claude(pane_id) | Sends claude --dangerously-skip-permissions with CLAUDE_CONFIG_DIR, waits for first idle |
| send_prompt(pane_id, text) | Types prompt into pane, waits for idle, returns captured output |
| wait_for_idle(pane_id) | Primary: polls JSONL session-state files for idle_prompt. Fallback: tmux regex (looks for % ctx + prompt, no spinner). 2-poll debounce at 0.5s |
| check_auth() | Checks claude auth status; if not logged in, runs tmux-based OAuth flow with agent-browser |
| capture_pane(pane_id) | Reads tmux pane content |
| kill_pane(pane_id) | Cleanup |
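
The "2-poll debounce at 0.5s" used by wait_for_idle can be sketched roughly as follows. This is a minimal illustration, not the harness code: wait_for_stable, its parameters, and the injected sleep/clock hooks are hypothetical names, and check stands in for whichever idle probe is active (JSONL state file or tmux regex).

```python
import time
from typing import Callable


def wait_for_stable(
    check: Callable[[], bool],
    required: int = 2,        # consecutive positive polls needed (the "2-poll debounce")
    interval: float = 0.5,    # seconds between polls
    timeout: float = 60.0,    # hypothetical overall limit; not stated in the inventory
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once check() was true on `required` consecutive polls."""
    deadline = clock() + timeout
    streak = 0
    while clock() < deadline:
        # A single negative reading resets the streak, so a brief flicker
        # of "idle" between two busy states is not reported as idle.
        streak = streak + 1 if check() else 0
        if streak >= required:
            return True
        sleep(interval)
    return False
```

Requiring two agreeing polls trades up to one extra interval of latency for protection against a momentary idle reading while Claude is still working.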

Config bootstrapping force-merges hasCompletedOnboarding, theme: dark, autoUpdates: false, isTrusted into .test-config/.claude.json on every launch so Claude never shows onboarding dialogs.
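
A force-merge of this kind might look like the sketch below. It assumes the forced keys sit at the top level of .claude.json, which the inventory does not confirm (isTrusted in particular may live elsewhere in the real file); force_merge_config and FORCED are hypothetical names.

```python
import json
from pathlib import Path

# Assumed flat layout; the real key placement may differ.
FORCED = {
    "hasCompletedOnboarding": True,
    "theme": "dark",
    "autoUpdates": False,
    "isTrusted": True,
}


def force_merge_config(path: Path, forced: dict) -> dict:
    """Overwrite the forced keys in the JSON file at `path`, keeping all other keys."""
    try:
        current = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        current = {}
    if not isinstance(current, dict):
        current = {}
    current.update(forced)  # forced values always win over existing ones
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(current, indent=2), encoding="utf-8")
    return current
```

Running the merge on every launch, rather than only when the file is missing, means a setting that Claude rewrites mid-session is reset before the next test.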

hooks/session-state.py — Idle Detection Bridge

Registered per-test as a project-local Notification hook. Appends JSONL records ({ts, type, session_id, message, tmux_pane}) to ~/.local/state/claude/session-state/{session_id}. This is how the harness detects idle without scraping the terminal.
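
The write and read sides of that bridge could be sketched as below. The record fields come from the description above; the function names, the epoch-seconds ts format, and the rule "the last record decides" are assumptions made for illustration.

```python
import json
import time
from pathlib import Path


def append_record(state_file: Path, type_: str, session_id: str,
                  message: str, tmux_pane: str) -> None:
    """Hook side: append one JSON object per line to the session's state file."""
    record = {
        "ts": time.time(),  # assumed format: epoch seconds
        "type": type_,
        "session_id": session_id,
        "message": message,
        "tmux_pane": tmux_pane,
    }
    state_file.parent.mkdir(parents=True, exist_ok=True)
    with state_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def is_idle(state_file: Path) -> bool:
    """Harness side: idle if the most recent record has type idle_prompt."""
    try:
        lines = state_file.read_text(encoding="utf-8").splitlines()
    except FileNotFoundError:
        return False  # the hook has not fired yet
    for line in reversed(lines):
        if line.strip():
            return json.loads(line).get("type") == "idle_prompt"
    return False
```

Append-only JSONL keeps the hook trivial and lets the harness poll a file instead of parsing terminal output.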

scripts/claude-auth-login-agent-browser.py — Automated OAuth

The more elaborate of the two auth scripts (308 lines, committed). It uses pexpect to run claude auth login in a PTY and captures the OAuth URL. It overrides the BROWSER env var with a temporary shell helper to grab the auto-auth URL Claude opens, opens that URL in agent-browser, and clicks consent. It then polls the browser's navigation entries for the localhost/callback redirect, extracts the code and state params, uses lsof to find Claude's local listener port, and replays the callback via curl. The result is the whole OAuth handshake without a human.
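
The parameter-extraction and replay steps can be illustrated with the standard library. This is a sketch under assumptions: the helper names are hypothetical, the /callback path is inferred from the "localhost/callback" wording, and the port would come from the lsof lookup, which is not shown.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def callback_params(redirect_url: str) -> dict:
    """Pull the code and state query params out of the observed redirect URL."""
    query = parse_qs(urlsplit(redirect_url).query)
    # Raises KeyError if either param is missing, which should fail the login loudly.
    return {key: query[key][0] for key in ("code", "state")}


def replay_url(port: int, params: dict, path: str = "/callback") -> str:
    """Build the URL to replay against the local listener (path is an assumption)."""
    return f"http://localhost:{port}{path}?{urlencode(params)}"
```

For example, a redirect observed in the browser could be turned into a replay target with `replay_url(port, callback_params(observed_url))` and then fetched with curl, as the script does.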

scripts/setup-auth.py — Manual Auth Fallback

Opens a tmux pane running claude auth login so a human can complete the flow manually.

tests/e2e/test_noop.py — Smoke Test

check_auth() → create_pane("noop") → launch_claude() → assert "% ctx" in output → kill_pane(). 180s SIGALRM hard timeout.
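
A SIGALRM hard timeout of this kind might be wired up as in the sketch below. HardTimeout and run_with_hard_timeout are hypothetical names, and the approach only works on Unix in the main thread.

```python
import signal


class HardTimeout(Exception):
    """Raised when the wrapped call exceeds its time budget."""


def _on_alarm(signum, frame):
    raise HardTimeout("test exceeded hard timeout")


def run_with_hard_timeout(fn, seconds: int = 180):
    """Run fn(), raising HardTimeout if it takes longer than `seconds`."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # the kernel delivers SIGALRM after `seconds`
    try:
        return fn()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

Because the alarm interrupts blocking calls, a test stuck waiting on a hung Claude session still fails within the budget instead of hanging the run.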

hook-inventory.md — Permission System Reference

Catalogs the yolo-mode permission hook stack from the main arthack setup (6 layers: settings foundation → yolo auto-allow → protection hooks → ExitPlanMode auto-approval → side effects → command templates). Context documentation, not test infrastructure.

Git State

  • 4 commits on main (1 unpushed: f6ca958 — the pexpect auth script)
  • docs/e2e-test-harness.md deleted (design spec superseded by implementation)
  • Most code is untracked: .gitignore, e2e.toml, hooks/, lib/, scripts/setup-auth.py, tests/
  • Only committed code beyond docs: scripts/claude-auth-login-agent-browser.py
  • No pyproject.toml — all scripts use inline uv run --script dependency metadata
  • One test exists. No test runner config.
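
For readers unfamiliar with inline uv run --script metadata, a script header in that style looks roughly like this. The requires-python value and the choice of pexpect as the only dependency are illustrative assumptions, not copied from the repo.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["pexpect"]
# ///


def main() -> str:
    # Imported lazily; uv installs it from the inline metadata above
    # into an ephemeral environment before the script runs.
    import pexpect
    return pexpect.__version__


if __name__ == "__main__":
    print(main())
```

With the dependency list embedded in each script, no pyproject.toml or shared virtualenv is needed to run them.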