This report distills the development style used in https://github.com/mikearnaldi/accountability into a reproducible playbook you can apply in your own repositories. It focuses on the workflow, artifacts, automation, and guardrails that make the "Ralph" agent loop practical and consistent.
- Spec-first execution. Work is defined in `specs/` documents, each with tasks/phases and status. The agent is expected to read specs, pick a task, implement, and update the spec.
- Agent loop orchestration. A root script (`ralph-auto.sh`) runs the agent, feeds it a focused prompt, runs CI checks, and commits. The agent does not commit; the script does.
- Strict CI gating. The loop requires green `typecheck`, `lint`, `build`, `test` (and optionally E2E) before commit. CI mirrors this with separate build and E2E jobs.
- Full-stack alignment. `CLAUDE.md` mandates backend and frontend changes stay aligned and forbids frontend-only shortcuts.
- Heavy testing and regression discipline. Frequent E2E fixes and explicit test counts in commits indicate an emphasis on stabilizing tests and tracking coverage.
- Strong guardrails. Domain-specific lint rules and best-practice docs prevent certain classes of mistakes (e.g., `any`, direct `fetch`, `localStorage`).
- `ralph-auto.sh` implements the agent loop and auto-commit behavior.
- `RALPH_AUTO_PROMPT.md` defines focus mode, one-task-per-iteration, CI requirements, and completion signals.
- `progress-auto.txt` records iterations and tasks, showing repeated automated execution.
- `CLAUDE.md` hard-codes architecture rules and prohibits certain tools (Docker).
- `.github/workflows/ci.yml` runs typecheck + tests and Playwright E2E.
- Commit history shows a significant number of auto commits:
  - Total commits: 437
  - `feat(auto):` commits: 81 (~18.5%)
  - `Ralph-Auto-Iteration` metadata: 80 commits
Create these files/directories as the minimum foundation.
Purpose: single source of truth for architecture rules, boundaries, and non-negotiables.
In repo: CLAUDE.md (could be AGENTS.md in your repo).
What it contains (based on CLAUDE.md):
- Architecture overview, data flow, package boundaries
- Critical rules (full-stack alignment, no frontend-only hacks, etc.)
- Must-run test commands
- Tool bans (e.g., no Docker in their repo)
- Pointers to specs and best practices
Purpose: canonical tasks and best-practices library.
In repo: specs/
Observed patterns:
- Task specs with phases and checklists (see `specs/E2E_TEST_COVERAGE.md`)
- Best-practices specs per layer (`EFFECT_BEST_PRACTICES.md`, `REACT_BEST_PRACTICES.md`, etc.)
- Architecture guidance (`UI_ARCHITECTURE.md`, `HTTP_API_TANSTACK.md`, etc.)
Purpose: enforce workflow in every agent run.
In repo: RALPH_AUTO_PROMPT.md
Core rules from template:
- Focus mode (only work on the user-specified prompt)
- One task per iteration
- Must update specs
- Must pass CI before signaling completion
- Output signals: `TASK_COMPLETE` and `NOTHING_LEFT_TO_DO`
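
To make the output signals concrete, here is a minimal sketch of how a loop script might detect them in the agent's output. The signal names come from the prompt template; the function name `detectSignal` and the matching logic are assumptions for illustration, not the repo's implementation (which lives in a shell script).

```typescript
// Hypothetical helper: only the two signal names come from the report.
type CompletionSignal = "TASK_COMPLETE" | "NOTHING_LEFT_TO_DO";

function detectSignal(agentOutput: string): CompletionSignal | null {
  // Assumption: NOTHING_LEFT_TO_DO takes priority if both appear,
  // because it concerns the whole run rather than a single task.
  if (/\bNOTHING_LEFT_TO_DO\b/.test(agentOutput)) return "NOTHING_LEFT_TO_DO";
  // Word boundaries avoid matching longer tokens such as TASK_COMPLETED.
  if (/\bTASK_COMPLETE\b/.test(agentOutput)) return "TASK_COMPLETE";
  return null;
}
```

Returning `null` for "no explicit signal" keeps the partial-work case distinguishable from both completion outcomes.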
Purpose: run the agent, gate on CI, commit consistently.
In repo: ralph-auto.sh
Behavior observed:
- Builds prompt by injecting spec list, progress log, and CI errors
- Runs the agent in stream-json mode and filters output
- Checks for `TASK_COMPLETE` or `NOTHING_LEFT_TO_DO`
- Runs CI checks before commit
- Auto-commits with standard message and iteration metadata
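
The prompt-building step can be sketched as a pure function. This is an illustration only: the field names, section headings, and ordering below are assumptions, since the report lists what is injected (spec list, progress log, CI errors) but not the exact layout.

```typescript
// All field names and the section layout are assumptions for illustration.
interface PromptInputs {
  template: string;    // contents of RALPH_AUTO_PROMPT.md
  focus: string;       // the user-specified prompt
  specFiles: string[]; // paths under specs/
  progressLog: string; // contents of progress-auto.txt
  ciErrors: string;    // output of the last failing CI run, or ""
}

function buildPrompt(p: PromptInputs): string {
  const sections = [
    p.template.trim(),
    `## Focus\n${p.focus}`,
    `## Specs\n${p.specFiles.map((f) => `- ${f}`).join("\n")}`,
    `## Progress so far\n${p.progressLog.trim() || "(none)"}`,
  ];
  // Only add the CI section when there is something to fix.
  if (p.ciErrors.trim() !== "") {
    sections.push(`## CI errors to fix first\n${p.ciErrors.trim()}`);
  }
  return sections.join("\n\n");
}
```

Keeping this step free of I/O makes it easy to snapshot-test the exact prompt an iteration would receive.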
Purpose: track iteration-level outputs and make history visible.
In repo: progress-auto.txt
Behavior observed:
- Updated by the loop script before committing
- Contains iteration number, timestamp, task summary, status
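
As a rough sketch of what a progress entry could look like, the function below renders the four fields listed above. The report does not show the actual file layout, so the `Key: value` format and the names `ProgressEntry` and `formatProgressEntry` are hypothetical.

```typescript
// Hypothetical entry shape; only the four fields are taken from the report.
interface ProgressEntry {
  iteration: number;
  timestamp: Date;
  task: string;
  status: "complete" | "partial";
}

function formatProgressEntry(e: ProgressEntry): string {
  // Assumed layout: one "Key: value" line per field, blank line after each entry.
  return [
    `Iteration: ${e.iteration}`,
    `Timestamp: ${e.timestamp.toISOString()}`,
    `Task: ${e.task}`,
    `Status: ${e.status}`,
    "",
  ].join("\n");
}
```

An append-only log in a fixed format like this can be fed back into the next prompt without further parsing.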
Purpose: keep the agent aligned with the same checks as CI.
In repo: .github/workflows/ci.yml
Observed setup:
- Job 1: typecheck + unit tests
- Job 2: Playwright E2E
- Mirrors the loop's checks
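
One way to keep local gating and CI in sync is to drive both from a single ordered list of checks. The sketch below assumes the check names from this report and a hypothetical injected `Runner`, because the commands behind each check are not shown here.

```typescript
// The runner is injected (hypothetical) rather than hard-coding commands.
type CheckResult = { ok: boolean; output: string };
type Runner = (check: string) => CheckResult;

function checksFor(withE2E: boolean): string[] {
  const base = ["typecheck", "lint", "build", "test"];
  return withE2E ? [...base, "e2e"] : base;
}

function runGate(checks: string[], run: Runner): { green: boolean; errors: string } {
  for (const check of checks) {
    const result = run(check);
    if (!result.ok) {
      // Stop at the first failure; its output can be fed into the next prompt.
      return { green: false, errors: `[${check}] ${result.output}` };
    }
  }
  return { green: true, errors: "" };
}
```

Stopping at the first failure is a design choice in this sketch; collecting all failures would give the agent more context at the cost of longer runs.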
Purpose: enforce architectural and security constraints in code.
In repo: eslint.config.mjs
Observed custom rules:
- Enforce `.ts`/`.tsx` extensions for relative imports; no extensions for package imports
- Ban `disableValidation: true`
- Ban `` sql<Type>`...` ``
- Prefer `Option.fromNullable`
- Ban `localStorage`
- Ban direct `fetch`
- Ban `window.location.href` navigation
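
The repo implements these as custom rules, which are not reproduced in this report. As a starting point for your own repo, two of the bans can be approximated with ESLint's built-in `no-restricted-globals` and `no-restricted-properties` rules. The sketch below is that substitute approach, not the repo's configuration; the local types, the `guardrails` name, and the messages are illustrative.

```typescript
// Sketch only: built-in ESLint rules standing in for the repo's custom ones.
// In a real eslint.config.mjs this array would be the default export.
type RuleEntry = ["error" | "warn" | "off", ...unknown[]];
interface FlatConfigItem {
  files?: string[];
  rules: Record<string, RuleEntry>;
}

const guardrails: FlatConfigItem[] = [
  {
    files: ["**/*.ts", "**/*.tsx"],
    rules: {
      // Catches bare references such as localStorage.getItem(...) or fetch(...).
      "no-restricted-globals": [
        "error",
        { name: "localStorage", message: "Use the approved storage abstraction." },
        { name: "fetch", message: "Use the shared API client." },
      ],
      // Catches the window.-qualified forms of the same APIs.
      "no-restricted-properties": [
        "error",
        { object: "window", property: "localStorage", message: "Use the approved storage abstraction." },
        { object: "window", property: "fetch", message: "Use the shared API client." },
      ],
    },
  },
];
```

Rules that depend on project-specific shapes (such as the `sql` template ban) would still need custom rule code.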
- Prereq checks: validate agent CLI, git repo, `specs/`, `RALPH_AUTO_PROMPT.md`.
- Initial CI run: establishes baseline; failures are passed into next prompt.
- Iteration loop:
  - Build prompt (spec list + focus + progress + CI errors).
  - Run agent.
  - Parse output for `TASK_COMPLETE` or `NOTHING_LEFT_TO_DO`.
  - If task complete: run CI, update `progress-auto.txt`, commit.
  - If no explicit completion but code changes exist: run CI and commit as partial.
- Exit: print recent Ralph Auto commits and clean temp logs.
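
The per-iteration branching above can be summarized as a small decision function. This is a sketch under assumptions: the `Action` names are invented, and treating `NOTHING_LEFT_TO_DO` as "stop the loop" is inferred from the signal's name rather than confirmed by the script.

```typescript
type Signal = "TASK_COMPLETE" | "NOTHING_LEFT_TO_DO" | null;
type Action = "commit" | "commit-partial" | "stop" | "continue";

// Hypothetical decision step; both commit actions are expected to run CI first.
function nextAction(signal: Signal, hasChanges: boolean): Action {
  if (signal === "NOTHING_LEFT_TO_DO") return "stop"; // assumption: ends the run
  if (signal === "TASK_COMPLETE") return "commit";
  // No explicit signal: keep work only if the tree actually changed.
  return hasChanges ? "commit-partial" : "continue";
}
```

Separating the decision from the side effects (running CI, committing) makes the loop's behavior easy to test without invoking an agent.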
feat(auto): <task summary>
Ralph-Auto-Iteration: <n>
Automated commit by Ralph Auto loop.
Recommendation: preserve this convention in your repos for filtering and auditing.
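
A minimal sketch of producing and reading this commit format follows. The three message lines are taken from the convention above; the blank lines between them (the usual Git subject/body separation) and the function names are assumptions.

```typescript
// Hypothetical helpers around the commit convention described above.
function formatAutoCommit(summary: string, iteration: number): string {
  return [
    `feat(auto): ${summary}`,
    "", // assumed blank line separating subject from body
    `Ralph-Auto-Iteration: ${iteration}`,
    "",
    "Automated commit by Ralph Auto loop.",
  ].join("\n");
}

function parseIteration(message: string): number | null {
  // Looks for the metadata line anywhere in the message body.
  const match = /^Ralph-Auto-Iteration:\s*(\d+)\s*$/m.exec(message);
  return match ? Number(match[1]) : null;
}
```

With the metadata in place, `git log --grep="Ralph-Auto-Iteration"` lists only loop-generated commits for auditing.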
The style works best when specs are explicit about tasks and status.
Example (from specs/E2E_TEST_COVERAGE.md):
- A "Current State" section with numbers
- A "Coverage Gaps" section with explicit items
- A multi-phase implementation plan with checkboxes
- Detailed test patterns and examples
- Notes on known limitations
Suggested template for your specs:
- Context / background
- Current state (metrics, baseline)
- Goals and non-goals
- Phased task list with checkboxes
- Tests required per phase
- Risks and dependencies
- Status log / completion notes
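
Phased checklists are also machine-readable, which helps when reporting progress or picking the next task. The sketch below assumes specs use Markdown headings for phases and `- [ ]` / `- [x]` checkboxes for tasks; the names `SpecTask` and `parseSpecTasks` are illustrative.

```typescript
interface SpecTask {
  phase: string;
  text: string;
  done: boolean;
}

// Assumes Markdown headings mark phases and "- [ ]" / "- [x]" lines mark tasks.
function parseSpecTasks(markdown: string): SpecTask[] {
  const tasks: SpecTask[] = [];
  let phase = "(no phase)";
  for (const line of markdown.split("\n")) {
    const heading = /^#{1,6}\s+(.*\S)\s*$/.exec(line);
    if (heading) {
      phase = heading[1] ?? phase;
      continue;
    }
    const box = /^\s*[-*]\s+\[([ xX])\]\s+(.*\S)\s*$/.exec(line);
    if (box) {
      tasks.push({ phase, text: box[2] ?? "", done: box[1] !== " " });
    }
  }
  return tasks;
}
```

For example, `parseSpecTasks(spec).filter((t) => !t.done)` yields the remaining work, grouped by whichever heading preceded each checkbox.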
The repo encodes its rules in both docs and tooling.
- `CLAUDE.md`: hard requirements (full-stack alignment, no Docker, etc.)
- `specs/EFFECT_BEST_PRACTICES.md`: typed errors, no `any`, no casts, no `catchAllCause`
- `specs/TYPESCRIPT_CONVENTIONS.md`: no barrels, `.ts` extensions, no `/src/` imports
- ESLint custom rules for architecture and security
- TypeScript project references for incremental builds
- CI checks that mirror local gating
Implementation advice:
- Start with docs to define the rules.
- Encode the most important rules in linting.
- Fail CI on lint, typecheck, and tests.
Below is a practical, step-by-step adoption plan.
- Define your project boundaries (backend vs frontend vs shared)
- Write an agent guide (`AGENTS.md` or `CLAUDE.md`) with hard rules
- Add lint, typecheck, and test scripts that can run headless
- Create `specs/` with:
  - Feature specs that include tasks and phases
  - Best-practice docs for each layer
- Require that specs be updated when tasks are completed
- Add `RALPH_AUTO_PROMPT.md` with:
  - Focus mode
  - One task per iteration
  - CI gating requirement
  - Completion signals
- Add `ralph-auto.sh` that:
  - Builds prompt from spec list + progress + CI errors
  - Runs the agent
  - Runs CI and commits
- Add `progress-auto.txt` and update it automatically
- Add lint rules for your non-negotiables
- Document any forbidden patterns (e.g., direct `fetch`, `any`)
- Add an ESLint rule or code mod for each high-risk pattern
- Keep a `specs/E2E_TEST_COVERAGE.md` that explicitly tracks coverage
- Require tests in the loop before completion
- Add E2E to CI (preferably in a dedicated job)
- Vendor reference repos in `repos/` for offline lookup
- Document common search patterns for engineers and agents
- Write/extend a spec with phases and tests.
- Run: `./ralph-auto.sh "Implement <feature>" --max-iterations 3`
- Review auto commits and spec updates.
- Merge when CI is green.
- Add a coverage gap to `specs/E2E_TEST_COVERAGE.md`.
- Run: `./ralph-auto.sh "Add E2E coverage for <module>" --e2e`
- Verify new tests and update coverage spec.
- Add a best-practice spec describing the target style.
- Run: `./ralph-auto.sh "Standardize <component> to match spec"`
- Ensure lint rules enforce the new style.
- Risk: scope creep in agent runs.
- Mitigation: strict focus prompts and one-task-per-iteration rules.
- Risk: flaky E2E tests blocking automation.
- Mitigation: bake stabilization tasks into specs; track flakiness explicitly.
- Risk: spec drift (docs not updated).
- Mitigation: enforce spec updates in prompt and reject completion without it.
- Risk: CI cost too high for every iteration.
- Mitigation: allow a `--max-iterations` flag and optional `--e2e` gating.
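
The spec-drift mitigation can be enforced mechanically rather than by prompt alone. The sketch below is one possible check, not something the report attributes to the repo: it assumes the loop can obtain the list of changed paths (for example from `git diff --name-only`) and rejects a completion signal when nothing under `specs/` changed. Both function names are hypothetical.

```typescript
// Hypothetical guard against spec drift; not taken from the repo's script.
function touchesSpecs(changedFiles: string[], specDir: string = "specs/"): boolean {
  return changedFiles.some((f) => f.startsWith(specDir));
}

function acceptCompletion(signal: string | null, changedFiles: string[]): boolean {
  // Only explicit completions are gated here; partial work is handled elsewhere.
  if (signal !== "TASK_COMPLETE") return false;
  return touchesSpecs(changedFiles);
}
```

A rejected completion could be reported back in the next prompt, in the same way CI failures are.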
.
├── AGENTS.md (or CLAUDE.md)
├── RALPH_AUTO_PROMPT.md
├── ralph-auto.sh
├── progress-auto.txt
├── specs/
│ ├── FEATURE_X.md
│ ├── BEST_PRACTICES_BACKEND.md
│ ├── BEST_PRACTICES_FRONTEND.md
│ └── E2E_TEST_COVERAGE.md
├── .github/workflows/ci.yml
└── eslint.config.mjs
- Copy verbatim:
  - Prompt structure with focus mode and completion signals
  - Auto-commit metadata (`Ralph-Auto-Iteration`)
  - Progress log format
- Adapt:
  - CI checks (match your tech stack)
  - Lint rules (your domain constraints)
  - Spec taxonomy and coverage docs
If you want the same development style, the core requirement is process automation + documentation discipline + strict CI gating. The accountability repo shows that agents can safely auto-commit when tasks are tightly scoped, specs are the source of truth, and guardrails are enforced by tooling.