@chutch3
Created August 11, 2025 12:08
Strict TDD Workflow Enforcement for AI Agents

This document defines a machine-readable, state-tracked workflow for AI-driven code refactoring using the Test-Driven Development (TDD) Red → Green → Refactor cycle.

It enforces:

  • One branch / one PR per issue
  • Strict RED/GREEN/REFACTOR phases with mandatory relevant test passing at each step
  • Automatic detection and correction of violations (root-cause fixes only, no shortcuts)
  • Explicitly forbidden shortcuts that mask failing tests
  • Iterative cycles within a single work stream, with cycle numbers tracked
  • State logging of every action, commit, test result, and violation for auditability
  • PR descriptions containing a full cycle trace and state log link

This is designed for AI agents that must remain fully compliant, auditable, and iterative during software changes, preventing premature phase completion or skipped tests.

I currently use it for Cursor rules, but you could also use this prompt with Claude Code.

BEGIN REFACTOR PROCEDURE — STRICT TEST ENFORCEMENT

1. PRINCIPLES
   1.1 All work is done per-issue (one issue → one branch → one PR).
   1.2 Follow Test-First TDD exactly: RED → GREEN → REFACTOR.
   1.3 Never take shortcuts that mask failing tests (see disallowed actions).

2. RED / GREEN / REFACTOR (Definitions & Enforcement)
   2.1 RED:
       - Add a new automated test that currently FAILS because the feature/behavior is not implemented or is broken.
       - Commit immediately with a single-line Conventional Commit message that includes the phase tag.
         Example: `test(auth): add failing test for password reset [RED]`
   2.2 GREEN:
       - Implement the minimal code necessary to make the failing test(s) introduced in RED pass.
       - **Requirement:** Before declaring GREEN complete, run the full set of "relevant tests" (see §5). **All** relevant tests must pass.
       - If any relevant test still fails, do not mark GREEN. Follow the auto-correction root-cause steps in §3.
       - Commit immediately after GREEN change with a single-line message that includes the phase tag.
         Example: `fix(auth): implement password reset to satisfy tests [GREEN]`
   2.3 REFACTOR:
       - Improve code quality/structure without changing externally visible behavior.
       - After REFACTOR, **all relevant tests must still pass** before marking REFACTOR complete.
       - Commit with a single-line message including the phase tag.
         Example: `refactor(auth): simplify password reset flow [REFACTOR]`
   2.4 ITERATIVE NATURE OF TDD:
       - Treat RED → GREEN → REFACTOR as a continuous, repeatable cycle.
       - Within the same branch, PR, or work stream, expect to perform multiple TDD cycles for different behaviors, edge cases, or refinements.
       - Each new test that introduces a failing condition (RED) begins a new cycle.
       - After completing GREEN and REFACTOR for that failing test, return to RED for the next desired behavior or scenario.
       - Maintain state log entries for each individual cycle so they can be independently reviewed.
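
The phase ordering in §2 can be sketched as a small state machine. This is an illustration only: `CycleTracker`, `complete`, and the two boolean flags are hypothetical names, and treating REFACTOR as a mandatory step before the next RED is an assumption drawn from §2.4.

```python
from typing import Optional

# Hypothetical helper: tracks the current TDD phase and cycle number.
NEXT_PHASE = {None: "RED", "RED": "GREEN", "GREEN": "REFACTOR", "REFACTOR": "RED"}


class CycleTracker:
    def __init__(self) -> None:
        self.cycle = 0                      # incremented on every RED
        self.phase: Optional[str] = None    # None until the first RED

    def complete(self, phase: str, *, new_test_fails: bool = False,
                 relevant_tests_pass: bool = False) -> None:
        """Mark `phase` complete, or raise ValueError if the rules forbid it."""
        expected = NEXT_PHASE[self.phase]
        if phase != expected:
            raise ValueError(f"expected {expected}, got {phase}")
        if phase == "RED":
            # RED is only valid when the newly added test actually fails.
            if not new_test_fails:
                raise ValueError("RED requires a failing new test")
            self.cycle += 1
        elif not relevant_tests_pass:
            # GREEN and REFACTOR require every relevant test to pass.
            raise ValueError(f"{phase} requires all relevant tests to pass")
        self.phase = phase
```

A caller would invoke `complete("RED", new_test_fails=True)` after committing the failing test, then `complete("GREEN", relevant_tests_pass=True)` only once the relevant-test run is clean.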

3. VIOLATION HANDLING — AUTO-CORRECTION & ROOT-CAUSE RULES
   3.1 Pre-execution check: before any step, compute relevant tests (see §5) and confirm they are runnable.
   3.2 If a GREEN attempt would leave any relevant test failing:
       - Reproduce the failing test(s) locally and capture the stack trace.
       - Perform root-cause analysis (trace failing assertion → code path → commit/diff) and implement the actual fix that addresses the underlying defect.
       - Re-run relevant tests. Repeat until they pass or the issue is provably unresolvable.
       - Log each attempt and the fix applied.
   3.3 If the agent attempts one of the disallowed shortcuts (see §6), treat this as a violation. Attempt to revert the shortcut and fix root cause. If unable, STOP and emit `UNRESOLVABLE RULE VIOLATION` with details.
   3.4 Outputs on any violation attempt:
       - If auto-fixed: `RULE VIOLATION DETECTED → AUTO-CORRECTED: <step> — <short reason>`
       - If cannot auto-fix: `UNRESOLVABLE RULE VIOLATION: <step> — <reason>`
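
The fix-and-retest loop in §3.2 and the two output formats in §3.4 might look like the sketch below. `resolve_failures`, `run_tests`, and `attempt_fix` are hypothetical names, and the `max_attempts` cap is an assumed stand-in for "provably unresolvable", which the procedure does not define numerically.

```python
from typing import Callable, List, Optional, Tuple


def resolve_failures(
    run_tests: Callable[[], List[str]],         # returns names of failing relevant tests
    attempt_fix: Callable[[List[str]], str],    # applies one root-cause fix, returns a short reason
    step: str,
    max_attempts: int = 3,                      # assumption: cap standing in for "unresolvable"
) -> Tuple[Optional[str], List[dict]]:
    """Return (violation message or None, per-attempt log)."""
    attempts: List[dict] = []
    failing = run_tests()
    if not failing:
        return None, attempts                   # nothing to correct
    for number in range(1, max_attempts + 1):
        reason = attempt_fix(failing)
        failing = run_tests()                   # always re-run after a fix
        attempts.append({"attempt": number, "fix": reason, "still_failing": failing})
        if not failing:
            return f"RULE VIOLATION DETECTED → AUTO-CORRECTED: {step} — {reason}", attempts
    detail = f"{len(failing)} relevant test(s) still failing after {max_attempts} attempts"
    return f"UNRESOLVABLE RULE VIOLATION: {step} — {detail}", attempts
```

The per-attempt list is what §3.2 asks to be logged; on the unresolvable path the caller is expected to stop rather than continue to the next phase.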

4. COMMITS / MESSAGES (format + required content)
   4.1 Every commit must be a single line and follow Conventional Commits.
   4.2 Every commit message must include the TDD phase in brackets: `[RED]`, `[GREEN]`, or `[REFACTOR]`.
   4.3 If tests are modified, the state log must include a human-readable justification (see §7). Example commit: `fix(api): correct response shape to satisfy tests [GREEN]`
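
A commit-message check for §4 could be sketched as follows. The function name `parse_commit_message` is hypothetical, and requiring the phase tag at the end of the line is an assumption based on the examples above; the rules only say the tag must be included.

```python
import re
from typing import Optional

# type(scope)!: summary [PHASE]  (scope and "!" are optional, as in Conventional Commits)
COMMIT_RE = re.compile(
    r"(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"!?: "
    r"(?P<summary>\S.*?) "
    r"\[(?P<phase>RED|GREEN|REFACTOR)\]"
)


def parse_commit_message(message: str) -> Optional[dict]:
    """Return the parsed parts, or None if the message breaks the §4 rules."""
    if "\n" in message:          # rule 4.1: single line only
        return None
    match = COMMIT_RE.fullmatch(message)
    return match.groupdict() if match else None
```

For example, `parse_commit_message("test(auth): add failing test for password reset [RED]")` yields a dict whose `phase` is `"RED"`, while a message without a tag yields `None`.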

5. RELEVANT TEST SELECTION (how to determine what must pass)
   5.1 Construct the set of relevant tests for this change by:
       a. Listing files modified in the branch (diff vs base).
       b. Selecting tests that:
          - Import or reference any modified file/module (static-search for imports, symbols).
          - Reside in the same package/directory as modified files.
          - Were added/updated in this branch.
       c. If available, include tests that coverage tools show exercise the modified code.
       d. If uncertain, widen to run all tests in the same module/package.
   5.2 The agent must run the relevant-test set after RED, after GREEN, and after REFACTOR for **each cycle**.
   5.3 It is acceptable to skip running unrelated test suites for speed, **but not** the relevant tests defined above.
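
Rules 5.1a and 5.1b can be approximated with a static search. The sketch below is a heuristic under stated assumptions: it targets Python projects, takes the modified-file list and test sources as plain inputs (obtaining them from the diff is left to the caller), and uses the hypothetical names `is_test_file` and `select_relevant_tests`. It does not cover the coverage-based rule 5.1c or the widening rule 5.1d.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List


def is_test_file(path: str) -> bool:
    name = PurePosixPath(path).name
    return name.startswith("test_") or name.endswith("_test.py")


def select_relevant_tests(modified: List[str], test_sources: Dict[str, str]) -> List[str]:
    """modified: paths changed vs base; test_sources: {test path: file text}."""
    modified_set = set(modified)
    modified_dirs = {PurePosixPath(p).parent for p in modified}
    # Module names of modified production files, e.g. "app/auth.py" -> "auth".
    modules = {
        PurePosixPath(p).stem
        for p in modified
        if p.endswith(".py") and not is_test_file(p)
    }
    relevant = set()
    for path, text in test_sources.items():
        if path in modified_set:                            # added/updated in this branch
            relevant.add(path)
        elif PurePosixPath(path).parent in modified_dirs:   # same package/directory
            relevant.add(path)
        elif any(                                           # import line mentions the module
            re.search(rf"^\s*(?:from|import)\b.*\b{re.escape(m)}\b", text, re.M)
            for m in modules
        ):
            relevant.add(path)
    return sorted(relevant)
```

The import match is deliberately loose, so it may select a few extra tests; erring toward a wider set is consistent with rule 5.1d.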

6. DISALLOWED SHORTCUTS (explicitly forbidden actions)
   6.1 Never do any of the following to make tests "pass":
       - Commenting out or deleting failing asserts in tests.
       - Adding test skips/xfail markers to suppress failures as a permanent fix.
       - Catching and silencing exceptions in production code purely to hide test failures.
       - Weakening assertions (e.g., changing `== expected` to `in` or `is not None`) to accept incorrect behavior.
       - Changing test expectations to match buggy production behavior without fixing the root cause.
       - Permanently altering CI/test-runner configuration to avoid running the failing tests.
   6.2 Allowed exception: modifying a test is permitted **only** if the test itself is proven incorrect (e.g., wrong expected value or outdated spec). In that case:
       - Document the justification in the state log (see §7).
       - Do not leave defective production code unchanged for convenience; the test change must be minimal and clearly justified.

7. STATE TRACKING (mandatory, expanded)
   7.1 After every action update the state log (exportable JSON). Each entry must include:
       - Timestamp.
       - Cycle number.
       - Branch name and issue ID.
       - Action performed (RED/GREEN/REFACTOR/TEST-RUN/COMMIT/PR).
       - Files changed (with diffs or git hashes).
       - Commit hash and single-line commit message.
       - Current phase.
       - List of relevant tests run, their results (pass/fail), and full stack traces for any failures.
       - If a failing test was fixed, record whether the fix was:
           • root-cause code fix, OR
           • test modification (include TEST_CHANGE_JUSTIFICATION text).
       - Any RULE VIOLATION DETECTED entries and correction actions.
   7.2 The branch/PR must link to the state log (e.g., include the path or paste the JSON as part of PR description).
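
One possible shape for a state log entry, using the fields listed in 7.1. The procedure does not fix a schema, so every field name here (`StateLogEntry`, `RunResult`, `fix_kind`, and so on) is an assumption chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class RunResult:
    name: str
    passed: bool
    stack_trace: str = ""            # full trace when the test failed


@dataclass
class StateLogEntry:
    cycle: int
    branch: str
    issue_id: str
    action: str                      # RED/GREEN/REFACTOR/TEST-RUN/COMMIT/PR
    phase: str
    files_changed: List[str] = field(default_factory=list)
    commit_hash: str = ""
    commit_message: str = ""
    tests: List[RunResult] = field(default_factory=list)
    fix_kind: str = ""               # "root-cause" or "test-modification"
    test_change_justification: str = ""
    violations: List[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_entry(log: List[dict], entry: StateLogEntry) -> None:
    """Append a JSON-serializable entry, enforcing the justification rule."""
    if entry.fix_kind == "test-modification" and not entry.test_change_justification:
        raise ValueError("test modifications need TEST_CHANGE_JUSTIFICATION text")
    log.append(asdict(entry))        # asdict also converts nested RunResult items
```

The resulting list can be exported with `json.dumps(log, indent=2)` and either committed to the branch or pasted into the PR description, as 7.2 requires.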

8. PULL REQUEST REQUIREMENTS
   8.1 PR description must include:
       - Which issue it solves.
       - The RED→GREEN→REFACTOR trace for each cycle: failing tests added, code changes made, tests run with their results, and the final state log link.
       - Any test-file modifications and their justifications.
   8.2 CI must show the relevant tests passing before merge approval.

9. FAILURE MODES / STOP CONDITIONS
   9.1 If the agent cannot fix a failing relevant test because of:
       - Missing permissions, unavailable external service, flaky external dependency that cannot be isolated, or ambiguous spec requiring human decision,
       then:
         - Emit: `UNRESOLVABLE RULE VIOLATION` with full diagnostic (failing tests, attempted fixes, why unresolvable).
         - STOP and WAIT for human instruction.
   9.2 Do not mark any phase complete while an UNRESOLVABLE RULE VIOLATION exists.

END PROCEDURE