ArnaudRinquin/self-healing-pr.md

# Self-Healing PR: Auto-fix CI failures with Claude Code

When a PR's CI checks fail, Claude Code automatically analyzes the error logs, fixes the code, and pushes a commit. The fix commit triggers a new CI run. If that run also fails, the workflow stops, so there is at most one fix attempt per human push.

## How it works

Human pushes → CI fails → Self-healing triggers → Claude analyzes logs → Fixes code → Pushes commit → CI re-runs
                                                                                                          ↓
                                                                                          If still fails → stops (bot-author check)

## Setup

### 1. Create a GitHub App

Go to your org settings → Developer settings → GitHub Apps → New GitHub App, then fill in:

- Name: e.g. `my-self-healing-bot`
- Homepage URL: your repo URL
- Webhook: uncheck "Active" (not needed)
- Permissions:
  - Contents: Read & Write (push commits)
  - Pull requests: Read & Write (comment on PRs)
  - Actions: Read (read failure logs)
- Where can this app be installed?: Only on this account

After creation:

1. Note the App ID (shown on the app's General page)
2. Generate a Private key (downloads a `.pem` file)
3. Click Install App → select your org → grant access to the repo

### 2. Add secrets to your repo

Go to repo Settings → Secrets and variables → Actions → New repository secret:

| Secret name | Value |
| --- | --- |
| `SELF_HEALING_APP_ID` | The numeric App ID |
| `SELF_HEALING_APP_PRIVATE_KEY` | Full contents of the `.pem` file |

You also need a Claude Code OAuth token. Run `/install-github-app` in Claude Code CLI, or see claude-code-action docs.

| Secret name | Value |
| --- | --- |
| `CLAUDE_CODE_OAUTH_TOKEN` | From Claude Code CLI setup |

### 3. Add the workflow

Create `.github/workflows/self-healing.yml` (see below).

Important: `workflow_run` only triggers from the default branch. Merge the workflow file to main before it activates.

### 4. Customize

- Change `workflows: ["PR Checks"]` to match your CI workflow name(s)
- Adapt the `prompt` section with your project's check/test commands
- Adapt the `claude_args` allowed tools to match your stack
- Add setup steps for your stack (Node, Python, Go, etc.) before the Claude Code step
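
To find the exact names to put in `workflows: [...]`, one option is to read the top-level `name:` key of each workflow file. The sketch below is only an illustration: it builds a throwaway repo layout in a temp dir, and `list_workflow_names` and the two sample files are hypothetical, not part of this setup.

```shell
set -eu

# Hypothetical repo layout, created in a temp dir so the sketch runs anywhere.
repo="$(mktemp -d)"
mkdir -p "$repo/.github/workflows"
printf 'name: PR Checks\non: pull_request\n' > "$repo/.github/workflows/ci.yml"
printf 'name: E2E Tests\non: pull_request\n' > "$repo/.github/workflows/e2e.yml"

# Print the top-level name of every .yml workflow file, one per line.
# Assumes unquoted names; files ending in .yaml are not covered.
list_workflow_names() {
  grep -h '^name:' "$1"/.github/workflows/*.yml | sed 's/^name:[[:space:]]*//'
}

list_workflow_names "$repo"
```

In a real checkout you would call `list_workflow_names .` from the repo root and copy the names of the workflows you want healed.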

## Why a GitHub App token?

Commits pushed with the default `GITHUB_TOKEN` do not trigger new workflow runs; this is how GitHub prevents infinite workflow loops. A GitHub App installation token is not subject to that restriction, so the fix commit properly re-triggers CI.

## Safety guards

| Guard | How |
| --- | --- |
| Max 1 attempt | Skips if last commit author contains `[bot]` |
| Opt-out per PR | Add `no-autofix` label |
| Draft PRs | Skipped |
| Concurrency | One healing run per PR at a time |
| Timeout | 15 minutes max for Claude |
| Scoped tools | Only safe git operations (no force push/reset) |
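
The first three guards boil down to a short decision chain. Here is a minimal sketch of that chain as a standalone function you can run locally; `skip_reason` and its argument layout are hypothetical and only mirror the checks in the workflow's guard step.

```shell
# skip_reason is a hypothetical helper, not part of the workflow itself.
# Args: $1 = draft flag ("true"/"false"), $2 = last commit author,
#       $3 = comma-separated PR labels
# Prints the reason to skip, or nothing when healing should proceed.
skip_reason() {
  draft="$1"; author="$2"; labels="$3"

  if [ "$draft" = "true" ]; then
    echo "Draft PR"
    return 0
  fi

  # A bot-authored head commit means a fix attempt already ran for this push.
  case "$author" in
    *"[bot]"*) echo "Last commit by bot ($author)"; return 0 ;;
  esac

  # Exact match on the label name, so "no-autofix-later" does not count.
  case ",$labels," in
    *",no-autofix,"*) echo "Has no-autofix label"; return 0 ;;
  esac
}

skip_reason false "alice" "bug,frontend"          # prints nothing: go ahead
skip_reason false "my-self-healing-bot[bot]" ""   # Last commit by bot (my-self-healing-bot[bot])
skip_reason false "alice" "bug,no-autofix"        # Has no-autofix label
skip_reason true  "alice" ""                      # Draft PR
```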


## Gotchas we learned the hard way

- Don't use a container image (e.g. Playwright): the `gh` CLI won't be available. Use a bare runner.
- `--allowed-tools` must include the file tools `Edit`, `Write`, `Read`, `Glob`, `Grep`. Without these, Claude can identify the fix but can't apply it.
- CI logs are noisy (runner setup, env vars, etc.). Use `tail -n 500` to keep only the end, where the actual errors are.
- `workflow_run` only reads the workflow from the default branch, so you must merge to main first. The `workflow_dispatch` trigger lets you run it by hand against a specific PR and failed run while testing; note that GitHub documents the same default-branch requirement for manual dispatch.
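
To see why trimming helps, here is a small sketch with a fake log: 2000 lines of setup noise followed by a single real error. The file contents are invented for illustration; only the `tail -n 500` call matches what the workflow does.

```shell
set -eu

# Stand-in for a noisy CI log: lots of runner setup, then the real error.
log_file="$(mktemp)"
i=1
while [ "$i" -le 2000 ]; do
  echo "setup line $i"
  i=$((i + 1))
done > "$log_file"
echo "FAIL src/app.test.ts: expected 2, received 3" >> "$log_file"

# Keep only the tail, where the failing step's output usually sits.
trimmed="$(mktemp)"
tail -n 500 "$log_file" > "$trimmed"

wc -l < "$trimmed"      # 500
tail -n 1 "$trimmed"    # the FAIL line survives the cut
```

If your failures tend to print long stack traces, the cutoff may need to be larger than 500 lines.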


## The workflow

name: Self-Healing PR

on:
  workflow_run:
    # ⚠️ Change these to match YOUR CI workflow names
    workflows: ["PR Checks"]
    types: [completed]
  workflow_dispatch:
    inputs:
      pr_number:
        description: "PR number to fix"
        required: true
      run_id:
        description: "Failed workflow run ID"
        required: true

concurrency:
  group: self-healing-${{ github.event.workflow_run.pull_requests[0].number || inputs.pr_number }}
  cancel-in-progress: true

jobs:
  auto-fix:
    if: >
      github.event_name == 'workflow_dispatch' || (
        github.event.workflow_run.conclusion == 'failure' &&
        github.event.workflow_run.event == 'pull_request' &&
        github.event.workflow_run.pull_requests[0] != null
      )
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: write
      pull-requests: write
      id-token: write
    steps:
      - name: Generate GitHub App token
        id: app-token
        uses: actions/create-github-app-token@v2
        with:
          app-id: ${{ secrets.SELF_HEALING_APP_ID }}
          private-key: ${{ secrets.SELF_HEALING_APP_PRIVATE_KEY }}

      - name: Get PR info
        id: pr
        env:
          GH_TOKEN: ${{ steps.app-token.outputs.token }}
        run: |
          PR_NUMBER=${{ github.event.workflow_run.pull_requests[0].number || inputs.pr_number }}
          echo "number=$PR_NUMBER" >> $GITHUB_OUTPUT

          PR_JSON=$(gh api repos/${{ github.repository }}/pulls/$PR_NUMBER)
          echo "draft=$(echo "$PR_JSON" | jq -r '.draft')" >> $GITHUB_OUTPUT
          echo "head_ref=$(echo "$PR_JSON" | jq -r '.head.ref')" >> $GITHUB_OUTPUT
          echo "head_sha=$(echo "$PR_JSON" | jq -r '.head.sha')" >> $GITHUB_OUTPUT

      - name: Check safety guards
        id: guards
        env:
          GH_TOKEN: ${{ steps.app-token.outputs.token }}
        run: |
          # Skip draft PRs
          if [ "${{ steps.pr.outputs.draft }}" = "true" ]; then
            echo "skip=true" >> $GITHUB_OUTPUT
            echo "reason=Draft PR" >> $GITHUB_OUTPUT
            exit 0
          fi

          # Skip if last commit was by the bot (max 1 attempt per human push)
          LAST_AUTHOR=$(gh api repos/${{ github.repository }}/commits/${{ steps.pr.outputs.head_sha }} \
            --jq '.author.login // .commit.author.name')
          if echo "$LAST_AUTHOR" | grep -q "\[bot\]"; then
            echo "skip=true" >> $GITHUB_OUTPUT
            echo "reason=Last commit by bot ($LAST_AUTHOR)" >> $GITHUB_OUTPUT
            exit 0
          fi

          # Skip if PR has no-autofix label
          HAS_LABEL=$(gh api repos/${{ github.repository }}/pulls/${{ steps.pr.outputs.number }} \
            --jq '[.labels[].name] | map(select(. == "no-autofix")) | length')
          if [ "$HAS_LABEL" -gt 0 ]; then
            echo "skip=true" >> $GITHUB_OUTPUT
            echo "reason=Has no-autofix label" >> $GITHUB_OUTPUT
            exit 0
          fi

          echo "skip=false" >> $GITHUB_OUTPUT

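      - name: Skip notification
        if: steps.guards.outputs.skip == 'true'
        env:
          # The reason can contain a commit author name, so pass it via env
          # instead of interpolating it into the shell script.
          REASON: ${{ steps.guards.outputs.reason }}
        run: echo "Skipped self-healing: $REASON"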

      - name: Collect failure logs
        if: steps.guards.outputs.skip != 'true'
        id: logs
        env:
          GH_TOKEN: ${{ steps.app-token.outputs.token }}
        run: |
          RUN_ID=${{ github.event.workflow_run.id || inputs.run_id }}

          FAILED_JOBS=$(gh api repos/${{ github.repository }}/actions/runs/$RUN_ID/jobs \
            --jq '[.jobs[] | select(.conclusion == "failure") | .name] | join(", ")')
          echo "failed_jobs=$FAILED_JOBS" >> $GITHUB_OUTPUT

          # Keep only last 500 lines — actual errors are at the end, runner noise at the top
          gh run view $RUN_ID --repo ${{ github.repository }} --log-failed 2>&1 | tail -n 500 > /tmp/failed-logs.txt

          {
            echo 'logs<<EOF_FAILURE_LOGS_8f3a92bc'
            cat /tmp/failed-logs.txt
            echo 'EOF_FAILURE_LOGS_8f3a92bc'
          } >> $GITHUB_OUTPUT

      - name: Checkout PR branch
        if: steps.guards.outputs.skip != 'true'
        uses: actions/checkout@v4
        with:
          ref: ${{ steps.pr.outputs.head_ref }}
          token: ${{ steps.app-token.outputs.token }}

      # ============================================================
      # ⚠️ ADD YOUR PROJECT SETUP STEPS HERE
      # Examples:
      #   - uses: actions/setup-node@v4
      #   - run: npm ci
      #   - uses: actions/setup-python@v5
      #   - uses: actions/setup-go@v5
      # Claude needs the same tools your CI uses to verify fixes.
      # ============================================================

      - name: Run Claude Code
        if: steps.guards.outputs.skip != 'true'
        timeout-minutes: 15
        id: claude
        uses: anthropics/claude-code-action@v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          prompt: |
            You are fixing CI failures on PR #${{ steps.pr.outputs.number }} (branch: ${{ steps.pr.outputs.head_ref }}).

            FAILED WORKFLOW: ${{ github.event.workflow_run.name || 'manual trigger' }}
            FAILED JOBS: ${{ steps.logs.outputs.failed_jobs }}

            FAILURE LOGS:
            ```
            ${{ steps.logs.outputs.logs }}
            ```

            INSTRUCTIONS:
            1. Analyze the failure logs above
            2. Identify the root cause (lint errors, type errors, test failures)
            3. Fix the issues in the source code
            4. Run the relevant check locally to verify your fix
            5. If the fix works, commit and push:
               - `git add <changed files>`
               - `git commit -m "fix: auto-fix CI — <brief description>"`
               - `git push`

            RULES:
            - Only fix what the CI logs show as broken. Do NOT refactor or improve unrelated code.
            - If you cannot confidently fix the issue, do NOT commit. Output what you found instead.
            - Never modify test expectations to make tests pass — fix the source code.

          # ⚠️ Adapt allowed tools to your stack
          claude_args: '--allowed-tools "Edit,Write,Read,Glob,Grep,Bash(git add:*),Bash(git commit:*),Bash(git push),Bash(git status),Bash(git diff:*),Bash(gh:*),Bash(npm:*),Bash(npx:*),Bash(node:*)"'

      - name: Comment on PR
        if: steps.guards.outputs.skip != 'true' && always()
        env:
          GH_TOKEN: ${{ steps.app-token.outputs.token }}
        run: |
          INITIAL_SHA="${{ steps.pr.outputs.head_sha }}"
          CURRENT_SHA=$(gh api repos/${{ github.repository }}/pulls/${{ steps.pr.outputs.number }} --jq '.head.sha')
          RUN_URL="${{ github.event.workflow_run.html_url || format('https://github.com/{0}/actions/runs/{1}', github.repository, inputs.run_id) }}"

          if [ "$CURRENT_SHA" != "$INITIAL_SHA" ]; then
            NEW_AUTHOR=$(gh api repos/${{ github.repository }}/commits/$CURRENT_SHA --jq '.author.login // .commit.author.name')
            if echo "$NEW_AUTHOR" | grep -q "\[bot\]"; then
              gh pr comment ${{ steps.pr.outputs.number }} --repo ${{ github.repository }} --body \
                "🩹 **Self-healing**: pushed a fix for CI failures from [this run]($RUN_URL). Please review the changes."
            fi
          else
            gh pr comment ${{ steps.pr.outputs.number }} --repo ${{ github.repository }} --body \
              "⚠️ **Self-healing**: analyzed CI failures from [this run]($RUN_URL) but could not auto-fix. Manual intervention needed."
          fi
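
The final "Comment on PR" step has three outcomes, one of them silent. Here is a minimal sketch of that branching as a standalone function; `comment_kind` and its return labels are hypothetical and only mirror the step's logic.

```shell
# comment_kind is a hypothetical helper mirroring the final step's branching.
# Args: $1 = head SHA before Claude ran, $2 = head SHA after,
#       $3 = author of the new head commit
comment_kind() {
  initial_sha="$1"; current_sha="$2"; new_author="$3"

  if [ "$current_sha" = "$initial_sha" ]; then
    echo "could-not-fix"    # nothing was pushed
    return 0
  fi

  case "$new_author" in
    *"[bot]"*) echo "fix-pushed" ;;   # the bot moved the branch
    *)         echo "none" ;;         # a human pushed in the meantime
  esac
}

comment_kind abc123 abc123 ""                           # could-not-fix
comment_kind abc123 def456 "my-self-healing-bot[bot]"   # fix-pushed
comment_kind abc123 def456 "alice"                      # none
```

The silent third case avoids claiming credit for a commit a human pushed while the healing run was in progress.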