@48Nauts-Operator
Last active January 30, 2026 07:59
AI Assistant Safety Nets: Self-Healing Scripts for Autonomous Agents

🧪 [Experimental] AI Assistant Safety Nets: Self-Healing Scripts for Autonomous Agents

📦 Full repo with install instructions: https://github.com/48Nauts-Operator/ai-safety-net

When you give an AI assistant access to modify its own config... what happens when it breaks itself?

The problem: your AI assistant restarts the gateway with a bad config, and now it's dead. There is no assistant left to fix it.

The solution: Safety nets that run outside the assistant.

The Architecture

┌─────────────────────────────────────────────┐
│  AI Assistant (Clawdbot/Jarvis)             │
│  - Can modify config                        │
│  - Can restart gateway                      │
│  - Can break itself                         │
└─────────────────────────────────────────────┘
         │
         │ Before risky operation:
         │ spawns external safety net
         ▼
┌─────────────────────────────────────────────┐
│  Safety Net (detached bash process)         │
│  - Waits 5 minutes                          │
│  - Checks if assistant is healthy           │
│  - If dead → attempts recovery              │
│  - If recovery fails → calls Claude API     │
│  - Always alerts human                      │
└─────────────────────────────────────────────┘
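
The "detached" part of the diagram is what lets the safety net outlive the assistant. A minimal sketch of that spawning step, assuming a hypothetical helper named `spawn_detached` (not part of the original scripts) and a log path chosen by the caller:

```bash
#!/bin/bash
# spawn_detached is a hypothetical helper, not part of the original scripts.
# It starts a command in the background with all three standard streams
# detached from the caller, then prints the child's PID.
# Usage: spawn_detached <log_file> <command> [args...]
spawn_detached() {
    local log_file="$1"
    shift
    # nohup ignores SIGHUP; the redirections release the caller's stdio
    nohup "$@" >> "$log_file" 2>&1 < /dev/null &
    local pid=$!
    # Remove the job from the shell's job table where the shell supports it
    disown "$pid" 2>/dev/null || true
    echo "$pid"
}
```

Redirecting stdin, stdout, and stderr is the important part: a child that still holds the caller's output pipe can keep a capturing caller blocked until the child exits.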

Scripts

1. safety-net.sh: On-demand guardian

#!/bin/bash
# Spawns before risky operations, checks back after delay
# Usage: safety-net.sh [delay_minutes] [context]

DELAY_MINUTES="${1:-5}"
CONTEXT="${2:-Unspecified operation}"

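check_gateway() {
    # Healthy only if the process exists AND /health does not return an HTTP error.
    # -f makes curl exit non-zero on 4xx/5xx responses, not just on connection failure.
    if pgrep -f "clawdbot" > /dev/null; then
        curl -sf --max-time 10 "http://localhost:3000/health" > /dev/null 2>&1
        return $?
    fi
    return 1
}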

attempt_recovery() {
    # Try restart
    clawdbot gateway stop 2>/dev/null
    sleep 3
    clawdbot gateway start &
    sleep 10
    
    if check_gateway; then
        return 0
    fi
    
    # Try restore last good config
    if [ -f "$HOME/clawd/backups/config/config.yaml.lastgood" ]; then
        cp "$HOME/clawd/backups/config/config.yaml.lastgood" \
           "$HOME/.config/clawdbot/config.yaml"
        clawdbot gateway restart
        sleep 10
        check_gateway
        return $?
    fi
    
    return 1
}

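call_claude_for_help() {
    # If all else fails, call the Claude API directly for a diagnosis.
    # The payload is built with jq (assumed to be installed) so that quotes and
    # newlines in the log excerpt cannot break the JSON.
    local logs payload
    logs=$(tail -n 50 "$HOME/clawd/logs/gateway.log" 2>/dev/null)
    payload=$(jq -n --arg context "$CONTEXT" --arg logs "$logs" '{
        model: "claude-opus-4-5-20250514",
        max_tokens: 2048,
        messages: [{
            role: "user",
            content: "Assistant gateway dead after: \($context). Logs: \($logs). Diagnose and fix."
        }]
    }')
    # Note: check the model ID above against Anthropic's current model list.
    # The response is only printed; nothing here applies a fix automatically.
    curl -s https://api.anthropic.com/v1/messages \
        -H "x-api-key: $ANTHROPIC_API_KEY" \
        -H "anthropic-version: 2023-06-01" \
        -H "Content-Type: application/json" \
        -d "$payload"
}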

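notify_human() {
    # Alert the human with the same macOS notification mechanism watchdog.sh uses
    osascript -e "display notification \"$1\" with title \"JARVIS Safety Net\"" 2>/dev/null
}

main() {
    echo "Safety net active. Checking in $DELAY_MINUTES minutes..."
    sleep $((DELAY_MINUTES * 60))

    if check_gateway; then
        echo "Gateway healthy ✓"
        exit 0
    fi

    echo "Gateway DOWN. Attempting recovery..."
    if attempt_recovery; then
        echo "Recovery successful ✓"
        notify_human "Gateway was down, recovery succeeded"
        exit 0
    fi

    echo "Recovery failed. Calling Claude..."
    notify_human "Gateway is down and recovery failed"
    call_claude_for_help
}

# Run detached from caller. Output is redirected to a log file (this path is an
# assumption) and stdin is closed, so a caller that captures output is not
# held open for the whole delay.
mkdir -p "$HOME/clawd/logs"
main "$@" >> "$HOME/clawd/logs/safety-net.log" 2>&1 < /dev/null &
disown $!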
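
The fixed `sleep 10` before each health check in `attempt_recovery` can be too short for a slow start and needlessly long for a fast one. One possible alternative is to poll. The sketch below assumes a hypothetical `wait_for_healthy` helper (not in the original script) that takes the check command as its trailing arguments:

```bash
#!/bin/bash
# wait_for_healthy is a hypothetical helper, not part of the original script.
# It runs a check command up to <max_attempts> times, sleeping <interval>
# seconds between attempts, and succeeds as soon as the check succeeds.
# Usage: wait_for_healthy <max_attempts> <interval_seconds> <command> [args...]
wait_for_healthy() {
    local max_attempts="$1" interval="$2"
    shift 2
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only sleep if another attempt is coming
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$interval"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

Inside safety-net.sh this could be called as `wait_for_healthy 6 5 check_gateway`, which gives the gateway up to roughly 25 seconds to come up while returning as soon as it does.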

2. preflight.sh: Pre-operation checklist

#!/bin/bash
# Run before any risky operation
# Usage: preflight.sh "Description of what I'm doing"

DESCRIPTION="${1:-Unspecified operation}"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)

echo "🛫 PRE-FLIGHT CHECKLIST"

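# 1. Backup current config (abort if a copy fails, so we never proceed without a backup)
CONFIG="$HOME/.config/clawdbot/config.yaml"
BACKUP_DIR="$HOME/clawd/backups/config"
mkdir -p "$BACKUP_DIR"
if ! cp "$CONFIG" "$BACKUP_DIR/config.yaml.$TIMESTAMP" ||
   ! cp "$CONFIG" "$BACKUP_DIR/config.yaml.lastgood"; then
    echo "ERROR: config backup failed, aborting" >&2
    exit 1
fi
echo "✅ Config backed up"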

# 2. Log to changelog
echo "### $(date +%H:%M) - $DESCRIPTION" >> ~/clawd/CHANGELOG.md
echo "**Status:** 🔄 IN PROGRESS" >> ~/clawd/CHANGELOG.md
echo "✅ Logged to CHANGELOG.md"

# 3. Spawn safety net
~/clawd/scripts/safety-net.sh 5 "$DESCRIPTION"
echo "✅ Safety net active (5 min)"

echo ""
echo "Ready. Proceed with: $DESCRIPTION"
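
preflight.sh opens a changelog entry with an IN PROGRESS status, but nothing shown here closes it. A small companion step could record the outcome. This is a sketch only: `close_changelog_entry` and its arguments are hypothetical names, and the changelog path is passed in rather than hard-coded:

```bash
#!/bin/bash
# close_changelog_entry is a hypothetical companion to preflight.sh.
# It appends a final status line to the changelog after the operation.
# Usage: close_changelog_entry <changelog_file> ok|fail
close_changelog_entry() {
    local changelog="$1" outcome="$2"
    case "$outcome" in
        ok)   echo "**Status:** DONE at $(date +%H:%M)" >> "$changelog" ;;
        fail) echo "**Status:** FAILED at $(date +%H:%M)" >> "$changelog" ;;
        *)    echo "usage: close_changelog_entry <file> ok|fail" >&2
              return 2 ;;
    esac
}
```

For example, `close_changelog_entry ~/clawd/CHANGELOG.md ok` once the gateway is confirmed healthy, so the changelog does not accumulate entries that look permanently in progress.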

3. watchdog.sh: Background monitor (cron)

#!/bin/bash
# Runs every 5 minutes via cron
# */5 * * * * ~/clawd/scripts/watchdog.sh

FAIL_FILE="/tmp/clawdbot-watchdog-fails"

# -f makes curl treat HTTP error responses (4xx/5xx) as failures too
if ! curl -sf --max-time 10 "http://localhost:3000/health" > /dev/null; then
    FAILS=$(($(cat "$FAIL_FILE" 2>/dev/null || echo 0) + 1))
    echo "$FAILS" > "$FAIL_FILE"

    if [ "$FAILS" -ge 3 ]; then
        # 3 or more consecutive failures: attempt recovery
        clawdbot gateway restart
        # Alert human (macOS notification)
        osascript -e 'display notification "Gateway was down, attempted restart" with title "JARVIS Watchdog"'
    fi
else
    # Healthy: reset the consecutive-failure counter
    rm -f "$FAIL_FILE"
fi
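
The consecutive-failure logic is easier to test when it is pulled out of the cron script. A sketch under that assumption, using a hypothetical `record_check` function that takes the state file and the check result and prints the current failure streak:

```bash
#!/bin/bash
# record_check is a hypothetical refactoring of the counter in watchdog.sh.
# It prints the number of consecutive failures after recording this result.
# Usage: record_check <state_file> ok|fail
record_check() {
    local state_file="$1" result="$2" fails
    if [ "$result" = "ok" ]; then
        # A healthy check resets the streak
        rm -f "$state_file"
        echo 0
        return 0
    fi
    # Missing or empty state file counts as zero previous failures
    fails=$(cat "$state_file" 2>/dev/null || echo 0)
    fails=$(( ${fails:-0} + 1 ))
    echo "$fails" > "$state_file"
    echo "$fails"
}
```

The cron script could then compare the printed value against its threshold, for example `[ "$(record_check "$FAIL_FILE" fail)" -ge 3 ]`, before restarting the gateway.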

The Workflow

  1. AI assistant wants to modify config
  2. Runs preflight.sh "Config update for X"
  3. Preflight backs up config, spawns safety net
  4. AI does the risky thing
  5. If it works → safety net wakes up, finds the gateway healthy, and exits
  6. If it breaks → safety net wakes up, attempts recovery, alerts human
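
The steps above can be sketched as a single wrapper, so the risky command never runs without the preflight. `guarded_run` is a hypothetical name, and the preflight.sh location is an assumption based on where the other scripts live:

```bash
#!/bin/bash
# Assumed location of preflight.sh; override PREFLIGHT to point elsewhere.
PREFLIGHT="${PREFLIGHT:-$HOME/clawd/scripts/preflight.sh}"

# guarded_run is a hypothetical wrapper, not part of the original scripts.
# It runs the preflight first and only then runs the risky command.
# Usage: guarded_run "<description>" <command> [args...]
guarded_run() {
    local description="$1"
    shift
    if ! "$PREFLIGHT" "$description"; then
        echo "Preflight failed, not running: $description" >&2
        return 1
    fi
    "$@"
}
```

A call might look like `guarded_run "Config update for X" clawdbot gateway restart`. The wrapper only helps if the preflight exits non-zero when its backup step fails.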

Why This Matters

As AI assistants get more autonomous, they need self-healing capabilities. But you can't heal yourself if you're dead.

The solution: external watchdogs that survive the assistant's death.

Key principles:

  • Safety net runs outside the main process
  • Always backup before changes
  • Log everything to a changelog
  • The recovery call to Claude carries context (what was changed)
  • Always alert the human
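
One practical consequence of "always backup before changes" is that timestamped copies pile up. A possible cleanup step is sketched below, assuming the `config.yaml.YYYYMMDD-HHMMSS` naming that preflight.sh produces. `prune_backups` is a hypothetical helper, not part of the original scripts:

```bash
#!/bin/bash
# prune_backups is a hypothetical helper. It keeps the <keep> newest
# timestamped backups in <backup_dir> and deletes the rest.
# Usage: prune_backups <backup_dir> <keep>
prune_backups() {
    local backup_dir="$1" keep="$2"
    # Timestamped names sort chronologically, so a reverse name sort lists
    # the newest first. config.yaml.lastgood does not match the digit glob
    # and is therefore never deleted.
    ls -1 "$backup_dir"/config.yaml.[0-9]* 2>/dev/null |
        sort -r |
        tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f "$old"
        done
}
```

For instance, `prune_backups ~/clawd/backups/config 20` would leave the twenty most recent snapshots plus the lastgood copy.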

Built for Clawdbot but the pattern applies to any AI assistant.

@andrewolke | @21nauts
