48Nauts-Operator/2026-01-30-safety-net-gist.md

## 2026-01-30-safety-net-gist.md

      
    Raw
  

              2026-01-30-safety-net-gist.md
            
          
    🧪 [Experimental] AI Assistant Safety Nets: Self-Healing Scripts for Autonomous Agents


📦 Full repo with install instructions: https://github.com/48Nauts-Operator/ai-safety-net

When you give an AI assistant access to modify its own config... what happens when it breaks itself?
The problem: Your AI assistant restarts the gateway, config is bad, and now it's dead. No assistant to fix itself.
The solution: Safety nets that run outside the assistant.
The Architecture

┌─────────────────────────────────────────────┐
│  AI Assistant (Clawdbot/Jarvis)             │
│  - Can modify config                        │
│  - Can restart gateway                      │
│  - Can break itself                         │
└─────────────────────────────────────────────┘
         │
         │ Before risky operation:
         │ spawns external safety net
         ▼
┌─────────────────────────────────────────────┐
│  Safety Net (detached bash process)         │
│  - Waits 5 minutes                          │
│  - Checks if assistant is healthy           │
│  - If dead → attempts recovery              │
│  - If recovery fails → calls Claude API     │
│  - Always alerts human                      │
└─────────────────────────────────────────────┘

Scripts

1. safety-net.sh — On-demand guardian

#!/bin/bash
# Spawns before risky operations, checks back after delay
# Usage: safety-net.sh [delay_minutes] [context]

DELAY_MINUTES="${1:-5}"
CONTEXT="${2:-Unspecified operation}"

check_gateway() {
    if pgrep -f "clawdbot" > /dev/null; then
        curl -s --max-time 10 "http://localhost:3000/health" > /dev/null 2>&1
        return $?
    fi
    return 1
}

attempt_recovery() {
    # Try restart
    clawdbot gateway stop 2>/dev/null
    sleep 3
    clawdbot gateway start &
    sleep 10
    
    if check_gateway; then
        return 0
    fi
    
    # Try restore last good config
    if [ -f "$HOME/clawd/backups/config/config.yaml.lastgood" ]; then
        cp "$HOME/clawd/backups/config/config.yaml.lastgood" \
           "$HOME/.config/clawdbot/config.yaml"
        clawdbot gateway restart
        sleep 10
        check_gateway
        return $?
    fi
    
    return 1
}

call_claude_for_help() {
    # If all else fails, call Claude API directly for diagnosis
    curl -s https://api.anthropic.com/v1/messages \
        -H "x-api-key: $ANTHROPIC_API_KEY" \
        -H "anthropic-version: 2023-06-01" \
        -H "Content-Type: application/json" \
        -d "{
            \"model\": \"claude-opus-4-5-20250514\",
            \"max_tokens\": 2048,
            \"messages\": [{
                \"role\": \"user\",
                \"content\": \"Assistant gateway dead after: $CONTEXT. Logs: $(tail -50 ~/clawd/logs/gateway.log). Diagnose and fix.\"
            }]
        }"
}

main() {
    echo "Safety net active. Checking in $DELAY_MINUTES minutes..."
    sleep $((DELAY_MINUTES * 60))
    
    if check_gateway; then
        echo "Gateway healthy ✓"
        exit 0
    fi
    
    echo "Gateway DOWN. Attempting recovery..."
    if attempt_recovery; then
        echo "Recovery successful ✓"
        exit 0
    fi
    
    echo "Recovery failed. Calling Claude..."
    call_claude_for_help
}

# Run detached from caller
main "$@" &
disown $!
2. preflight.sh — Pre-operation checklist

#!/bin/bash
# Run before any risky operation
# Usage: preflight.sh "Description of what I'm doing"

DESCRIPTION="${1:-Unspecified operation}"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)

echo "🛫 PRE-FLIGHT CHECKLIST"

# 1. Backup current config
mkdir -p ~/clawd/backups/config
cp ~/.config/clawdbot/config.yaml \
   ~/clawd/backups/config/config.yaml.$TIMESTAMP
cp ~/.config/clawdbot/config.yaml \
   ~/clawd/backups/config/config.yaml.lastgood
echo "✅ Config backed up"

# 2. Log to changelog
echo "### $(date +%H:%M) — $DESCRIPTION" >> ~/clawd/CHANGELOG.md
echo "**Status:** 🔄 IN PROGRESS" >> ~/clawd/CHANGELOG.md
echo "✅ Logged to CHANGELOG.md"

# 3. Spawn safety net
~/clawd/scripts/safety-net.sh 5 "$DESCRIPTION"
echo "✅ Safety net active (5 min)"

echo ""
echo "Ready. Proceed with: $DESCRIPTION"
3. watchdog.sh — Background monitor (cron)

#!/bin/bash
# Runs every 5 minutes via cron
# */5 * * * * ~/clawd/scripts/watchdog.sh

FAIL_FILE="/tmp/clawdbot-watchdog-fails"

if ! curl -s --max-time 10 "http://localhost:3000/health" > /dev/null; then
    FAILS=$(($(cat "$FAIL_FILE" 2>/dev/null || echo 0) + 1))
    echo $FAILS > "$FAIL_FILE"
    
    if [ $FAILS -ge 3 ]; then
        # 3 consecutive failures - attempt recovery
        clawdbot gateway restart
        # Alert human
        osascript -e 'display notification "Gateway was down, attempted restart" with title "JARVIS Watchdog"'
    fi
else
    rm -f "$FAIL_FILE"
fi
The Workflow


AI assistant wants to modify config
Runs preflight.sh "Config update for X"
Preflight backs up config, spawns safety net
AI does the risky thing
If it works → safety net expires, all good
If it breaks → safety net wakes up, recovers, alerts human

Why This Matters

As AI assistants get more autonomous, they need self-healing capabilities. But you can't heal yourself if you're dead.
The solution: external watchdogs that survive the assistant's death.
Key principles:

Safety net runs outside the main process
Always backup before changes
Log everything to a changelog
Recovery Claude has context (what was changed)
Always alert the human


Built for Clawdbot but the pattern applies to any AI assistant.
@andrewolke | @21nauts
No results found