thedudeabidesai/securing-openclaw-guide.md

Deploying & Securing OpenClaw on Hetzner

A Complete Production Guide — Secure From Step One


Guide version: 2.0 — February 7, 2026
Last reviewed: 2026-02-07 | Lines: ~1125 | Grade: Multi-model audited (Opus 4.6, Codex 5.3, Grok 3)

Based on Brad Barbin's original Hetzner deployment gist. Security hardening from a real production audit by The Dude 🎳.

Platform: Written for Hetzner VPS (Ubuntu 24.04/22.04 or Debian 12), but the security principles and Docker-based deployment apply to any Linux host. macOS-specific notes are called out where relevant.


How to Read This Guide

Every major section has three parts:

🧒 Child Lens — A simple analogy. If you can't explain it to a kid, you don't understand it.
🔬 First Principles Lens — What's actually at risk. No security theater.
Commands — Copy-paste ready.

This isn't theoretical. Every security item came from auditing a real OpenClaw deployment — including a backup system that had been silently failing for days.

Architecture Overview

Laptop ──SSH tunnel──▶ Hetzner VPS (127.0.0.1:18789) ──▶ Docker container (:18789)
                            │
Tailnet devices ────────────┘ (host-level Tailscale proxy)

Key rules:

Gateway listens on container port 18789 — this never changes
Docker publishes to host port ${OPENCLAW_GATEWAY_PORT} — this can vary
Host binding stays loopback-only (127.0.0.1) unless you explicitly need remote exposure
Access via SSH tunnel or Tailscale — never expose directly to the internet

This guide cross-checks the live repo:

Dockerfile uses Bun and runs as USER node
Default image CMD is node openclaw.mjs gateway --allow-unconfigured
docker-compose.yml keeps container gateway port fixed at 18789


Prerequisites


Hetzner VPS (Ubuntu 24.04/22.04 or Debian 12)
Root SSH access
Domain/TLS optional (recommended if exposing beyond loopback/tailnet)
OpenClaw repo available on the host


1. Provision and Baseline Hardening

🧒 Child Lens: Before you put anything in your new house, you lock the doors, install smoke detectors, and check the windows. Don't move in first and secure later.
🔬 First Principles Lens: A fresh VPS has SSH open to the internet and no firewall. Every minute it's exposed unpatched is a minute attackers can probe it. Baseline hardening reduces the attack surface before you install anything worth stealing.
SSH in and update


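This is the only time you SSH as root. After creating the deploy user below, all subsequent commands use deploy@YOUR_VPS_IP with sudo. Later sections show commands without the sudo prefix; run them from a root shell (sudo -i) or prepend sudo wherever they touch system paths.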

ssh root@YOUR_VPS_IP

apt-get update
apt-get -y upgrade
apt-get install -y --no-install-recommends \
  ca-certificates curl gnupg ufw fail2ban unattended-upgrades jq
Automatic security updates

dpkg-reconfigure -plow unattended-upgrades
Firewall — before anything else

ufw default deny incoming
ufw default allow outgoing
ufw allow OpenSSH
ufw --force enable
ufw status verbose

⚠️ The SSH rule matters. Enabling the firewall without allowing SSH first means you just locked yourself out. There is no "undo" button from outside. We've seen deployment guides skip this step.

SSH hardening (recommended)

First, create a non-root deploy user (all remaining commands use this user via sudo):
adduser deploy
usermod -aG sudo deploy
# Copy your SSH key to the new user
mkdir -p /home/deploy/.ssh
cp ~/.ssh/authorized_keys /home/deploy/.ssh/
chown -R deploy:deploy /home/deploy/.ssh
chmod 700 /home/deploy/.ssh && chmod 600 /home/deploy/.ssh/authorized_keys
Then disable password auth and root login in /etc/ssh/sshd_config:
PasswordAuthentication no
PermitRootLogin no
AllowUsers deploy

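Validate the file with sshd -t, then run systemctl restart ssh. On Debian and Ubuntu the unit is named ssh; the sshd alias may be missing, notably on Ubuntu 24.04. Files in /etc/ssh/sshd_config.d/ are read first and take precedence, so check them for a conflicting PasswordAuthentication line.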

Test before disconnecting! Open a second terminal and verify ssh deploy@YOUR_VPS_IP works before closing your root session. If it fails, you still have the root session to fix it.
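To confirm the hardening actually took effect, it helps to check the effective configuration rather than the file you edited. The sketch below is a hypothetical helper (check_effective_sshd is not an OpenSSH tool) that reads the output of sshd -T on stdin and looks for the three settings from this section. It assumes the lowercase "keyword value" output format that sshd -T prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `sshd -T` output on stdin and verifies the three
# directives from this section are active in the effective configuration.
check_effective_sshd() (
    dump=$(cat)
    rc=0
    for want in "passwordauthentication no" "permitrootlogin no" "allowusers deploy"; do
        # -x matches the whole line, -i ignores case, -q prints nothing
        if printf '%s\n' "$dump" | grep -qix "$want"; then
            echo "ok:   $want"
        else
            echo "FAIL: '$want' not found in effective config"
            rc=1
        fi
    done
    # The function body is a subshell, so exit only ends the function
    exit "$rc"
)
```

Usage on the VPS, from the root session you kept open: sshd -T | check_effective_sshd. A non-zero exit status means at least one setting is not in force.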


Key types: Use Ed25519 keys (ssh-keygen -t ed25519). RSA works but Ed25519 is shorter, faster, and has no known weaknesses. Changing the SSH port (e.g., 2222) reduces log noise from bots but is not a security measure — don't rely on it.

Time: 5 minutes. Impact: Massive.
fail2ban configuration

fail2ban was installed above but needs activation. Enable the SSH jail:
cat > /etc/fail2ban/jail.local <<'EOF'
[sshd]
enabled = true
port = ssh
maxretry = 5
bantime = 3600
findtime = 600
# Debian 12 logs SSH to the journal only; if the jail fails to start, add:
# backend = systemd
EOF
systemctl enable --now fail2ban
fail2ban-client status sshd  # verify it's running
Host intrusion detection (optional but recommended)

For detecting unauthorized file changes on the host:
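apt install -y aide
aideinit  # generates initial database (takes a few minutes)
# If aideinit left only aide.db.new behind, copy it into place as the active database:
[ -f /var/lib/aide/aide.db ] || cp /var/lib/aide/aide.db.new /var/lib/aide/aide.db
# Run daily check via cron (the Debian/Ubuntu package needs the config path passed explicitly):
echo '0 3 * * * root /usr/bin/aide --config /etc/aide/aide.conf --check' > /etc/cron.d/aide-check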

🔬 First Principles Lens: fail2ban rate-limits brute-force attempts; AIDE detects if someone modifies system files after gaining access. Together they cover both the "getting in" and "already in" attack phases.


2. Install Docker Engine (Signed Repository)

🧒 Child Lens: When you install an app, you want to make sure it came from the real store, not a fake one. Using Docker's signed repo is like checking the store's ID badge.
🔬 First Principles Lens: curl | sh downloads and executes in one step — HTTPS provides transport integrity, but you never inspect what you're running. A compromised server or CDN serves you malware and you execute it blindly. GPG-signed apt repos let the package manager verify the package hasn't been tampered with before installing — you can also inspect what you're getting.
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg

. /etc/os-release
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu ${VERSION_CODENAME} stable" \
  > /etc/apt/sources.list.d/docker.list

apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

docker --version
docker compose version
For Debian, replace ubuntu with debian in both the GPG key URL and the repo URL.

3. Clone OpenClaw and Prepare Persistent Directories

🧒 Child Lens: You're building a house (the container) on a foundation (the host). The stuff you care about — photos, documents — lives in the foundation, not the house. If the house burns down, you rebuild it. The foundation stays.
🔬 First Principles Lens: Containers are ephemeral. Any state not on a mounted volume is lost on recreate. File ownership must match the container's runtime user (node, UID 1000). Permission 700 means only the owner can read the directory — other users on the host can't peek at your config or secrets.
git clone https://github.com/openclaw/openclaw.git
cd openclaw

mkdir -p /home/deploy/.openclaw /home/deploy/.openclaw/workspace
chown -R 1000:1000 /home/deploy/.openclaw /home/deploy/.openclaw/workspace
chmod 700 /home/deploy/.openclaw /home/deploy/.openclaw/workspace

The 1000:1000 ownership matches USER node in the container image. Verify after building: docker compose run --rm openclaw-gateway id — expect uid=1000(node) gid=1000(node).
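The ownership and mode requirements above can be checked mechanically before the first container start. This is a minimal sketch, assuming GNU stat (standard on Ubuntu and Debian); check_private_dir is a hypothetical helper name, not part of OpenClaw.

```shell
#!/bin/sh
# Hypothetical helper: verifies a directory has the expected octal mode and owner UID.
# Arguments: directory, expected mode (e.g. 700), expected numeric UID (e.g. 1000).
check_private_dir() (
    dir="$1"; want_mode="$2"; want_uid="$3"
    mode=$(stat -c '%a' "$dir") || exit 2   # GNU stat: %a = octal permissions
    uid=$(stat -c '%u' "$dir") || exit 2    # GNU stat: %u = numeric owner
    rc=0
    if [ "$mode" != "$want_mode" ]; then
        echo "FAIL: $dir mode is $mode, expected $want_mode"; rc=1
    fi
    if [ "$uid" != "$want_uid" ]; then
        echo "FAIL: $dir owner uid is $uid, expected $want_uid"; rc=1
    fi
    [ "$rc" -eq 0 ] && echo "ok: $dir is mode $mode, owned by uid $uid"
    exit "$rc"
)
```

For this guide's layout the calls would be check_private_dir /home/deploy/.openclaw 700 1000 and the same for the workspace directory.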

Git secrets audit — do this now

Before you start committing anything to this repo, set up your .gitignore:
cat >> .gitignore <<'EOF'
# Secrets — never commit these
auth-profiles.json
*.env
.env
discord-history/
EOF
Audit for any secrets already in history:
git log --all --diff-filter=A --name-only --pretty=format: | sort -u | grep -iE 'token|secret|key|password|auth|\.env'
If you find anything, scrub it:
# Install the tool
pip install git-filter-repo  # or: apt install git-filter-repo

# Remove a file from all history
git filter-repo --path auth-profiles.json --invert-paths --force

⚠️ git filter-repo rewrites ALL commit hashes. Existing clones and forks will diverge. Only use on repos you fully control, and force-push after.


What we actually found: A discord-history/ directory with message dumps committed to a workspace repo. Scrubbed it from all history. The content wasn't catastrophic, but the habit is — next time it could be API keys.
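The history audit catches what is already committed; a pre-commit check can stop the next one. The sketch below reuses the same filename pattern as the audit command above. flag_secret_paths is a hypothetical helper, and the pattern is deliberately broad (it will also flag harmless names such as keyboard.ts), so treat a hit as a prompt to look, not proof of a leak.

```shell
#!/bin/sh
# Hypothetical helper: reads file paths on stdin and fails if any look like secrets.
# Uses the same pattern as the history audit in this section.
flag_secret_paths() (
    hits=$(grep -iE 'token|secret|key|password|auth|\.env' || true)
    if [ -n "$hits" ]; then
        echo "Refusing to continue, these paths look like secrets:" >&2
        printf '%s\n' "$hits" >&2
        exit 1
    fi
)
```

One way to wire it up is a .git/hooks/pre-commit script that defines the function and then runs git diff --cached --name-only --diff-filter=A | flag_secret_paths, so the commit aborts when a newly added path matches.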


4. Create .env with Strict Permissions

🧒 Child Lens: Your .env file is like a keychain with all your house keys, car keys, and safe combination on it. You don't leave it on the front porch — you keep it in your pocket, and only you can reach it.
🔬 First Principles Lens: The .env file contains bearer credentials. Anyone who reads it IS you from the provider's perspective. chmod 600 means only the file owner can read it. Never commit it. Never paste its contents in Discord or Slack — those are cloud services with message history, search indexing, and admin access you don't control.
cat > .env <<'ENV'
OPENCLAW_IMAGE=openclaw:hetzner
# Generate the token below and paste it directly after the "=" (no spaces, no trailing comment)
OPENCLAW_GATEWAY_TOKEN=
# "lan" binds the gateway to 0.0.0.0 INSIDE the container. That is safe here because
# Docker publishes the port on 127.0.0.1 only (see compose). On bare metal without
# Docker, use "loopback" instead.
OPENCLAW_GATEWAY_BIND=lan

# Host-side published ports only
OPENCLAW_GATEWAY_PORT=18789
OPENCLAW_BRIDGE_PORT=18790

OPENCLAW_CONFIG_DIR=/home/deploy/.openclaw
OPENCLAW_WORKSPACE_DIR=/home/deploy/.openclaw/workspace

# Optional provider secrets
# CLAUDE_AI_SESSION_KEY=
# CLAUDE_WEB_SESSION_KEY=
# CLAUDE_WEB_COOKIE=
ENV

chmod 600 .env
Generate a gateway token:
openssl rand -hex 32
Paste it into .env as OPENCLAW_GATEWAY_TOKEN.
Secret handling rules


Never commit .env
Keep .env mode 600
Rotate all leaked provider/session secrets immediately
Hand off secrets via encrypted channels only (Signal, iMessage) — never Discord/Slack
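Two of these rules are easy to verify automatically: the file mode and whether a real token was pasted in. The sketch below is a hypothetical helper (check_env_file), assuming GNU stat and a token produced by openssl rand -hex 32, which is 64 lowercase hex characters.

```shell
#!/bin/sh
# Hypothetical helper: checks that an env file is mode 600 and that
# OPENCLAW_GATEWAY_TOKEN holds a 64-character hex value.
check_env_file() (
    env_file="$1"
    mode=$(stat -c '%a' "$env_file") || exit 2
    if [ "$mode" != "600" ]; then
        echo "FAIL: $env_file mode is $mode, expected 600"
        exit 1
    fi
    # Capture only the leading hex characters after the "="
    token=$(sed -n 's/^OPENCLAW_GATEWAY_TOKEN=\([0-9a-f]*\).*/\1/p' "$env_file" | head -n 1)
    if [ "${#token}" -ne 64 ]; then
        echo "FAIL: OPENCLAW_GATEWAY_TOKEN is missing or not 64 hex characters"
        exit 1
    fi
    echo "ok: $env_file is mode 600 with a 64-character token"
)
```

Run it as check_env_file .env from the repo directory before the first docker compose up. The token value itself is never printed.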


5. Compose File (Hardened)

🧒 Child Lens: The compose file is your house's blueprint. It says where the doors are (ports), what rooms connect to what (volumes), and who's allowed in (bindings). A bad blueprint means unlocked doors facing the street.
🔬 First Principles Lens: Docker's default port publishing binds to 0.0.0.0 — every network interface. On a VPS with a public IP, that means your gateway is exposed to the entire internet. Binding to 127.0.0.1 restricts access to localhost only. Combined with token auth and SSH tunneling, this creates defense in depth.
services:
  openclaw-gateway:
    image: ${OPENCLAW_IMAGE:-openclaw:local}
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      HOME: /home/node
      TERM: xterm-256color
      OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN}
      OPENCLAW_GATEWAY_BIND: ${OPENCLAW_GATEWAY_BIND:-lan}
      CLAUDE_AI_SESSION_KEY: ${CLAUDE_AI_SESSION_KEY}
      CLAUDE_WEB_SESSION_KEY: ${CLAUDE_WEB_SESSION_KEY}
      CLAUDE_WEB_COOKIE: ${CLAUDE_WEB_COOKIE}
    volumes:
      - ${OPENCLAW_CONFIG_DIR}:/home/node/.openclaw
      - ${OPENCLAW_WORKSPACE_DIR}:/home/node/.openclaw/workspace
    ports:
      - "127.0.0.1:${OPENCLAW_GATEWAY_PORT:-18789}:18789"
      - "127.0.0.1:${OPENCLAW_BRIDGE_PORT:-18790}:18790"
    init: true
    restart: unless-stopped
    security_opt:
      - no-new-privileges:true
    # Note: Docker applies default seccomp + AppArmor profiles automatically.
    # For stricter hardening, add a custom seccomp profile as another security_opt entry:
    #   - seccomp=/path/to/custom-seccomp.json
    # See: https://docs.docker.com/engine/security/seccomp/
    mem_limit: "1g"
    pids_limit: 256
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    command:
      [
        "node",
        "dist/index.js",
        "gateway",
        "--bind",
        "${OPENCLAW_GATEWAY_BIND:-lan}",
        "--port",
        "18789"
      ]
Why this matters:

Container always listens on 18789. Changing OPENCLAW_GATEWAY_PORT only affects the host mapping.
127.0.0.1 binding = not reachable from the internet.
mem_limit / pids_limit prevent runaway processes from killing the VPS.
Never mount the Docker socket (/var/run/docker.sock) into the container — it's equivalent to root on the host.


Repo delta: The upstream docker-compose.yml omits the build: section, does not bind to 127.0.0.1, and has no resource limits. This guide adds all three as security hardening. If deploying with pre-built images and relying solely on firewall rules, you may remove the build section.
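Since the loopback binding is the single most important line in the compose file, a quick text check can guard against it being dropped in a later edit. This is a rough sketch: find_public_ports is a hypothetical helper that scans the YAML text for list items ending in a port number, so it is a heuristic and not a YAML parser. The authoritative view is the output of docker compose config.

```shell
#!/bin/sh
# Hypothetical helper: flags short-syntax port mappings in a compose file
# that are not pinned to 127.0.0.1. Text heuristic only.
find_public_ports() (
    compose_file="$1"
    # List items that end in ":<digits>" (optionally quoted) look like port mappings;
    # volumes and security_opt entries do not end in a number.
    hits=$(grep -E '^[[:space:]]*-[[:space:]]*"?[^"#]*:[0-9]+"?[[:space:]]*$' "$compose_file" \
             | grep -v '127\.0\.0\.1:' || true)
    if [ -n "$hits" ]; then
        echo "Port mappings not pinned to loopback:"
        printf '%s\n' "$hits"
        exit 1
    fi
    echo "ok: all published ports bind to 127.0.0.1"
)
```

Usage: find_public_ports docker-compose.yml. A mapping such as "18789:18789" fails the check, because Docker would publish it on every interface.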


6. Hardened Dockerfile

🧒 Child Lens: When you download a game, you want to know it's the real game and not a virus wearing a game costume. Checksums are like checking the game's fingerprint against a trusted list.
🔬 First Principles Lens: Supply chain attacks target the build pipeline. Pinning versions and verifying SHA256 checksums ensures you get exactly the binary you expect — not a compromised one from a hijacked release. releases/latest is a mutable pointer; an attacker who compromises the repo can redirect it.

Hardening additions over repo default: SHELL directive for safer pipe handling, pinned Bun version (upstream uses latest), optional binary installation with SHA256 verification. If you don't need custom skill binaries, the repo Dockerfile works as-is.

FROM node:22-bookworm

SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Install Bun to shared path (not /root, which is inaccessible to USER node)
ARG BUN_VERSION=1.2.22
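# Download the pinned Bun release binary directly (no install script to pipe into a shell).
# Note: no checksum is verified here, so pinning the version narrows supply-chain risk
# but does not eliminate it. The asset name assumes x86_64; adjust it for ARM hosts.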
RUN BUN_URL="https://github.com/oven-sh/bun/releases/download/bun-v${BUN_VERSION}/bun-linux-x64.zip" \
    && curl -fsSL -o /tmp/bun.zip "$BUN_URL" \
    && unzip -o /tmp/bun.zip -d /tmp/bun-extract \
    && mv /tmp/bun-extract/bun-linux-x64/bun /usr/local/bin/bun \
    && chmod +x /usr/local/bin/bun \
    && rm -rf /tmp/bun.zip /tmp/bun-extract \
    && bun --version
ENV PATH="/usr/local/bin:${PATH}"

RUN corepack enable
WORKDIR /app

# Optional OS packages for skill binaries
ARG OPENCLAW_DOCKER_APT_PACKAGES=""
RUN if [ -n "$OPENCLAW_DOCKER_APT_PACKAGES" ]; then \
      apt-get update && \
      DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends $OPENCLAW_DOCKER_APT_PACKAGES && \
      apt-get clean && \
      rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*; \
    fi

# ┌─────────────────────────────────────────────────────────┐
# │ SHA256 VERIFICATION — OPTIONAL vs REQUIRED              │
# │                                                         │
# │ Core build (no skill binaries): SHA256 args NOT needed. │
# │ Just omit the --build-arg flags and the RUN blocks      │
# │ below become no-ops.                                    │
# │                                                         │
# │ If you ADD skill binaries (gog, goplaces, wacli):       │
# │ SHA256 verification is MANDATORY. Provide all three     │
# │ --build-arg SHA256 values or the build will skip them.  │
# │ Never deploy unverified third-party binaries.           │
# └─────────────────────────────────────────────────────────┘

# Optional: pinned binaries with SHA256 verification
ARG GOG_VERSION=0.6.3
ARG GOG_SHA256
ARG GOPLACES_VERSION=0.4.1
ARG GOPLACES_SHA256
ARG WACLI_VERSION=0.5.2
ARG WACLI_SHA256

# If you don't need skill binaries (gog, goplaces, wacli), skip the --build-arg flags
# and the blocks below do nothing. Each binary is installed only when its own SHA256 arg is set.
RUN if [ -n "$GOG_SHA256" ]; then \
      curl -fsSL -o /tmp/gog.tar.gz "https://github.com/steipete/gog/releases/download/v${GOG_VERSION}/gog_Linux_x86_64.tar.gz" && \
      echo "${GOG_SHA256}  /tmp/gog.tar.gz" | sha256sum -c - && \
      tar -xzf /tmp/gog.tar.gz -C /usr/local/bin && \
      chmod +x /usr/local/bin/gog && \
      rm -f /tmp/gog.tar.gz; \
    fi

RUN if [ -n "$GOPLACES_SHA256" ]; then \
      curl -fsSL -o /tmp/goplaces.tar.gz "https://github.com/steipete/goplaces/releases/download/v${GOPLACES_VERSION}/goplaces_Linux_x86_64.tar.gz" && \
      echo "${GOPLACES_SHA256}  /tmp/goplaces.tar.gz" | sha256sum -c - && \
      tar -xzf /tmp/goplaces.tar.gz -C /usr/local/bin && \
      chmod +x /usr/local/bin/goplaces && \
      rm -f /tmp/goplaces.tar.gz; \
    fi

RUN if [ -n "$WACLI_SHA256" ]; then \
      curl -fsSL -o /tmp/wacli.tar.gz "https://github.com/steipete/wacli/releases/download/v${WACLI_VERSION}/wacli_Linux_x86_64.tar.gz" && \
      echo "${WACLI_SHA256}  /tmp/wacli.tar.gz" | sha256sum -c - && \
      tar -xzf /tmp/wacli.tar.gz -C /usr/local/bin && \
      chmod +x /usr/local/bin/wacli && \
      rm -f /tmp/wacli.tar.gz; \
    fi

COPY package.json pnpm-lock.yaml pnpm-workspace.yaml .npmrc ./
COPY ui/package.json ./ui/package.json
COPY patches ./patches
COPY scripts ./scripts
RUN pnpm install --frozen-lockfile

COPY . .
RUN OPENCLAW_A2UI_SKIP_MISSING=1 pnpm build
ENV OPENCLAW_PREFER_PNPM=1
RUN pnpm ui:build

ENV NODE_ENV=production
RUN chown -R node:node /app

USER node

CMD ["node", "openclaw.mjs", "gateway", "--allow-unconfigured"]
Obtain checksums by downloading releases and computing locally:
curl -fsSL -o /tmp/gog.tar.gz \
  "https://github.com/steipete/gog/releases/download/v0.6.3/gog_Linux_x86_64.tar.gz"
sha256sum /tmp/gog.tar.gz
# Use output as GOG_SHA256 build arg
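The same check the Dockerfile performs can be run by hand before you trust a recorded hash. A minimal sketch, assuming GNU coreutils sha256sum; verify_sha256 is a hypothetical wrapper name.

```shell
#!/bin/sh
# Hypothetical wrapper: succeeds only if FILE hashes to EXPECTED.
# Mirrors the `echo "<hash>  <file>" | sha256sum -c -` step in the Dockerfile.
verify_sha256() (
    file="$1"; expected="$2"
    # sha256sum -c expects "<hash><two spaces><path>"
    printf '%s  %s\n' "$expected" "$file" | sha256sum -c - >/dev/null 2>&1
)
```

Usage: verify_sha256 /tmp/gog.tar.gz "$GOG_SHA256" && echo verified. Computing the hash on one machine and verifying it on another gives slightly more assurance than doing both in the same session.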

7. Build and Run

🧒 Child Lens: You've drawn the blueprint and bought the materials. Now you actually build the house and turn on the lights.
🔬 First Principles Lens: Every dependency you pull is a trust boundary. --no-cache forces a full rebuild, so a stale cached layer can't hide a change in an upstream dependency. Checksums are enforced at build time by sha256sum -c in the Dockerfile; the which checks below only confirm that the binaries landed in the image. The entire supply chain (base image, package manager, binaries) is only as strong as its weakest verified link. Checking logs immediately catches startup failures before you assume everything's fine.
# Core build (no skill binaries — omit SHA args entirely):
docker compose build --no-cache
docker compose up -d openclaw-gateway

# With skill binaries (replace with REAL checksums — do NOT use placeholders):
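# Get checksums from the downloaded release archives (the Dockerfile verifies the .tar.gz, not the extracted binary):
#   sha256sum /tmp/gog.tar.gz /tmp/goplaces.tar.gz /tmp/wacli.tar.gz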
# docker compose build --no-cache \
#   --build-arg GOG_SHA256=abc123... \
#   --build-arg GOPLACES_SHA256=def456... \
#   --build-arg WACLI_SHA256=789fed...

⚠️ Do not pass placeholder values like YOUR_GOG_SHA256 — non-empty placeholders trigger checksum validation and the build will fail. Either pass real checksums or omit the args entirely.

Verify binaries exist (only if you installed them):
# These only apply if you installed skill binaries — skip if you did a core-only build
docker compose exec openclaw-gateway which gog && echo "✅ gog" || echo "⏭️ gog not installed"
docker compose exec openclaw-gateway which goplaces && echo "✅ goplaces" || echo "⏭️ goplaces not installed"
docker compose exec openclaw-gateway which wacli && echo "✅ wacli" || echo "⏭️ wacli not installed"
Verify gateway is up:
docker compose logs -f openclaw-gateway
Integrity verification — lock it down now

# Use strict installs (fails if packages don't match lockfile)
# pnpm install --frozen-lockfile is already in the Dockerfile

# Audit for known vulnerabilities
docker compose exec openclaw-gateway pnpm audit

The honest gap: OpenClaw updates come via git pull. Git verifies integrity (SHA hashes on every commit) but not identity (commits aren't GPG-signed). Skills from clawhub have no signature verification currently. Lockfiles and pre-update review are the best local defenses until upstream adds signed releases.


8. Access Patterns

🧒 Child Lens: Your house is built and locked. Now you need a way to get in — but you want a secret tunnel, not a door facing the highway.
🔬 First Principles Lens: SSH tunnels encrypt traffic and require key authentication. Tailscale creates a WireGuard mesh with per-device identity. Both keep the gateway off the public internet. Direct internet exposure requires TLS + token + firewall — three things that must all work perfectly, all the time.
A) SSH tunnel (safest)

ssh -N -L 18789:127.0.0.1:18789 deploy@YOUR_VPS_IP
Open http://127.0.0.1:18789/ and enter your OPENCLAW_GATEWAY_TOKEN.
B) Tailnet access (recommended for remote devices)

Keep Docker published on loopback and expose via host-level Tailscale proxy. Do not publish gateway directly to the public internet.
# Install Tailscale (apt repo — NOT curl|sh, consistent with our Docker install approach)
# Detect distro automatically (works for Ubuntu 22.04/24.04 and Debian 12)
if [ ! -f /etc/os-release ]; then echo "ERROR: /etc/os-release not found — install Tailscale manually"; exit 1; fi
DISTRO=$(. /etc/os-release && echo "$ID")
CODENAME=$(. /etc/os-release && echo "$VERSION_CODENAME")
if [ -z "$DISTRO" ] || [ -z "$CODENAME" ]; then echo "ERROR: Could not detect distro/codename from /etc/os-release"; exit 1; fi
curl -fsSL "https://pkgs.tailscale.com/stable/${DISTRO}/${CODENAME}.noarmor.gpg" | tee /usr/share/keyrings/tailscale-archive-keyring.gpg >/dev/null
curl -fsSL "https://pkgs.tailscale.com/stable/${DISTRO}/${CODENAME}.tailscale-keyring.list" | tee /etc/apt/sources.list.d/tailscale.list
apt update && apt install -y tailscale
tailscale up

# Verify your Tailscale IP
tailscale ip -4

# Allow Tailscale traffic through UFW
ufw allow in on tailscale0
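Access the gateway from your tailnet through tailscale serve (see the warning below). Because Docker publishes the port on 127.0.0.1 only, http://<tailscale-ip>:18789/ is not reachable directly; use the HTTPS URL that tailscale serve prints, typically https://<machine-name>.<tailnet-name>.ts.net/.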

⚠️ Docker + UFW footgun: Do NOT change the Docker host bind from 127.0.0.1 to 0.0.0.0 to expose the port on Tailscale. Docker bypasses UFW rules for published container ports — your gateway would be exposed to the public internet regardless of UFW settings. Instead, use tailscale serve:
tailscale serve --bg http://127.0.0.1:18789
This proxies traffic through your tailnet without changing Docker bindings. The container stays locked to localhost.
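Because UFW will not catch a Docker port published on 0.0.0.0, it is worth checking the listening sockets directly. The sketch below is a hypothetical helper (public_listeners) that parses the output of ss -ltnH and reports any listener on the given port that is not bound to loopback. It assumes the iproute2 column layout, where the local address is the fourth field.

```shell
#!/bin/sh
# Hypothetical helper: reads `ss -ltnH` output on stdin and fails if the given
# port is listening on anything other than 127.x.x.x or [::1].
public_listeners() (
    port="$1"
    awk -v port="$port" '
        {
            addr = $4   # local address:port column in ss output
            if (addr ~ (":" port "$") && addr !~ /^127\./ && addr !~ /^\[::1\]/) {
                print "exposed: " addr
                found = 1
            }
        }
        END { exit (found ? 1 : 0) }
    '
)
```

Usage on the host: ss -ltnH | public_listeners 18789. No output and a zero exit status mean the gateway port is loopback-only. If Docker's userland proxy is disabled the port may not appear in ss at all, so treat a clean result as one signal and not a full audit.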


Security note: Tailscale uses WireGuard for encryption between nodes. If a device on your tailnet is compromised, the attacker can see traffic between that node and others. Treat your tailnet as a trusted network — but not a zero-trust one. For defense-in-depth, the gateway still requires token auth regardless of network path.

Tailscale key management:

Prefer tagged reusable keys with explicit expiration
Track key expiry dates and rotate before expiration
After rotation, verify node connectivity and ACL enforcement

C) TLS reverse proxy (if you must expose publicly)

Use Caddy for automatic HTTPS with zero config. Install and create a Caddyfile:
apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | tee /etc/apt/sources.list.d/caddy-stable.list
apt update && apt install caddy
# /etc/caddy/Caddyfile
openclaw.example.com {
    reverse_proxy 127.0.0.1:18789
}
ufw allow 443/tcp
systemctl enable --now caddy
Caddy auto-provisions and renews Let's Encrypt TLS certificates. Keep the upstream on 127.0.0.1 — the proxy handles all public-facing traffic.

9. Token Rotation

🧒 Child Lens: If someone copies your house key and you never change the locks, they can come in forever and you'd never know. Change the locks regularly, and the copied key stops working.
🔬 First Principles Lens: API tokens are bearer credentials — anyone who has the token IS you. Tokens don't expire by default. The exposure window equals the token's lifetime. Rotation bounds that lifetime. A stolen key that stops working in 90 days is categorically different from one that works forever.
Gateway token rotation

# 1. Generate new token
NEW_TOKEN="$(openssl rand -hex 32)"

# 2. Update .env (OPENCLAW_GATEWAY_TOKEN=$NEW_TOKEN), keep mode 600

# 3. Restart gateway
docker compose up -d --force-recreate openclaw-gateway

# 4. Validate health
curl -sf -H "Authorization: Bearer $NEW_TOKEN" http://127.0.0.1:18789/ > /dev/null

# 5. Re-authenticate all clients with new token

# 6. Invalidate old token everywhere (shell history, password managers, notes)
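Step 2 is the one most often done by hand in an editor. A small sketch of doing it safely from the shell follows; set_env_var is a hypothetical helper, not an OpenClaw command. It writes to a temporary file in the same directory and renames it into place, so the file is never half-written and stays mode 600. Note that it moves the updated key to the end of the file.

```shell
#!/bin/sh
# Hypothetical helper: replaces KEY=... in an env file (or appends it),
# writing via a temp file so the update is atomic and the mode stays 600.
set_env_var() (
    env_file="$1"; key="$2"; value="$3"
    umask 077
    tmp=$(mktemp "${env_file}.XXXXXX") || exit 1
    # Keep every line except the old assignment; grep exits 1 if nothing is left
    grep -v "^${key}=" "$env_file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    chmod 600 "$tmp"
    mv "$tmp" "$env_file"
)
```

Usage: set_env_var .env OPENCLAW_GATEWAY_TOKEN "$NEW_TOKEN", followed by the restart in step 3.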
Provider key rotation runbook

For each provider, document:

Where to rotate — the dashboard URL
Where the key lives — which config files (there may be multiple!)
How to hand off — never paste keys in chat. Use encrypted channels.
How to verify — what breaks if you got it wrong

Example rotation matrix:


| Provider | Config Locations | Rotation Method |
|---|---|---|
| Anthropic | auth-profiles.json (2 entries) | console.anthropic.com → API Keys |
| OpenAI | env, auth.json | platform.openai.com → API Keys |
| ElevenLabs | openclaw.json (2 places) | elevenlabs.io → Profile |
| Twilio | openclaw.json (3 places) | console.twilio.com (24h grace period!) |
| xAI | openclaw.json | console.x.ai |
| Google/Gemini | openclaw.json + 2 skill configs | aistudio.google.com |
| Brave Search | openclaw.json (2 places) | api.search.brave.com |
| Backblaze B2 | ~/.config/restic/b2.env | backblaze.com → App Keys |

Cadence: Quarterly (every 90 days). Set a recurring reminder. If you don't schedule it, it won't happen.
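A reminder is easier to keep if a script can tell you when a key is overdue. The sketch below assumes you record the last rotation date yourself (for example in a notes file) and that GNU date is available; days_since and rotation_due are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for tracking the 90-day rotation cadence. Requires GNU date.

# days_since DATE [NOW]: whole days between DATE and NOW (default: current time)
days_since() (
    then_s=$(date -u -d "$1" +%s) || exit 2
    now_s=$(date -u -d "${2:-now}" +%s) || exit 2
    echo $(( (now_s - then_s) / 86400 ))
)

# rotation_due LAST_ROTATED MAX_DAYS [NOW]: succeeds when the key is due for rotation
rotation_due() (
    age=$(days_since "$1" "${3:-now}") || exit 2
    [ "$age" -ge "$2" ]
)
```

Usage: rotation_due 2026-02-07 90 && echo "rotate the gateway token". The optional third argument exists only so the logic can be tested against a fixed date.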

10. Backups and Disaster Recovery

🧒 Child Lens: An alarm clock that's set but not plugged in doesn't wake you up. It looks right — the time is set, the alarm is on — but it's not actually working. You only find out when you oversleep.
🔬 First Principles Lens: A backup system has three parts: the scheduler (triggers it), the tool (creates it), and the verification (proves it worked). Most failures are silent. A backup you haven't verified is not a backup.
What to back up


/home/deploy/.openclaw (config, auth, state)
/home/deploy/.openclaw/workspace (workspace data)

Minimal nightly backup

#!/bin/bash
set -euo pipefail
export PATH="/usr/local/bin:/usr/bin:/bin"

BACKUP_DIR="/var/backups/openclaw"
mkdir -p "$BACKUP_DIR"
# Archives contain tokens and auth state: create them readable by the owner only
umask 077

TS="$(date +%F-%H%M%S)"

if ! tar -C / -czf "${BACKUP_DIR}/openclaw-${TS}.tar.gz" home/deploy/.openclaw 2>&1; then
    echo "🚨 Backup tar creation failed at $(date)" >&2
    exit 1
fi

# Verify the tarball is readable
if ! tar -tzf "${BACKUP_DIR}/openclaw-${TS}.tar.gz" > /dev/null 2>&1; then
    echo "🚨 Backup tarball corrupt at $(date)" >&2
    exit 1
fi

# Retention: keep 14 days
find "$BACKUP_DIR" -type f -mtime +14 -delete
systemd timer

# /etc/systemd/system/openclaw-backup.service
[Unit]
Description=OpenClaw Backup

[Service]
Type=oneshot
ExecStart=/usr/local/bin/openclaw-backup.sh

# /etc/systemd/system/openclaw-backup.timer
[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
systemctl enable --now openclaw-backup.timer
For restic/B2 users — use absolute paths!


What we actually found (macOS deployment): A backup scheduled via macOS launchd to run daily at 3am. The plist was loaded, the script existed, the configuration looked correct. But: exit code 127 — restic: command not found. launchd doesn't inherit your shell's PATH. The backup had been silently failing every night. Only one snapshot existed — from a manual run days earlier. (On Linux, systemd has a similar gotcha — always use absolute paths in timer units.)

Always use absolute paths in scheduled scripts:
#!/bin/bash
# Linux:
export PATH="/usr/local/bin:/usr/bin:/bin"
RESTIC=/usr/local/bin/restic
# macOS: uncomment below instead
# export PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin"
# RESTIC=/opt/homebrew/bin/restic

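# Use an absolute data path too: under cron or systemd as root, ~ expands to /root
# (macOS: use "$HOME/.openclaw/workspace")
"$RESTIC" backup /home/deploy/.openclaw/workspace \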
  --exclude '.git' \
  --exclude 'node_modules' \
  --tag openclaw-workspace

$RESTIC forget \
  --keep-daily 7 \
  --keep-weekly 4 \
  --keep-monthly 6 \
  --prune
Verify periodically

restic snapshots          # Are new ones appearing?
restic check              # Repository integrity
Recovery test (required)


Restore to a fresh VPS
Start with same .env
Confirm channels/auth/session state are intact
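A full restore to a fresh VPS is the real test, but a cheap drill can run after every backup. The sketch below extracts an archive into a scratch directory and checks that an expected path came back. restore_drill is a hypothetical helper, and which path you check for is your choice.

```shell
#!/bin/sh
# Hypothetical helper: extracts a backup archive into a scratch directory and
# confirms that an expected relative path exists in the restored tree.
restore_drill() (
    archive="$1"; expect="$2"
    scratch=$(mktemp -d) || exit 2
    # Clean up the scratch copy when the subshell exits, pass or fail
    trap 'rm -rf "$scratch"' EXIT
    tar -xzf "$archive" -C "$scratch" || exit 1
    if [ ! -e "$scratch/$expect" ]; then
        echo "FAIL: $expect missing from $(basename "$archive")"
        exit 1
    fi
    echo "ok: $expect restored from $(basename "$archive")"
)
```

With the nightly script above, which archives paths relative to /, a call would look like restore_drill /var/backups/openclaw/openclaw-TIMESTAMP.tar.gz home/deploy/.openclaw, where TIMESTAMP is a placeholder for a real archive name.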


The irony: We had just added backup failure alerting 10 minutes before discovering the backup was already broken. If we'd had monitoring earlier, we'd have caught it days ago. Order matters.


11. Health Monitoring

🧒 Child Lens: Your house has smoke detectors. They don't prevent fires — they make sure you KNOW there's a fire so you can act. Without them, a small problem becomes a big problem while you're asleep.
🔬 First Principles Lens: Every automated system can fail silently. The cost of a failure is proportional to how long it goes undetected. Monitoring doesn't prevent failures — it bounds the detection time.
What to monitor


| Check | Threshold | Why |
|---|---|---|
| Disk space | >85% warning, >95% critical | Full disk = no logs, no backups, cascading failures |
| Gateway process | Not running | If it's down, everything's down |
| Free memory | <50MB | OOM kills are silent and random |
| Last backup age | >36 hours | Catches silent backup failures |
| Container restarts | Repeated | Crash loop indicates config or resource problem |
| Auth cooldowns | Rate-limited profiles | You're burning through quota |


Health monitor script

#!/bin/bash
STATUS="healthy"
ALERTS=""

# Disk
DISK_PCT=$(df -h / | awk 'NR==2 {gsub(/%/,""); print $5}')
if [ "$DISK_PCT" -gt 95 ]; then
    STATUS="critical"; ALERTS+="🚨 Disk ${DISK_PCT}% full\n"
elif [ "$DISK_PCT" -gt 85 ]; then
    STATUS="warning"; ALERTS+="⚠️ Disk ${DISK_PCT}% full\n"
fi

# Gateway container
if ! docker compose ps --status running openclaw-gateway | grep -q "openclaw-gateway"; then
    STATUS="critical"; ALERTS+="🚨 Gateway container not running!\n"
fi

# Memory ("available" column of free, in MB)
FREE_MB=$(free -m | awk '/Mem:/ {print $7}')
if [ "$FREE_MB" -lt 50 ]; then
    # Never downgrade a status that an earlier check already set to critical
    [ "$STATUS" = "critical" ] || STATUS="warning"
    ALERTS+="⚠️ Low memory: ${FREE_MB}MB available\n"
fi

# Backup age
# Newest archive by mtime (stat --format and xargs -r are GNU extensions; fine on Ubuntu/Debian)
LAST_BACKUP=$(find /var/backups/openclaw -name '*.tar.gz' 2>/dev/null | xargs -r stat --format='%Y' 2>/dev/null | sort -rn | head -1)
if [ -n "$LAST_BACKUP" ]; then
    NOW=$(date +%s)
    HOURS_AGO=$(( (NOW - LAST_BACKUP) / 3600 ))
    if [ "$HOURS_AGO" -gt 36 ]; then
        [ "$STATUS" = "critical" ] || STATUS="warning"
        ALERTS+="⚠️ Backup stale: ${HOURS_AGO}h old\n"
    fi
else
    [ "$STATUS" = "critical" ] || STATUS="warning"
    ALERTS+="⚠️ No backups found!\n"
fi

# Gateway health probe
if ! curl -sf -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" http://127.0.0.1:18789/ > /dev/null 2>&1; then
    [ "$STATUS" = "critical" ] || STATUS="warning"
    ALERTS+="⚠️ Health probe failed\n"
fi

echo "status:$STATUS"
if [ -n "$ALERTS" ]; then
    echo -e "$ALERTS"

    # Discord webhook alerting
    if [ -n "$DISCORD_WEBHOOK_URL" ]; then
        PAYLOAD=$(echo -e "$ALERTS" | jq -Rs '{content: ("🚨 **OpenClaw Health Alert**\n" + .)}')
        curl -sf -X POST -H "Content-Type: application/json" -d "$PAYLOAD" "$DISCORD_WEBHOOK_URL"
    fi

    # ntfy.sh alternative (lightweight, no setup)
    if [ -n "$NTFY_TOPIC" ]; then
        echo -e "$ALERTS" | curl -sf -d @- "https://ntfy.sh/${NTFY_TOPIC}"
    fi
fi
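In a script that accumulates a single STATUS across several checks, a later low-severity finding should never overwrite an earlier critical one. One way to make that rule explicit is a small helper that always keeps the more severe of two values. This is a sketch with hypothetical names (severity_rank, escalate), not part of the original script.

```shell
#!/bin/sh
# Hypothetical helpers: keep the most severe status seen so far.

# Map a status name to a number so statuses can be compared
severity_rank() {
    case "$1" in
        critical) echo 2 ;;
        warning)  echo 1 ;;
        *)        echo 0 ;;   # healthy or unknown
    esac
}

# escalate CURRENT NEW: print whichever status is more severe
escalate() {
    if [ "$(severity_rank "$2")" -gt "$(severity_rank "$1")" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}
```

Each check would then use STATUS=$(escalate "$STATUS" warning) in place of a plain assignment, so the order of the checks no longer affects the final result.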
Cron it

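The health script needs OPENCLAW_GATEWAY_TOKEN to probe the gateway. Set it directly in the cron file; Docker .env files aren't guaranteed shell-safe, so don't source them. Cron does not allow a comment on the same line as a variable assignment, so keep comments on their own lines. The file holds a secret: keep it owned by root with mode 600. The deploy user also needs permission to run docker for the container check to work.
# /etc/cron.d/openclaw-health
SHELL=/bin/bash
# Paste the gateway token between the quotes
OPENCLAW_GATEWAY_TOKEN=""
LOG_DIR=/home/deploy/openclaw/logs
*/30 * * * * deploy mkdir -p $LOG_DIR && cd /home/deploy/openclaw && /usr/local/bin/openclaw-health.sh 2>&1 | grep -v "^status:healthy$" >> $LOG_DIR/openclaw-alerts.log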
Log hygiene


Gateway logs may contain request metadata — review before shipping externally
Log rotation: ✅ handled by the json-file logging driver in compose (max-size: 10m, max-file: 3)
If using log aggregation, ensure transport is encrypted and destination is access-controlled


The principle: Good monitoring is silent when everything's fine and loud when something's wrong. If it's noisy, you'll ignore it. If it's silent, you'll forget it exists.


12. Update Strategy

🧒 Child Lens: Before you install the update, check what's in the box. Don't just click "update all" and hope.
🔬 First Principles Lens: Every update is a trust decision. git pull verifies integrity (SHA hashes) but not identity (no GPG signatures). Reviewing changes before applying them is the best local defense.
Pre-update safety script

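#!/bin/bash
# Run from the repo root before every update
echo "=== Current Version ==="
git log --oneline -1

echo "=== Upstream Changes ==="
git fetch origin
git log --oneline HEAD..origin/main

echo "=== Package Audit ==="
# pnpm audit exits non-zero when it finds vulnerabilities, so branch on the exit code
if docker compose exec -T openclaw-gateway pnpm audit; then
    echo "No known vulnerabilities"
else
    echo "⚠️ Audit reported vulnerabilities or could not run; review the output above"
fi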
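Reviewing every upstream commit is slow, so it can help to surface the changed files that most affect what runs on the host. The filter below is a sketch under assumptions: the file names (Dockerfile, docker-compose.yml, pnpm-lock.yaml, package.json, shell scripts) are guesses at what matters in this repo, and `flag_sensitive_paths` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# flag_sensitive_paths
# Reads changed file paths on stdin, one per line, and prints the ones
# that deserve a manual look before updating. The pattern list is an
# assumption; adjust it to the repo you are tracking.
flag_sensitive_paths() {
  grep -E '(^|/)(Dockerfile|docker-compose\.ya?ml|pnpm-lock\.yaml|package\.json)$|\.sh$' || true
}

# Usage (inside the repo, after `git fetch origin`):
#   git diff --name-only HEAD..origin/main | flag_sensitive_paths
```

An empty result does not mean the update is safe; it only means none of the listed files changed.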
Update procedure

git pull --ff-only
docker compose build
docker compose up -d openclaw-gateway
curl -sf -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" http://127.0.0.1:18789/
The rule: review before you update. Never blind-update.
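The curl probe at the end of the update procedure can fail simply because the gateway is still starting. A small retry wrapper avoids mistaking a slow start for a broken update. This is a sketch, with `retry` as a hypothetical helper and the attempt count and delay chosen arbitrarily:

```bash
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final failed attempt
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage after `docker compose up -d openclaw-gateway`:
#   retry 10 3 curl -sf -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" http://127.0.0.1:18789/
```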
Rollback procedure

Tag known-good states before updating so you can revert cleanly:
# Before updating, tag the current working state
git tag "good-$(date +%F)"

# Update
git pull --ff-only
docker compose build
docker compose up -d openclaw-gateway

# If something breaks — roll back
docker compose down
git checkout good-2026-02-07   # your last known-good tag
docker compose build
docker compose up -d openclaw-gateway
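Because the tags use ISO dates, the newest known-good tag can be selected by a plain reverse sort instead of being typed by hand. A minimal sketch, assuming every rollback tag follows the good-YYYY-MM-DD pattern from above; `latest_good_tag` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# latest_good_tag
# Reads tag names on stdin and prints the newest good-YYYY-MM-DD tag.
# ISO dates sort lexicographically, so a reverse sort puts the newest first.
latest_good_tag() {
  grep -E '^good-[0-9]{4}-[0-9]{2}-[0-9]{2}$' | sort -r | head -n 1
}

# Usage:
#   git checkout "$(git tag -l 'good-*' | latest_good_tag)"
```

If no matching tag exists the function prints nothing, so check for an empty result before passing it to git checkout.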
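Keep the last 3 known-good images to avoid rebuilding on rollback. A rebuild moves the image tag to the new build and leaves the previous image untagged, where docker image prune will delete it. Give the current image its own tag before updating (this assumes the build tags it openclaw:hetzner, as in Section 14):
# Before updating, preserve the current image under a dated tag
docker tag openclaw:hetzner "openclaw:good-$(date +%F)"

# List OpenClaw images, newest first
docker images openclaw --format "{{.CreatedAt}}\t{{.Tag}}\t{{.ID}}" | sort -r

# Remove all but the 3 most recent good-* tags (sorted by creation time, newest first)
docker images openclaw --format "{{.CreatedAt}}\t{{.Repository}}:{{.Tag}}" | grep ':good-' | sort -r | tail -n +4 | awk '{print $NF}' | xargs -r docker rmi

# Clean up dangling layers
docker image prune -f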

Repo delta: The upstream repo does not provide GPG-signed releases or reproducible builds. Skills from clawhub have no signature verification. Until upstream addresses this, lockfiles + pre-update review are your best defense.


13. Secrets at Rest

🧒 Child Lens: Cash in a locked drawer is protected as long as nobody picks the lock. But if someone breaks in, the drawer lock is the only thing left. A safe inside the drawer adds a layer — but only if the combination isn't taped underneath.
🔬 First Principles Lens: File permissions (chmod 600) protect against other local users. They don't protect against a compromised process running as your user, physical theft of an unencrypted disk, or secrets copied into unencrypted backups.
Decision framework


| Scenario | Recommendation |
| --- | --- |
| Hetzner VPS | LUKS at provisioning, file permissions tight, rotate keys quarterly |
| Always-on home server | Disk encryption OFF (accept risk for auto-boot), file permissions tight, rotate quarterly |
| Office/colo server | Full disk encryption ON, accept manual reboots |
| Belt-and-suspenders | Full disk + age/sops + rotation |


What to encrypt


LUKS (full disk): Protects against physical theft. Hetzner offers it at provisioning time. Use it.
File-level (age/sops): Diminishing returns. The decryption key must be accessible at runtime, which creates a circular dependency: anything that can read the key file can also decrypt the secrets.
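Since file permissions carry most of the weight in every scenario above, it is worth checking them mechanically rather than by eye. A minimal sketch, with `has_mode` as a hypothetical helper; it uses the GNU stat syntax first and falls back to the BSD/macOS form:

```bash
#!/usr/bin/env bash
# has_mode FILE [MODE]
# Succeeds only if FILE has exactly the given octal mode (default 600).
has_mode() {
  local file="$1" want="${2:-600}" mode
  # GNU stat uses -c %a; BSD/macOS stat uses -f %Lp
  mode="$(stat -c %a "$file" 2>/dev/null || stat -f %Lp "$file" 2>/dev/null)" || return 1
  [ "$mode" = "$want" ]
}

# Usage:
#   has_mode /home/deploy/openclaw/.env || echo "⚠️ .env is not mode 600"
#   has_mode /home/deploy/.openclaw 700 || echo "⚠️ config dir is not mode 700"
```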


14. Image Provenance

🧒 Child Lens: If someone hands you a mystery box and says "trust me, it's the right thing" — you'd want to at least check the label matches what you ordered.
🔬 First Principles Lens: If building locally, the source is the trust anchor. For pre-built images, verify the digest and scan for vulnerabilities.
# Pin images by digest in production
# image: openclaw@sha256:abc123...

# Scan for vulnerabilities
docker scout cves openclaw:hetzner
# or
trivy image openclaw:hetzner
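For pre-built images, a quick lint can catch compose entries that still float on a tag. The sketch below assumes image references appear on lines of the form `image: name@sha256:<64 hex digits>`; `unpinned_images` is a hypothetical helper, and locally built images such as openclaw:hetzner will be flagged by design since they have no registry digest:

```bash
#!/usr/bin/env bash
# unpinned_images COMPOSE_FILE
# Prints every `image:` line that is not pinned by a sha256 digest.
# Returns 0 when all image lines are pinned, 1 otherwise.
unpinned_images() {
  local file="$1" found
  found="$(grep -E '^[[:space:]]*image:' "$file" | grep -Ev '@sha256:[0-9a-f]{64}' || true)"
  if [ -z "$found" ]; then
    return 0
  fi
  printf '%s\n' "$found"
  return 1
}

# Usage:
#   unpinned_images docker-compose.yml || echo "⚠️ pin the images listed above"
```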

15. Persistence Source of Truth


| Container Path | Host Path | Source |
| --- | --- | --- |
| `/home/node/.openclaw` | `${OPENCLAW_CONFIG_DIR}` | Volume mount |
| `/home/node/.openclaw/workspace` | `${OPENCLAW_WORKSPACE_DIR}` | Volume mount |
| `/usr/local/bin/*` | N/A | Image build |


The container can be recreated safely if and only if the host volumes are intact.
Data integrity considerations:

Volume corruption: Docker volumes use the host filesystem — if the host disk corrupts, volumes corrupt too. This is why offsite backups (Section 10) are non-negotiable.
Upgrade migrations: Before any OpenClaw version upgrade, snapshot the volumes: tar -czf openclaw-pre-upgrade-$(date +%s).tar.gz /home/deploy/.openclaw/. If the new version changes data formats, you have a clean rollback point.
Ownership drift: If you rebuild the container with a different UID, volume permissions break. Always verify with docker compose exec openclaw-gateway id after rebuilds.
Never store state inside the container that isn't on a mounted volume. docker compose down && docker compose up -d must be a no-op for your data.
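The pre-upgrade snapshot is only a rollback point if the archive is actually readable. The sketch below wraps the tar command from the list above and verifies the result by listing it; `snapshot_dir` is a hypothetical helper and the output directory is an assumption:

```bash
#!/usr/bin/env bash
# snapshot_dir SRC_DIR OUT_DIR
# Archives SRC_DIR into OUT_DIR, verifies the archive can be listed,
# and prints the archive path on success.
snapshot_dir() {
  local src="$1" out_dir="$2" archive
  archive="$out_dir/openclaw-pre-upgrade-$(date +%s).tar.gz"
  # -C stores paths relative to the parent directory of SRC_DIR
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # An archive that cannot be listed is not a rollback point
  tar -tzf "$archive" >/dev/null || return 1
  printf '%s\n' "$archive"
}

# Usage:
#   snapshot_dir /home/deploy/.openclaw /var/backups/openclaw
```

Listing the archive catches truncated or corrupt files; it does not prove the data inside is consistent, so a full restore test is still worthwhile.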


16. Accepted Risks — Document Everything

🧒 Child Lens: A to-do list isn't just for things you're going to do. It's also for things you've decided NOT to do — and why.
🔬 First Principles Lens: Security is a spectrum of trade-offs. Documenting accepted risks is not negligence — it's engineering. The dangerous position is having unexamined risks, not having documented ones.
Template

### [Risk Name] — [Current State]

**Risk:** What could go wrong.
**Why accepted:** Why this trade-off makes sense.
**Mitigations:** What you're doing instead.
**Revisit when:** Conditions that would change the decision.
Common accepted risks for OpenClaw deployments


No GPG on updates — Can't verify commit authorship. Upstream gap. Bounded by pre-update review script.
No age/sops file encryption — Circular dependency at runtime. Bounded by file permissions + token rotation.
Full disk encryption off (home server variant) — Physical theft exposes disk. Bounded by token rotation + location.


17. Common Pitfalls


Changing container gateway port away from 18789 in compose command
Using releases/latest binary URLs in production
Installing binaries manually inside a running container (lost on recreate)
Exposing gateway publicly without firewall + TLS + token
Leaving stale tokens unrotated after team/user changes
Scheduling backups without verifying they actually run
Using npm install instead of npm ci / pnpm install --frozen-lockfile
Piping secrets through Discord or Slack
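Some of these pitfalls can be caught mechanically. As one example, the check below flags floating releases/latest URLs in a build file. It is a sketch, with `no_latest_urls` as a hypothetical helper that could run in CI or in the pre-update script:

```bash
#!/usr/bin/env bash
# no_latest_urls FILE
# Succeeds when FILE contains no floating releases/latest URLs.
# On failure, prints the offending lines with their line numbers.
no_latest_urls() {
  local file="$1"
  if grep -n 'releases/latest' "$file"; then
    return 1
  fi
  return 0
}

# Usage:
#   no_latest_urls Dockerfile || echo "⚠️ pin these downloads to a version and checksum"
```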


Security Scorecard

Use this after deployment. Every box should be checked or have a documented reason why not.
Baseline (do these or don't deploy)


[ ] Firewall enabled, default deny incoming
[ ] SSH key-only auth (password auth disabled)
[ ] .env exists with chmod 600
[ ] OPENCLAW_GATEWAY_TOKEN set (not empty)
[ ] Docker ports bound to 127.0.0.1
[ ] Persistent dirs owned by UID 1000, mode 700
[ ] security_opt: no-new-privileges set in compose
[ ] Docker socket NOT mounted into container

Supply Chain


[ ] Docker installed from signed apt repo (not curl | sh)
[ ] Skill binaries pinned to version + SHA256 checksum
[ ] Lockfiles present and used (--frozen-lockfile)
[ ] No releases/latest URLs in Dockerfile
[ ] .gitignore covers auth-profiles.json, *.env, secrets

Operations


[ ] Backup automation running and verified (check for real output!)
[ ] Backup retention policy configured
[ ] Recovery tested on a fresh VPS at least once
[ ] Health monitoring active (disk, process, memory, backup age)
[ ] Log rotation configured
[ ] Token rotation runbook documented
[ ] Quarterly rotation reminder set

Network


[ ] Gateway not exposed to public internet (SSH tunnel or Tailscale)
[ ] If TLS-terminated: certs auto-renew, HTTPS enforced, proxy ↔ gateway on loopback
[ ] Tailscale keys tracked with expiry dates (if used)
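The first network item can be checked from the host's own socket table. The sketch below parses `ss -tln` output and reports any listener on the gateway port that is not bound to loopback. `non_loopback_listeners` is a hypothetical helper, and it assumes the default ss layout where the fourth column is Local Address:Port and the first line is a header:

```bash
#!/usr/bin/env bash
# non_loopback_listeners PORT
# Reads `ss -tln` output on stdin and prints every local address
# listening on PORT that is not 127.0.0.1 or [::1].
non_loopback_listeners() {
  local port="$1"
  awk -v port="$port" '
    NR > 1 {
      addr = $4                         # Local Address:Port column
      n = split(addr, parts, ":")       # the port is the last ":"-separated field
      if (parts[n] != port) next
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host != "127.0.0.1" && host != "[::1]") print addr
    }'
}

# Usage (pass the host port, which may differ if OPENCLAW_GATEWAY_PORT is set):
#   ss -tln | non_loopback_listeners 18789
```

No output means every listener on that port is loopback-only. This checks the host binding only; it says nothing about what a reverse proxy or Tailscale exposes on top of it.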

Documentation


[ ] Accepted risks documented with rationale
[ ] Provider rotation matrix filled in
[ ] Pre-update review script in place
[ ] This checklist reviewed on every major update


Verify Everything Works

Run this checklist after initial deployment or any major update. You can run it manually or use the automated script below.
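# 1. Health endpoint responds (only report success if curl actually succeeded)
curl -sf -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" http://127.0.0.1:18789/ > /dev/null \
  && echo "✅ Gateway reachable" || echo "❌ Gateway not reachable"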

# 2. Container runs as non-root
docker compose exec openclaw-gateway id
# Expect: uid=1000(node) gid=1000(node) — NOT root

# 3. Firewall rules correct
ufw status verbose
# Expect: default deny incoming, SSH allowed, no 18789 open to public

# 4. Backup cycle works
/usr/local/bin/openclaw-backup.sh && echo "✅ Backup succeeded"
ls -lht /var/backups/openclaw/ | head -3   # -t lists the newest files first

# 5. Channel connectivity (Discord, etc.)
# Send a test message through your configured channel and confirm delivery

# 6. Volumes have correct ownership
ls -la /home/deploy/.openclaw/
# Expect: owned by 1000:1000, mode 700
Automated Smoke Test

Save as ops/smoke-test.sh and run after every deployment or update:
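#!/usr/bin/env bash
# Run from the compose project directory, as root or via sudo (the ufw check needs it)
set -euo pipefail
# Sourcing assumes .env contains only plain KEY=value lines
source .env 2>/dev/null || true
# Default to empty so a missing token fails the check instead of aborting under set -u
: "${OPENCLAW_GATEWAY_TOKEN:=}"

PASS=0; FAIL=0
# check LABEL COMMAND_STRING: evaluate the command quietly and tally the result
check() {
  if eval "$2" >/dev/null 2>&1; then
    # Avoid ((PASS++)): it returns non-zero when PASS is 0, which aborts under set -e
    echo "✅ $1"; PASS=$((PASS + 1))
  else
    echo "❌ $1"; FAIL=$((FAIL + 1))
  fi
}

check "Gateway health" \
  'curl -sf -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" http://127.0.0.1:18789/'

check "Non-root container" \
  '[ "$(docker compose exec -T openclaw-gateway id -u)" = "1000" ]'

check "UFW active" \
  'ufw status | grep -q "Status: active"'

# docker inspect prints HostIp and HostPort on separate lines, so ask compose for the mapping
check "18789 bound to localhost only" \
  'docker compose port openclaw-gateway 18789 | grep -q "^127\.0\.0\.1:"'

check "Volume ownership" \
  '[ "$(stat -c %u /home/deploy/.openclaw 2>/dev/null || stat -f %u /home/deploy/.openclaw)" = "1000" ]'

check "Log rotation configured" \
  'docker inspect $(docker compose ps -q openclaw-gateway) 2>/dev/null | grep -q max-size'

echo ""
echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" -eq 0 ] || exit 1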
chmod +x ops/smoke-test.sh
./ops/smoke-test.sh

Troubleshooting

🧒 Child Lens: When the lights go out, you check the breaker box — not rewire the whole house. Start with the most obvious cause.
🔬 First Principles Lens: Most failures are configuration errors, not bugs. Logs tell you what happened; network tools tell you what's reachable. Check both before changing anything.


| Symptom | Diagnosis | Fix |
| --- | --- | --- |
| Container won't start | `docker compose logs -f openclaw-gateway`: look for the crash reason | Fix config/env, then `docker compose up -d` |
| Gateway unreachable | `ss -tlnp \| grep 18789`: is it listening? Check UFW: `ufw status` | Verify bind address in `.env`, check `docker compose ps` |
| Auth failures (401) | Token mismatch between client and server | Compare `$OPENCLAW_GATEWAY_TOKEN` in `.env` vs what client sends. Restart after changes: `docker compose up -d --force-recreate` |
| Backup failures | Check exit code: `echo $?` after manual run. Common: disk full, wrong permissions | `df -h` for space, `ls -la /var/backups/openclaw/` for perms |
| Discord/channel disconnects | Gateway lost websocket connection | `docker compose restart openclaw-gateway`, then check logs for rate limits |
| OOM killed | `dmesg \| grep -i oom` | Increase `mem_limit` in compose or reduce workload |
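For the 401 case, comparing tokens by pasting them side by side tends to leak them into chat or shell history. Comparing short fingerprints is one alternative. This sketch assumes the token is a long random string (a short hash of a guessable secret would be unsafe to share); `token_fingerprint` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# token_fingerprint TOKEN
# Prints the first 12 hex characters of the SHA-256 of TOKEN, so the
# client and server values can be compared without revealing either.
token_fingerprint() {
  local hash
  if command -v sha256sum >/dev/null 2>&1; then
    hash="$(printf '%s' "$1" | sha256sum)"
  else
    # macOS ships shasum instead of sha256sum
    hash="$(printf '%s' "$1" | shasum -a 256)"
  fi
  printf '%s\n' "${hash:0:12}"
}

# Usage on the server (run the same function on the client side):
#   token_fingerprint "$OPENCLAW_GATEWAY_TOKEN"
```

Matching fingerprints mean the token values are the same, which points the diagnosis at how the client sends the header rather than at the token itself.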


Philosophy

Security isn't a destination. It's a practice — like brushing your teeth. You do it regularly, you do it honestly, and you document what you skip and why.
The three most dangerous words in security are "it should work." Check. Verify. Test the backup by restoring from it. Run the health monitor and see if it actually alerts. Try to read your own secrets from a different user account.
Trust, but verify. Then verify again.

Based on Brad Barbin's Hetzner deployment gist and a real security audit conducted February 2026. All scripts referenced here live in the ops/ directory of the OpenClaw workspace.