Guide version: 2.0 (February 7, 2026) | Last reviewed: 2026-02-07 | Lines: ~1125 | Grade: Multi-model audited (Opus 4.6, Codex 5.3, Grok 3)
Based on Brad Barbin's original Hetzner deployment gist. Security hardening from a real production audit by The Dude 🎳.
Platform: Written for Hetzner VPS (Ubuntu 24.04/22.04 or Debian 12), but the security principles and Docker-based deployment apply to any Linux host. macOS-specific notes are called out where relevant.
Every major section has three parts:
- 🧒 Child Lens: A simple analogy. If you can't explain it to a kid, you don't understand it.
- 🔬 First Principles Lens: What's actually at risk. No security theater.
- Commands: Copy-paste ready.
This isn't theoretical. Every security item came from auditing a real OpenClaw deployment, including a backup system that had been silently failing for days.
Laptop --SSH tunnel--> Hetzner VPS (127.0.0.1:18789) --> Docker container (:18789)
                              ^
Tailnet devices --------------+ (host-level Tailscale proxy)
Key rules:
- Gateway listens on container port 18789; this never changes
- Docker publishes to host port ${OPENCLAW_GATEWAY_PORT}; this can vary
- Host binding stays loopback-only (127.0.0.1) unless you explicitly need remote exposure
- Access via SSH tunnel or Tailscale; never expose directly to the internet
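The binding rule above can be checked mechanically. The following is a minimal sketch, assuming the usual Docker publish syntax ([HOST_IP:]HOST_PORT:CONTAINER_PORT); publish_scope is a hypothetical helper name, not part of Docker or OpenClaw.

```shell
#!/bin/sh
# Classify a Docker port-publish spec by the host address it binds to.
# publish_scope is an illustrative helper, not a Docker command.
publish_scope() {
  case "$1" in
    127.*:*:*|"[::1]":*:*)   echo "loopback" ;;          # reachable from the host only
    0.0.0.0:*:*|"[::]":*:*)  echo "all-interfaces" ;;    # explicit wildcard address
    *:*:*)                   echo "specific-address" ;;  # some other host IP
    *)                       echo "all-interfaces" ;;    # no host IP given: Docker's default
  esac
}

for spec in "127.0.0.1:18789:18789" "18789:18789"; do
  printf '%-24s %s\n' "$spec" "$(publish_scope "$spec")"
done
```

Anything other than "loopback" for the gateway port deserves a second look before the stack is started.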
This guide cross-checks the live repo:
- Dockerfile uses Bun and runs as USER node
- Default image CMD is node openclaw.mjs gateway --allow-unconfigured
- docker-compose.yml keeps container gateway port fixed at 18789
- Hetzner VPS (Ubuntu 24.04/22.04 or Debian 12)
- Root SSH access
- Domain/TLS optional (recommended if exposing beyond loopback/tailnet)
- OpenClaw repo available on the host
🧒 Child Lens: Before you put anything in your new house, you lock the doors, install smoke detectors, and check the windows. Don't move in first and secure later.
🔬 First Principles Lens: A fresh VPS has SSH open to the internet and no firewall. Every minute it's exposed unpatched is a minute attackers can probe it. Baseline hardening reduces the attack surface before you install anything worth stealing.
This is the only time you SSH as root. After creating the deploy user below, all subsequent commands use deploy@YOUR_VPS_IP with sudo.
ssh root@YOUR_VPS_IP
apt-get update
apt-get -y upgrade
apt-get install -y --no-install-recommends \
  ca-certificates curl gnupg ufw fail2ban unattended-upgrades jq
dpkg-reconfigure -plow unattended-upgrades
ufw default deny incoming
ufw default allow outgoing
ufw allow OpenSSH
ufw --force enable
ufw status verbose
⚠️ The SSH rule matters. Enabling the firewall without allowing SSH first means you just locked yourself out. There is no "undo" button from outside. We've seen deployment guides skip this step.
First, create a non-root deploy user (all remaining commands use this user via sudo):
adduser deploy
usermod -aG sudo deploy
# Copy your SSH key to the new user
mkdir -p /home/deploy/.ssh
cp ~/.ssh/authorized_keys /home/deploy/.ssh/
chown -R deploy:deploy /home/deploy/.ssh
chmod 700 /home/deploy/.ssh && chmod 600 /home/deploy/.ssh/authorized_keys
Then disable password auth and root login in /etc/ssh/sshd_config:
PasswordAuthentication no
PermitRootLogin no
AllowUsers deploy
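Then validate the config and restart the SSH service: sshd -t && systemctl restart ssh. (The unit is named ssh on Debian and Ubuntu; sshd is only an alias and is not always present, for example on Ubuntu 24.04.)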
Test before disconnecting! Open a second terminal and verify ssh deploy@YOUR_VPS_IP works before closing your root session. If it fails, you still have the root session to fix it.
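Editing sshd_config does not guarantee the settings took effect: drop-in files under /etc/ssh/sshd_config.d/ (cloud images often ship one) may override them. A minimal sketch of a check against the effective configuration printed by sshd -T follows; check_sshd_hardening is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `sshd -T` output on stdin and confirms the two hardening settings.
# check_sshd_hardening is an illustrative helper, not an OpenSSH tool.
check_sshd_hardening() {
  local out rc=0
  out="$(cat)"
  printf '%s\n' "$out" | grep -qix 'passwordauthentication no' \
    || { echo "FAIL: password authentication is still enabled"; rc=1; }
  printf '%s\n' "$out" | grep -qix 'permitrootlogin no' \
    || { echo "FAIL: root login is still permitted"; rc=1; }
  [ "$rc" -eq 0 ] && echo "OK: password authentication and root login are disabled"
  return "$rc"
}

# On the server, as root:
#   sshd -T | check_sshd_hardening
```

Run it from the root session that is still open, so a FAIL can be fixed before you disconnect.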
Key types: Use Ed25519 keys (ssh-keygen -t ed25519). RSA works but Ed25519 is shorter, faster, and has no known weaknesses. Changing the SSH port (e.g., 2222) reduces log noise from bots but is not a security measure, so don't rely on it.
Time: 5 minutes. Impact: Massive.
fail2ban was installed above but needs activation. Enable the SSH jail:
cat > /etc/fail2ban/jail.local <<'EOF'
[sshd]
enabled = true
port = ssh
maxretry = 5
bantime = 3600
findtime = 600
EOF
systemctl enable --now fail2ban
fail2ban-client status sshd # verify it's running
For detecting unauthorized file changes on the host:
apt install -y aide
aideinit # generates the initial database as /var/lib/aide/aide.db.new (takes a few minutes)
cp /var/lib/aide/aide.db.new /var/lib/aide/aide.db # activate it as the reference database
# Run daily check via cron (Debian/Ubuntu keep the config in /etc/aide/aide.conf, so pass it explicitly):
echo '0 3 * * * root /usr/bin/aide --config /etc/aide/aide.conf --check' > /etc/cron.d/aide-check
🔬 First Principles Lens: fail2ban rate-limits brute-force attempts; AIDE detects if someone modifies system files after gaining access. Together they cover both the "getting in" and "already in" attack phases.
🧒 Child Lens: When you install an app, you want to make sure it came from the real store, not a fake one. Using Docker's signed repo is like checking the store's ID badge.
🔬 First Principles Lens: curl | sh downloads and executes in one step. HTTPS provides transport integrity, but you never inspect what you're running. A compromised server or CDN serves you malware and you execute it blindly. GPG-signed apt repos let the package manager verify the package hasn't been tampered with before installing, and you can also inspect what you're getting.
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
. /etc/os-release
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu ${VERSION_CODENAME} stable" \
> /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
docker --version
docker compose version
For Debian, replace ubuntu in the repo URL with debian.
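Instead of editing the URL by hand, the repo line can be derived from /etc/os-release. This is a sketch under the assumption that only ubuntu and debian are in scope, as in this guide; docker_repo_line is a hypothetical helper name.

```shell
#!/bin/sh
# Build the Docker apt source line for a given distro ID, codename and architecture.
# docker_repo_line is an illustrative helper; it only formats text and installs nothing.
docker_repo_line() {
  local id="$1" codename="$2" arch="$3"
  case "$id" in
    ubuntu|debian) ;;
    *) echo "unsupported distro: $id" >&2; return 1 ;;
  esac
  printf 'deb [arch=%s signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/%s %s stable\n' \
    "$arch" "$id" "$codename"
}

# On the host:
#   . /etc/os-release
#   docker_repo_line "$ID" "$VERSION_CODENAME" "$(dpkg --print-architecture)" \
#     > /etc/apt/sources.list.d/docker.list
docker_repo_line debian bookworm amd64
```

The GPG key URL in the install commands contains the same distro segment, so on Debian it needs the same substitution.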
🧒 Child Lens: You're building a house (the container) on a foundation (the host). The stuff you care about (photos, documents) lives in the foundation, not the house. If the house burns down, you rebuild it. The foundation stays.
🔬 First Principles Lens: Containers are ephemeral. Any state not on a mounted volume is lost on recreate. File ownership must match the container's runtime user (node, UID 1000). Permission 700 means only the owner can read the directory, so other users on the host can't peek at your config or secrets.
git clone https://github.com/openclaw/openclaw.git
cd openclaw
mkdir -p /home/deploy/.openclaw /home/deploy/.openclaw/workspace
chown -R 1000:1000 /home/deploy/.openclaw /home/deploy/.openclaw/workspace
chmod 700 /home/deploy/.openclaw /home/deploy/.openclaw/workspace
The 1000:1000 ownership matches USER node in the container image. Verify after building: docker compose run --rm openclaw-gateway id (expect uid=1000(node) gid=1000(node)).
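The host side of that check can be scripted too. The sketch below assumes GNU stat (standard on Ubuntu and Debian); check_state_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Confirm a state directory is owned by the expected UID and has mode 700.
# check_state_dir is an illustrative helper; it relies on GNU stat (-c).
check_state_dir() {
  local dir="$1" want_uid="$2" uid mode
  uid="$(stat -c '%u' "$dir")" || return 2
  mode="$(stat -c '%a' "$dir")" || return 2
  if [ "$uid" != "$want_uid" ]; then
    echo "FAIL: $dir is owned by uid $uid, expected $want_uid"; return 1
  fi
  if [ "$mode" != "700" ]; then
    echo "FAIL: $dir has mode $mode, expected 700"; return 1
  fi
  echo "OK: $dir"
}

# On the host:
#   check_state_dir /home/deploy/.openclaw 1000
#   check_state_dir /home/deploy/.openclaw/workspace 1000
```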
Before you start committing anything to this repo, set up your .gitignore:
cat >> .gitignore <<'EOF'
# Secrets: never commit these
auth-profiles.json
*.env
.env
discord-history/
EOF
Audit for any secrets already in history:
git log --all --diff-filter=A --name-only --pretty=format: | sort -u | grep -iE 'token|secret|key|password|auth|\.env'
If you find anything, scrub it:
# Install the tool
pip install git-filter-repo # or: apt install git-filter-repo
# Remove a file from all history
git filter-repo --path auth-profiles.json --invert-paths --force
⚠️ git filter-repo rewrites ALL commit hashes. Existing clones and forks will diverge. Only use on repos you fully control, and force-push after.
What we actually found: A discord-history/ directory with message dumps committed to a workspace repo. Scrubbed it from all history. The content wasn't catastrophic, but the habit is: next time it could be API keys.
🧒 Child Lens: Your .env file is like a keychain with all your house keys, car keys, and safe combination on it. You don't leave it on the front porch; you keep it in your pocket, and only you can reach it.
🔬 First Principles Lens: The .env file contains bearer credentials. Anyone who reads it IS you from the provider's perspective. chmod 600 means only the file owner can read it. Never commit it. Never paste its contents in Discord or Slack: those are cloud services with message history, search indexing, and admin access you don't control.
cat > .env <<'ENV'
OPENCLAW_IMAGE=openclaw:hetzner
# Generate the token below and paste it here
OPENCLAW_GATEWAY_TOKEN=
# "lan" binds the gateway to 0.0.0.0 INSIDE the container. That is safe because Docker restricts the host side to 127.0.0.1 (see compose). On bare metal without Docker, use "loopback" instead!
OPENCLAW_GATEWAY_BIND=lan
# Host-side published ports only
OPENCLAW_GATEWAY_PORT=18789
OPENCLAW_BRIDGE_PORT=18790
OPENCLAW_CONFIG_DIR=/home/deploy/.openclaw
OPENCLAW_WORKSPACE_DIR=/home/deploy/.openclaw/workspace
# Optional provider secrets
# CLAUDE_AI_SESSION_KEY=
# CLAUDE_WEB_SESSION_KEY=
# CLAUDE_WEB_COOKIE=
ENV
chmod 600 .env
Generate a gateway token:
openssl rand -hex 32
Paste it into .env as OPENCLAW_GATEWAY_TOKEN.
- Never commit .env
- Keep .env mode 600
- Rotate all leaked provider/session secrets immediately
- Hand off secrets via encrypted channels only (Signal, iMessage); never Discord/Slack
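Two of these rules are easy to verify before every deploy. The sketch below assumes the token was produced by openssl rand -hex 32 (64 lowercase hex characters) and that GNU stat is available; check_env_file is a hypothetical helper name. It never prints the token.

```shell
#!/bin/sh
# Check that an env file has mode 600 and carries a 64-hex-character gateway token.
# check_env_file is an illustrative helper; the token value is never echoed.
check_env_file() {
  local file="$1" mode token
  mode="$(stat -c '%a' "$file")" || return 2
  if [ "$mode" != "600" ]; then
    echo "FAIL: mode is $mode, expected 600"; return 1
  fi
  token="$(sed -n 's/^OPENCLAW_GATEWAY_TOKEN=//p' "$file" | head -n 1)"
  if ! printf '%s' "$token" | grep -Eq '^[0-9a-f]{64}$'; then
    echo "FAIL: OPENCLAW_GATEWAY_TOKEN is not 64 hex characters"; return 1
  fi
  echo "OK"
}

# In the repo directory:
#   check_env_file .env
```

An empty token, or one with a trailing inline comment, fails the second check, which catches a forgotten paste.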
🧒 Child Lens: The compose file is your house's blueprint. It says where the doors are (ports), what rooms connect to what (volumes), and who's allowed in (bindings). A bad blueprint means unlocked doors facing the street.
🔬 First Principles Lens: Docker's default port publishing binds to 0.0.0.0, meaning every network interface. On a VPS with a public IP, that means your gateway is exposed to the entire internet. Binding to 127.0.0.1 restricts access to localhost only. Combined with token auth and SSH tunneling, this creates defense in depth.
services:
  openclaw-gateway:
    image: ${OPENCLAW_IMAGE:-openclaw:local}
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      HOME: /home/node
      TERM: xterm-256color
      OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN}
      OPENCLAW_GATEWAY_BIND: ${OPENCLAW_GATEWAY_BIND:-lan}
      CLAUDE_AI_SESSION_KEY: ${CLAUDE_AI_SESSION_KEY}
      CLAUDE_WEB_SESSION_KEY: ${CLAUDE_WEB_SESSION_KEY}
      CLAUDE_WEB_COOKIE: ${CLAUDE_WEB_COOKIE}
    volumes:
      - ${OPENCLAW_CONFIG_DIR}:/home/node/.openclaw
      - ${OPENCLAW_WORKSPACE_DIR}:/home/node/.openclaw/workspace
    ports:
      - "127.0.0.1:${OPENCLAW_GATEWAY_PORT:-18789}:18789"
      - "127.0.0.1:${OPENCLAW_BRIDGE_PORT:-18790}:18790"
    init: true
    restart: unless-stopped
    security_opt:
      - no-new-privileges:true
    # Note: Docker applies default seccomp + AppArmor profiles automatically.
    # For stricter hardening, create a custom seccomp profile:
    # seccomp: /path/to/custom-seccomp.json
    # See: https://docs.docker.com/engine/security/seccomp/
    mem_limit: "1g"
    pids_limit: 256
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    command:
      [
        "node",
        "dist/index.js",
        "gateway",
        "--bind",
        "${OPENCLAW_GATEWAY_BIND:-lan}",
        "--port",
        "18789"
      ]
Why this matters:
- Container always listens on 18789. Changing OPENCLAW_GATEWAY_PORT only affects the host mapping.
- 127.0.0.1 binding = not reachable from the internet.
- mem_limit / pids_limit prevent runaway processes from killing the VPS.
- Never mount the Docker socket (/var/run/docker.sock) into the container; it's equivalent to root on the host.
Repo delta: The upstream docker-compose.yml omits the build: section, does not bind to 127.0.0.1, and has no resource limits. This guide adds all three as security hardening. If deploying with pre-built images and relying solely on firewall rules, you may remove the build section.
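Once the stack is running, the loopback binding can be confirmed from the host's listening sockets. The sketch below parses ss -tlnH output (local address in the fourth column); exposed_listeners is a hypothetical helper name, and a scan from an outside machine remains the more definitive test.

```shell
#!/bin/sh
# Print listeners on a given port that are NOT bound to loopback.
# Input: `ss -tlnH` lines on stdin. exposed_listeners is an illustrative helper.
exposed_listeners() {
  awk -v port="$1" '
    {
      addr = $4                                   # Local Address:Port column
      n = split(addr, parts, ":")
      if (parts[n] != port) next                  # different port
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host ~ /^127\./ || host == "[::1]") next  # loopback is fine
      print addr
    }'
}

# On the host:
#   ss -tlnH | exposed_listeners 18789
#   ss -tlnH | exposed_listeners 18790
```

Empty output for both ports is the expected result with the compose file above.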
🧒 Child Lens: When you download a game, you want to know it's the real game and not a virus wearing a game costume. Checksums are like checking the game's fingerprint against a trusted list.
🔬 First Principles Lens: Supply chain attacks target the build pipeline. Pinning versions and verifying SHA256 checksums ensures you get exactly the binary you expect, not a compromised one from a hijacked release. releases/latest is a mutable pointer; an attacker who compromises the repo can redirect it.
Hardening additions over repo default: SHELL directive for safer pipe handling, pinned Bun version (upstream uses latest), optional binary installation with SHA256 verification. If you don't need custom skill binaries, the repo Dockerfile works as-is.
FROM node:22-bookworm
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
# Install Bun to shared path (not /root, which is inaccessible to USER node)
ARG BUN_VERSION=1.2.22
# Pin the Bun release binary directly instead of piping an install script into a shell
# (this removes the install-script step; the zip itself is not checksum-verified here)
RUN BUN_URL="https://github.com/oven-sh/bun/releases/download/bun-v${BUN_VERSION}/bun-linux-x64.zip" \
&& curl -fsSL -o /tmp/bun.zip "$BUN_URL" \
&& unzip -o /tmp/bun.zip -d /tmp/bun-extract \
&& mv /tmp/bun-extract/bun-linux-x64/bun /usr/local/bin/bun \
&& chmod +x /usr/local/bin/bun \
&& rm -rf /tmp/bun.zip /tmp/bun-extract \
&& bun --version
ENV PATH="/usr/local/bin:${PATH}"
RUN corepack enable
WORKDIR /app
# Optional OS packages for skill binaries
ARG OPENCLAW_DOCKER_APT_PACKAGES=""
RUN if [ -n "$OPENCLAW_DOCKER_APT_PACKAGES" ]; then \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends $OPENCLAW_DOCKER_APT_PACKAGES && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*; \
fi
# -----------------------------------------------------------
# SHA256 VERIFICATION: OPTIONAL vs REQUIRED
#
# Core build (no skill binaries): SHA256 args NOT needed.
# Just omit the --build-arg flags and the RUN blocks
# below become no-ops.
#
# If you ADD skill binaries (gog, goplaces, wacli):
# SHA256 verification is MANDATORY. Provide all three
# --build-arg SHA256 values or the build will skip them.
# Never deploy unverified third-party binaries.
# -----------------------------------------------------------
# Optional: pinned binaries with SHA256 verification
ARG GOG_VERSION=0.6.3
ARG GOG_SHA256
ARG GOPLACES_VERSION=0.4.1
ARG GOPLACES_SHA256
ARG WACLI_VERSION=0.5.2
ARG WACLI_SHA256
# Optional: pinned binaries with SHA256 verification
# If you don't need skill binaries (gog, goplaces, wacli), skip the --build-arg flags
# and this block does nothing. If you DO need them, provide all three SHA256 args.
RUN if [ -n "$GOG_SHA256" ]; then \
curl -fsSL -o /tmp/gog.tar.gz "https://github.com/steipete/gog/releases/download/v${GOG_VERSION}/gog_Linux_x86_64.tar.gz" && \
echo "${GOG_SHA256} /tmp/gog.tar.gz" | sha256sum -c - && \
tar -xzf /tmp/gog.tar.gz -C /usr/local/bin && \
chmod +x /usr/local/bin/gog && \
rm -f /tmp/gog.tar.gz; \
fi
RUN if [ -n "$GOPLACES_SHA256" ]; then \
curl -fsSL -o /tmp/goplaces.tar.gz "https://github.com/steipete/goplaces/releases/download/v${GOPLACES_VERSION}/goplaces_Linux_x86_64.tar.gz" && \
echo "${GOPLACES_SHA256} /tmp/goplaces.tar.gz" | sha256sum -c - && \
tar -xzf /tmp/goplaces.tar.gz -C /usr/local/bin && \
chmod +x /usr/local/bin/goplaces && \
rm -f /tmp/goplaces.tar.gz; \
fi
RUN if [ -n "$WACLI_SHA256" ]; then \
curl -fsSL -o /tmp/wacli.tar.gz "https://github.com/steipete/wacli/releases/download/v${WACLI_VERSION}/wacli_Linux_x86_64.tar.gz" && \
echo "${WACLI_SHA256} /tmp/wacli.tar.gz" | sha256sum -c - && \
tar -xzf /tmp/wacli.tar.gz -C /usr/local/bin && \
chmod +x /usr/local/bin/wacli && \
rm -f /tmp/wacli.tar.gz; \
fi
COPY package.json pnpm-lock.yaml pnpm-workspace.yaml .npmrc ./
COPY ui/package.json ./ui/package.json
COPY patches ./patches
COPY scripts ./scripts
RUN pnpm install --frozen-lockfile
COPY . .
RUN OPENCLAW_A2UI_SKIP_MISSING=1 pnpm build
ENV OPENCLAW_PREFER_PNPM=1
RUN pnpm ui:build
ENV NODE_ENV=production
RUN chown -R node:node /app
USER node
CMD ["node", "openclaw.mjs", "gateway", "--allow-unconfigured"]
Obtain checksums by downloading releases and computing locally:
curl -fsSL -o /tmp/gog.tar.gz \
"https://github.com/steipete/gog/releases/download/v0.6.3/gog_Linux_x86_64.tar.gz"
sha256sum /tmp/gog.tar.gz
# Use output as GOG_SHA256 build arg
🧒 Child Lens: You've drawn the blueprint and bought the materials. Now you actually build the house and turn on the lights.
🔬 First Principles Lens: Every dependency you pull is a trust boundary. --no-cache forces a full rebuild so stale layers can't mask a compromised upstream. Verifying checksums post-build closes the loop: you trusted the hash at build time, now confirm the binary matches at runtime. The entire supply chain (base image, package manager, binaries) is only as strong as its weakest verified link. Checking logs immediately catches startup failures before you assume everything's fine.
# Core build (no skill binaries; omit SHA args entirely):
docker compose build --no-cache
docker compose up -d openclaw-gateway
# With skill binaries (replace with REAL checksums; do NOT use placeholders):
# The Dockerfile verifies the downloaded release tarballs, so hash those, not the extracted binaries:
# sha256sum /tmp/gog.tar.gz (repeat for the goplaces and wacli tarballs)
# docker compose build --no-cache \
#   --build-arg GOG_SHA256=abc123... \
#   --build-arg GOPLACES_SHA256=def456... \
#   --build-arg WACLI_SHA256=789fed...
⚠️ Do not pass placeholder values like YOUR_GOG_SHA256: non-empty placeholders trigger checksum validation and the build will fail. Either pass real checksums or omit the args entirely.
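A small guard can catch placeholder or truncated values before they reach the build. This is a sketch with hypothetical helper names (is_sha256_hex, verify_sha256); it uses the same sha256sum -c mechanism as the Dockerfile above.

```shell
#!/bin/sh
# is_sha256_hex: true only for a 64-character hex string.
# verify_sha256: check a file against an expected digest with sha256sum -c.
# Both are illustrative helpers, not part of the OpenClaw repo.
is_sha256_hex() {
  printf '%s' "$1" | grep -Eq '^[0-9a-fA-F]{64}$'
}

verify_sha256() {
  local file="$1" expected="$2"
  if ! is_sha256_hex "$expected"; then
    echo "not a SHA256 hex digest: $expected" >&2
    return 2
  fi
  expected="$(printf '%s' "$expected" | tr 'A-F' 'a-f')"
  # sha256sum -c expects "<digest><two spaces><path>"
  printf '%s  %s\n' "$expected" "$file" | sha256sum -c --status -
}

# Before building with skill binaries:
#   verify_sha256 /tmp/gog.tar.gz "$GOG_SHA256" && echo "gog tarball matches"
```

A return code of 2 means the value was never a digest (for example a leftover placeholder); 1 means a real mismatch.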
Verify binaries exist (only if you installed them):
# These only apply if you installed skill binaries; skip if you did a core-only build
docker compose exec openclaw-gateway which gog && echo "✅ gog" || echo "⏭️ gog not installed"
docker compose exec openclaw-gateway which goplaces && echo "✅ goplaces" || echo "⏭️ goplaces not installed"
docker compose exec openclaw-gateway which wacli && echo "✅ wacli" || echo "⏭️ wacli not installed"
Verify gateway is up:
docker compose logs -f openclaw-gateway
# Use strict installs (fails if packages don't match lockfile)
# pnpm install --frozen-lockfile is already in the Dockerfile
# Audit for known vulnerabilities
docker compose exec openclaw-gateway pnpm audit
The honest gap: OpenClaw updates come via git pull. Git verifies integrity (SHA hashes on every commit) but not identity (commits aren't GPG-signed). Skills from clawhub have no signature verification currently. Lockfiles and pre-update review are the best local defenses until upstream adds signed releases.
🧒 Child Lens: Your house is built and locked. Now you need a way to get in, but you want a secret tunnel, not a door facing the highway.
🔬 First Principles Lens: SSH tunnels encrypt traffic and require key authentication. Tailscale creates a WireGuard mesh with per-device identity. Both keep the gateway off the public internet. Direct internet exposure requires TLS + token + firewall: three things that must all work perfectly, all the time.
ssh -N -L 18789:127.0.0.1:18789 deploy@YOUR_VPS_IP
Open http://127.0.0.1:18789/ and enter your OPENCLAW_GATEWAY_TOKEN.
Keep Docker published on loopback and expose via host-level Tailscale proxy. Do not publish gateway directly to the public internet.
# Install Tailscale (apt repo, NOT curl|sh, consistent with our Docker install approach)
# Detect distro automatically (works for Ubuntu 22.04/24.04 and Debian 12)
if [ ! -f /etc/os-release ]; then echo "ERROR: /etc/os-release not found; install Tailscale manually"; exit 1; fi
DISTRO=$(. /etc/os-release && echo "$ID")
CODENAME=$(. /etc/os-release && echo "$VERSION_CODENAME")
if [ -z "$DISTRO" ] || [ -z "$CODENAME" ]; then echo "ERROR: Could not detect distro/codename from /etc/os-release"; exit 1; fi
curl -fsSL "https://pkgs.tailscale.com/stable/${DISTRO}/${CODENAME}.noarmor.gpg" | tee /usr/share/keyrings/tailscale-archive-keyring.gpg >/dev/null
curl -fsSL "https://pkgs.tailscale.com/stable/${DISTRO}/${CODENAME}.tailscale-keyring.list" | tee /etc/apt/sources.list.d/tailscale.list
apt update && apt install -y tailscale
tailscale up
# Verify your Tailscale IP
tailscale ip -4
# Allow Tailscale traffic through UFW
ufw allow in on tailscale0
Tailnet devices reach the gateway through the tailscale serve proxy described in the warning below (tailscale serve status prints the URL). Because Docker publishes the port on 127.0.0.1 only, http://<tailscale-ip>:18789/ does not answer directly.
⚠️ Docker + UFW footgun: Do NOT change the Docker host bind from 127.0.0.1 to 0.0.0.0 to expose the port on Tailscale. Docker bypasses UFW rules for published container ports, so your gateway would be exposed to the public internet regardless of UFW settings. Instead, use tailscale serve:
tailscale serve --bg http://127.0.0.1:18789
This proxies traffic through your tailnet without changing Docker bindings. The container stays locked to localhost.
Security note: Tailscale uses WireGuard for encryption between nodes. If a device on your tailnet is compromised, the attacker can see that node's traffic and reach whatever your tailnet ACLs allow it to reach. Treat your tailnet as a trusted network, but not a zero-trust one. For defense-in-depth, the gateway still requires token auth regardless of network path.
Tailscale key management:
- Prefer tagged reusable keys with explicit expiration
- Track key expiry dates and rotate before expiration
- After rotation, verify node connectivity and ACL enforcement
Use Caddy for automatic HTTPS with zero config. Install and create a Caddyfile:
apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | tee /etc/apt/sources.list.d/caddy-stable.list
apt update && apt install caddy
# /etc/caddy/Caddyfile
openclaw.example.com {
    reverse_proxy 127.0.0.1:18789
}
ufw allow 443/tcp
systemctl enable --now caddy
Caddy auto-provisions and renews Let's Encrypt TLS certificates. Keep the upstream on 127.0.0.1; the proxy handles all public-facing traffic.
🧒 Child Lens: If someone copies your house key and you never change the locks, they can come in forever and you'd never know. Change the locks regularly, and the copied key stops working.
🔬 First Principles Lens: API tokens are bearer credentials: anyone who has the token IS you. Tokens don't expire by default. The exposure window equals the token's lifetime. Rotation bounds that lifetime. A stolen key that stops working in 90 days is categorically different from one that works forever.
# 1. Generate new token
NEW_TOKEN="$(openssl rand -hex 32)"
# 2. Update .env (OPENCLAW_GATEWAY_TOKEN=$NEW_TOKEN), keep mode 600
# 3. Restart gateway
docker compose up -d --force-recreate openclaw-gateway
# 4. Validate health
curl -sf -H "Authorization: Bearer $NEW_TOKEN" http://127.0.0.1:18789/ > /dev/null
# 5. Re-authenticate all clients with new token
# 6. Invalidate old token everywhere (shell history, password managers, notes)
For each provider, document:
- Where to rotate: the dashboard URL
- Where the key lives: which config files (there may be multiple!)
- How to hand off: never paste keys in chat. Use encrypted channels.
- How to verify: what breaks if you got it wrong
Example rotation matrix:
| Provider | Config Locations | Rotation Method |
|---|---|---|
| Anthropic | auth-profiles.json (2 entries) | console.anthropic.com → API Keys |
| OpenAI | env, auth.json | platform.openai.com → API Keys |
| ElevenLabs | openclaw.json (2 places) | elevenlabs.io → Profile |
| Twilio | openclaw.json (3 places) | console.twilio.com (24h grace period!) |
| xAI | openclaw.json | console.x.ai |
| Google/Gemini | openclaw.json + 2 skill configs | aistudio.google.com |
| Brave Search | openclaw.json (2 places) | api.search.brave.com |
| Backblaze B2 | ~/.config/restic/b2.env | backblaze.com → App Keys |
Cadence: Quarterly (every 90 days). Set a recurring reminder. If you don't schedule it, it won't happen.
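Step 2 of the gateway token rotation (update .env, keep mode 600) is the part most easily done by hand and done wrong. A minimal sketch of doing it atomically follows; set_env_token is a hypothetical helper name, and it assumes the file already contains an OPENCLAW_GATEWAY_TOKEN= line.

```shell
#!/bin/sh
# Replace the gateway token in an env file via a temp file and an atomic rename.
# set_env_token is an illustrative helper; it refuses anything but 64 hex characters.
set_env_token() {
  local file="$1" token="$2" tmp
  if ! printf '%s' "$token" | grep -Eq '^[0-9a-f]{64}$'; then
    echo "token must be 64 hex characters" >&2
    return 2
  fi
  if ! grep -q '^OPENCLAW_GATEWAY_TOKEN=' "$file"; then
    echo "no OPENCLAW_GATEWAY_TOKEN line in $file" >&2
    return 1
  fi
  tmp="$(mktemp "${file}.XXXXXX")" || return 1   # same directory, so mv is a rename
  sed "s/^OPENCLAW_GATEWAY_TOKEN=.*/OPENCLAW_GATEWAY_TOKEN=${token}/" "$file" > "$tmp" \
    && chmod 600 "$tmp" \
    && mv "$tmp" "$file"
}

# Rotation, steps 1 and 2:
#   NEW_TOKEN="$(openssl rand -hex 32)"
#   set_env_token .env "$NEW_TOKEN"
```

Because the new file is written next to the old one and renamed over it, an interrupted run leaves the previous .env intact.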
🧒 Child Lens: An alarm clock that's set but not plugged in doesn't wake you up. It looks right (the time is set, the alarm is on), but it's not actually working. You only find out when you oversleep.
🔬 First Principles Lens: A backup system has three parts: the scheduler (triggers it), the tool (creates it), and the verification (proves it worked). Most failures are silent. A backup you haven't verified is not a backup.
- /home/deploy/.openclaw (config, auth, state)
- /home/deploy/.openclaw/workspace (workspace data)
#!/bin/bash
# Save as /usr/local/bin/openclaw-backup.sh and make it executable (chmod 755)
set -euo pipefail
export PATH="/usr/local/bin:/usr/bin:/bin"
BACKUP_DIR="/var/backups/openclaw"
mkdir -p "$BACKUP_DIR"
umask 077 # archives contain secrets: create them readable by the owner only
TS="$(date +%F-%H%M%S)"
if ! tar -C / -czf "${BACKUP_DIR}/openclaw-${TS}.tar.gz" home/deploy/.openclaw 2>&1; then
  echo "🚨 Backup tar creation failed at $(date)" >&2
  exit 1
fi
# Verify the tarball is readable
if ! tar -tzf "${BACKUP_DIR}/openclaw-${TS}.tar.gz" > /dev/null 2>&1; then
  echo "🚨 Backup tarball corrupt at $(date)" >&2
  exit 1
fi
# Retention: keep 14 days
find "$BACKUP_DIR" -type f -mtime +14 -delete

# /etc/systemd/system/openclaw-backup.service
[Unit]
Description=OpenClaw Backup
[Service]
Type=oneshot
ExecStart=/usr/local/bin/openclaw-backup.sh

# /etc/systemd/system/openclaw-backup.timer
[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true
[Install]
WantedBy=timers.target

systemctl enable --now openclaw-backup.timer
What we actually found (macOS deployment): A backup scheduled via macOS launchd to run daily at 3am. The plist was loaded, the script existed, the configuration looked correct. But: exit code 127, restic: command not found. launchd doesn't inherit your shell's PATH. The backup had been silently failing every night. Only one snapshot existed, from a manual run days earlier. (On Linux, systemd has a similar gotcha: always use absolute paths in timer units.)
Always use absolute paths in scheduled scripts:
#!/bin/bash
# Linux:
export PATH="/usr/local/bin:/usr/bin:/bin"
RESTIC=/usr/local/bin/restic
# macOS: uncomment below instead
# export PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin"
# RESTIC=/opt/homebrew/bin/restic
$RESTIC backup ~/.openclaw/workspace \
--exclude '.git' \
--exclude 'node_modules' \
--tag openclaw-workspace
$RESTIC forget \
--keep-daily 7 \
--keep-weekly 4 \
--keep-monthly 6 \
  --prune
restic snapshots # Are new ones appearing?
restic check # Repository integrity
- Restore to a fresh VPS
- Start with same .env
- Confirm channels/auth/session state are intact
The irony: We had just added backup failure alerting 10 minutes before discovering the backup was already broken. If we'd had monitoring earlier, we'd have caught it days ago. Order matters.
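A listing check (tar -tzf) proves the archive is readable, not that it restores. A cheap intermediate step before a full fresh-VPS drill is to extract into a scratch directory and confirm a path you care about comes back. This is a sketch for the tarball backups above; verify_backup_restore is a hypothetical helper name.

```shell
#!/bin/sh
# Extract a backup tarball into a throwaway directory and confirm a path exists in it.
# verify_backup_restore is an illustrative helper; the scratch copy is always removed.
verify_backup_restore() {
  local tarball="$1" must_exist="$2" scratch rc=0
  scratch="$(mktemp -d)" || return 2
  if ! tar -xzf "$tarball" -C "$scratch" 2>/dev/null; then
    echo "FAIL: cannot extract $tarball"; rc=1
  elif [ ! -e "$scratch/$must_exist" ]; then
    echo "FAIL: $must_exist missing from $tarball"; rc=1
  else
    echo "OK: $must_exist restored from $tarball"
  fi
  rm -rf "$scratch"
  return "$rc"
}

# Paths inside the archive are relative to /, matching `tar -C /` in the backup script:
#   verify_backup_restore /var/backups/openclaw/openclaw-<timestamp>.tar.gz home/deploy/.openclaw
```

The scratch directory briefly holds secrets in plain form, so run this as root on the backup host rather than on a shared machine.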
🧒 Child Lens: Your house has smoke detectors. They don't prevent fires; they make sure you KNOW there's a fire so you can act. Without them, a small problem becomes a big problem while you're asleep.
🔬 First Principles Lens: Every automated system can fail silently. The cost of a failure is proportional to how long it goes undetected. Monitoring doesn't prevent failures; it bounds the detection time.
| Check | Threshold | Why |
|---|---|---|
| Disk space | >85% warning, >95% critical | Full disk = no logs, no backups, cascading failures |
| Gateway process | Not running | If it's down, everything's down |
| Free memory | <50MB | OOM kills are silent and random |
| Last backup age | >36 hours | Catches silent backup failures |
| Container restarts | Repeated | Crash loop indicates config or resource problem |
| Auth cooldowns | Rate-limited profiles | You're burning through quota |
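The disk thresholds in the table translate directly into a small classifier, and combining several checks needs one extra rule: a later warning must never overwrite an earlier critical. A sketch with hypothetical helper names (classify_disk, severity_rank, escalate):

```shell
#!/bin/sh
# classify_disk: map a usage percentage to healthy / warning / critical (>85, >95).
# escalate: return the more severe of two statuses, so results can only get worse.
# All three functions are illustrative helpers.
classify_disk() {
  if   [ "$1" -gt 95 ]; then echo critical
  elif [ "$1" -gt 85 ]; then echo warning
  else echo healthy
  fi
}

severity_rank() {
  case "$1" in
    critical) echo 2 ;;
    warning)  echo 1 ;;
    *)        echo 0 ;;
  esac
}

escalate() {
  if [ "$(severity_rank "$2")" -gt "$(severity_rank "$1")" ]; then echo "$2"; else echo "$1"; fi
}

status="healthy"
status="$(escalate "$status" "$(classify_disk 97)")"   # disk check: critical
status="$(escalate "$status" "warning")"               # a later, milder finding
echo "status:$status"
```

The last line prints status:critical, because the milder finding does not downgrade the overall result.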
#!/bin/bash
STATUS="healthy"
ALERTS=""
# Record an alert; a warning never downgrades an earlier critical
warn() { [ "$STATUS" = "critical" ] || STATUS="warning"; ALERTS+="⚠️ $1\n"; }
crit() { STATUS="critical"; ALERTS+="🚨 $1\n"; }
# Disk
DISK_PCT=$(df -P / | awk 'NR==2 {gsub(/%/,""); print $5}')
if [ "$DISK_PCT" -gt 95 ]; then
  crit "Disk ${DISK_PCT}% full"
elif [ "$DISK_PCT" -gt 85 ]; then
  warn "Disk ${DISK_PCT}% full"
fi
# Gateway container
if ! docker compose ps --status running openclaw-gateway | grep -q "openclaw-gateway"; then
  crit "Gateway container not running!"
fi
# Memory ("available" column of free)
FREE_MB=$(free -m | awk '/Mem:/ {print $7}')
if [ "$FREE_MB" -lt 50 ]; then
  warn "Low memory: ${FREE_MB}MB available"
fi
# Backup age: mtime of the newest tarball in epoch seconds (GNU stat and xargs)
LAST_BACKUP=$(find /var/backups/openclaw -name '*.tar.gz' 2>/dev/null | xargs -r stat --format='%Y' 2>/dev/null | sort -rn | head -1)
if [ -n "$LAST_BACKUP" ]; then
  NOW=$(date +%s)
  HOURS_AGO=$(( (NOW - ${LAST_BACKUP%.*}) / 3600 ))
  if [ "$HOURS_AGO" -gt 36 ]; then
    warn "Backup stale: ${HOURS_AGO}h old"
  fi
else
  warn "No backups found!"
fi
# Gateway health probe
if ! curl -sf -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" http://127.0.0.1:18789/ > /dev/null 2>&1; then
  warn "Health probe failed"
fi
echo "status:$STATUS"
if [ -n "$ALERTS" ]; then
  echo -e "$ALERTS"
  # Discord webhook alerting
  if [ -n "$DISCORD_WEBHOOK_URL" ]; then
    PAYLOAD=$(echo -e "$ALERTS" | jq -Rs '{content: ("🚨 **OpenClaw Health Alert**\n" + .)}')
    curl -sf -X POST -H "Content-Type: application/json" -d "$PAYLOAD" "$DISCORD_WEBHOOK_URL"
  fi
  # ntfy.sh alternative (lightweight, no setup)
  if [ -n "$NTFY_TOPIC" ]; then
    echo -e "$ALERTS" | curl -sf -d @- "https://ntfy.sh/${NTFY_TOPIC}"
  fi
fi
The health script needs OPENCLAW_GATEWAY_TOKEN to probe the gateway. Set it in the cron file that runs the script:
# /etc/cron.d/openclaw-health (root-owned, chmod 600: it contains the gateway token)
SHELL=/bin/bash
# Paste the token between the quotes. It is not sourced from .env because Docker .env files aren't guaranteed shell-safe.
# Keep comments on their own lines: cron does not allow them after a variable assignment.
OPENCLAW_GATEWAY_TOKEN=""
LOG_DIR=/home/deploy/openclaw/logs
*/30 * * * * deploy mkdir -p $LOG_DIR && cd /home/deploy/openclaw && /usr/local/bin/openclaw-health.sh 2>&1 | grep -v "^status:healthy$" >> $LOG_DIR/openclaw-alerts.log
- Gateway logs may contain request metadata; review before shipping externally
- Log rotation: handled by the json-file logging driver in compose (max-size: 10m, max-file: 3)
- If using log aggregation, ensure transport is encrypted and destination is access-controlled
The principle: Good monitoring is silent when everything's fine and loud when something's wrong. If it's noisy, you'll ignore it. If it's silent, you'll forget it exists.
🧒 Child Lens: Before you install the update, check what's in the box. Don't just click "update all" and hope.
🔬 First Principles Lens: Every update is a trust decision. git pull verifies integrity (SHA hashes) but not identity (no GPG signatures). Reviewing changes before applying them is the best local defense.
#!/bin/bash
echo "=== Current Version ==="
git log --oneline -1
echo "=== Upstream Changes ==="
git fetch origin
git log --oneline HEAD..origin/main
echo "=== Package Audit ==="
# pnpm audit exits non-zero when it finds vulnerabilities (or cannot run), so report that instead of hiding it
docker compose exec -T openclaw-gateway pnpm audit || echo "Audit reported issues or could not run; review the output above"

git pull --ff-only
docker compose build
docker compose up -d openclaw-gateway
curl -sf -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" http://127.0.0.1:18789/
The rule: review before you update. Never blind-update.
Tag known-good states before updating so you can revert cleanly:
# Before updating, tag the current working state
git tag "good-$(date +%F)"
# Update
git pull --ff-only
docker compose build
docker compose up -d openclaw-gateway
# If something breaks, roll back
docker compose down
git checkout good-2026-02-07 # your last known-good tag
docker compose build
docker compose up -d openclaw-gateway
Keep the last 3 known-good images to avoid re-building on rollback:
# List OpenClaw images by date
docker images openclaw --format "{{.ID}} {{.CreatedAt}}" | sort -k2 -r
# Remove all but the 3 most recent (sorted by creation time, newest first)
docker images openclaw --format "{{.CreatedAt}}\t{{.ID}}" | sort -r | tail -n +4 | awk '{print $NF}' | xargs -r docker rmi
# Clean up dangling layers
docker image prune -f
Repo delta: The upstream repo does not provide GPG-signed releases or reproducible builds. Skills from clawhub have no signature verification. Until upstream addresses this, lockfiles + pre-update review are your best defense.
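One caveat about keeping old images: rebuilding under the same tag (openclaw:hetzner in this guide) leaves the previous image untagged, and docker image prune removes untagged images. Giving the known-good image its own tag before updating avoids that. The sketch below only generates the tag name; good_image_tag and the good-YYYY-MM-DD scheme are assumptions that mirror the git tag convention above.

```shell
#!/bin/sh
# Produce a dated image tag such as openclaw:good-2026-02-07.
# good_image_tag is an illustrative helper; it prints a name and does not call docker.
good_image_tag() {
  local repo="${1:-openclaw}" day="${2:-$(date +%F)}"
  case "$day" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected YYYY-MM-DD, got: $day" >&2; return 2 ;;
  esac
  echo "${repo}:good-${day}"
}

# Before updating:
#   docker tag openclaw:hetzner "$(good_image_tag)"
# Rolling back without a rebuild (a shell variable overrides the value in .env):
#   OPENCLAW_IMAGE=openclaw:good-2026-02-07 docker compose up -d --no-build openclaw-gateway
good_image_tag openclaw 2026-02-07
```

Images tagged this way also show up in the docker images openclaw listing, so the keep-3 cleanup above has something to work with.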
🧒 Child Lens: Cash in a locked drawer is protected as long as nobody picks the lock. But if someone breaks in, the drawer lock is the only thing left. A safe inside the drawer adds a layer, but only if the combination isn't taped underneath.
🔬 First Principles Lens: File permissions (chmod 600) protect against other local users. They don't protect against a compromised process running as your user, physical theft of an unencrypted disk, or unencrypted backups.
| Scenario | Recommendation |
|---|---|
| Hetzner VPS | LUKS at provisioning, file permissions tight, rotate keys quarterly |
| Always-on home server | Disk encryption OFF (accept risk for auto-boot), file permissions tight, rotate quarterly |
| Office/colo server | Full disk encryption ON, accept manual reboots |
| Belt-and-suspenders | Full disk + age/sops + rotation |
- LUKS (full disk): Protects against physical theft. Hetzner offers it at provisioning time. Use it.
- File-level (age/sops): Diminishing returns. The decryption key must be accessible at runtime, which creates a circular dependency.
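File permissions are the baseline control in every scenario above, so it helps to check them rather than assume them. A minimal sketch follows, assuming GNU or BSD `stat` is available; `has_mode` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if FILE has exactly the expected octal mode.
# Tries GNU stat first, then falls back to BSD/macOS stat.
has_mode() {
  local file="$1" want="$2" got
  got="$(stat -c %a "$file" 2>/dev/null || stat -f %Lp "$file" 2>/dev/null)" || return 1
  [ "$got" = "$want" ]
}

# Usage: warn about a secrets file readable by group or others.
#   has_mode .env 600 || echo "WARNING: .env is not mode 600" >&2
```

This only confirms the mode bits. It says nothing about a compromised process running as the owning user, which is the limit described above.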
🧒 Child Lens: If someone hands you a mystery box and says "trust me, it's the right thing", you'd want to at least check the label matches what you ordered.
🔬 First Principles Lens: If building locally, the source is the trust anchor. For pre-built images, verify the digest and scan for vulnerabilities.
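For pre-built images, the digest advice can be checked mechanically. A minimal sketch follows, assuming image references live on `image:` lines of a compose file; `unpinned_images` is a hypothetical helper, not a Docker or OpenClaw command. Locally built services that use `build:` have no registry digest to pin, so they are outside its scope.

```shell
# Hypothetical helper: reads compose YAML on stdin and prints every
# uncommented "image:" line that is not pinned by a sha256 digest.
unpinned_images() {
  grep -E '^[[:space:]]*image:' | grep -Ev '@sha256:[0-9a-f]{64}' || true
}

# Usage: empty output means every image reference is digest-pinned.
#   unpinned_images < docker-compose.yml
```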
# Pin images by digest in production
# image: openclaw@sha256:abc123...
# Scan for vulnerabilities
docker scout cves openclaw:hetzner
# or
trivy image openclaw:hetzner
| Container Path | Host Path | Source |
|---|---|---|
| `/home/node/.openclaw` | `${OPENCLAW_CONFIG_DIR}` | Volume mount |
| `/home/node/.openclaw/workspace` | `${OPENCLAW_WORKSPACE_DIR}` | Volume mount |
| `/usr/local/bin/*` | N/A | Image build |
Container can be recreated safely if and only if host volumes are intact.
Data integrity considerations:
- Volume corruption: Docker volumes use the host filesystem, so if the host disk corrupts, volumes corrupt too. This is why offsite backups (Section 10) are non-negotiable.
- Upgrade migrations: Before any OpenClaw version upgrade, snapshot the volumes: `tar -czf openclaw-pre-upgrade-$(date +%s).tar.gz /home/deploy/.openclaw/`. If the new version changes data formats, you have a clean rollback point.
- Ownership drift: If you rebuild the container with a different UID, volume permissions break. Always verify with `docker compose exec openclaw-gateway id` after rebuilds.
- Never store state inside the container that isn't on a mounted volume. `docker compose down && docker compose up -d` must be a no-op for your data.
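A pre-upgrade snapshot is only a rollback point if the archive can actually be read back. A minimal sketch follows that wraps the `tar` command from the list above with a verification pass; `snapshot_dir` and the destination directory are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: archive SRC_DIR into DEST_DIR and verify the archive
# is readable before reporting success. Prints the archive path on success.
snapshot_dir() {
  local src="$1" dest="$2" archive
  archive="$dest/openclaw-pre-upgrade-$(date +%s).tar.gz"
  mkdir -p "$dest" || return 1
  # -C keeps paths in the archive relative to the parent of SRC_DIR.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Listing the archive forces a full read, which catches truncated or corrupt files.
  tar -tzf "$archive" >/dev/null || return 1
  printf '%s\n' "$archive"
}

# Usage:
#   snapshot_dir /home/deploy/.openclaw /var/backups/openclaw
```

Stopping the container before the snapshot avoids archiving files mid-write; whether that matters depends on how OpenClaw writes its state, which this sketch does not assume.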
🧒 Child Lens: A to-do list isn't just for things you're going to do. It's also for things you've decided NOT to do, and why.
🔬 First Principles Lens: Security is a spectrum of trade-offs. Documenting accepted risks is not negligence; it's engineering. The dangerous position is having unexamined risks, not having documented ones.
### [Risk Name] - [Current State]
**Risk:** What could go wrong.
**Why accepted:** Why this trade-off makes sense.
**Mitigations:** What you're doing instead.
**Revisit when:** Conditions that would change the decision.
- No GPG on updates: Can't verify commit authorship. Upstream gap. Bounded by pre-update review script.
- No age/sops file encryption: Circular dependency at runtime. Bounded by file permissions + token rotation.
- Full disk encryption off (home server variant): Physical theft exposes disk. Bounded by token rotation + location.
- Changing container gateway port away from `18789` in compose command
- Using `releases/latest` binary URLs in production
- Installing binaries manually inside a running container (lost on recreate)
- Exposing gateway publicly without firewall + TLS + token
- Leaving stale tokens unrotated after team/user changes
- Scheduling backups without verifying they actually run
- Using `npm install` instead of `npm ci` / `pnpm install --frozen-lockfile`
- Piping secrets through Discord or Slack
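Two of the items above, `releases/latest` URLs and bare `npm install`, can be caught with a text scan before a build. A minimal sketch follows, assuming the patterns appear literally in the Dockerfile; `flag_antipatterns` is a hypothetical helper and is no substitute for reading the diff.

```shell
# Hypothetical helper: reads a Dockerfile on stdin and prints lines that match
# two of the anti-patterns listed above. Exit status is 1 if anything was flagged.
flag_antipatterns() {
  awk '
    /^[[:space:]]*#/ { next }   # ignore comment lines
    /releases\/latest/ { print "unpinned release URL: " $0; bad = 1 }
    # [^p] keeps "pnpm install" from matching as "npm install"
    /(^|[^p])npm install/ { print "npm install (prefer npm ci): " $0; bad = 1 }
    END { exit (bad ? 1 : 0) }'
}

# Usage:
#   flag_antipatterns < Dockerfile || echo "Fix the lines above before building" >&2
```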
Use this after deployment. Every box should be checked or have a documented reason why not.
- Firewall enabled, default deny incoming
- SSH key-only auth (password auth disabled)
- `.env` exists with `chmod 600`
- `OPENCLAW_GATEWAY_TOKEN` set (not empty)
- Docker ports bound to `127.0.0.1`
- Persistent dirs owned by UID 1000, mode 700
- `security_opt: no-new-privileges` set in compose
- Docker socket NOT mounted into container
- Docker installed from signed apt repo (not `curl | sh`)
- Skill binaries pinned to version + SHA256 checksum
- Lockfiles present and used (`--frozen-lockfile`)
- No `releases/latest` URLs in Dockerfile
- `.gitignore` covers auth-profiles.json, *.env, secrets
- Backup automation running and verified (check for real output!)
- Backup retention policy configured
- Recovery tested on a fresh VPS at least once
- Health monitoring active (disk, process, memory, backup age)
- Log rotation configured
- Token rotation runbook documented
- Quarterly rotation reminder set
- Gateway not exposed to public internet (SSH tunnel or Tailscale)
- If TLS-terminated: certs auto-renew, HTTPS enforced, proxy → gateway on loopback
- Tailscale keys tracked with expiry dates (if used)
- Accepted risks documented with rationale
- Provider rotation matrix filled in
- Pre-update review script in place
- This checklist reviewed on every major update
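The backup items in the checklist are the easiest to tick without evidence. A minimal sketch of a freshness check follows, assuming backups land as regular files in a single directory; `backup_is_fresh` is a hypothetical helper name, and the 1500-minute threshold in the usage line assumes a daily schedule with an hour of slack.

```shell
# Hypothetical helper: succeed only if DIR contains at least one non-empty
# regular file modified within the last MAX_MINUTES minutes.
backup_is_fresh() {
  local dir="$1" max_minutes="$2"
  [ -d "$dir" ] || return 1
  # -size +0 skips zero-byte files left behind by a failed run.
  [ -n "$(find "$dir" -type f -size +0 -mmin "-$max_minutes" -print 2>/dev/null | head -n 1)" ]
}

# Usage:
#   backup_is_fresh /var/backups/openclaw 1500 || echo "ALERT: no recent backup" >&2
```

A fresh, non-empty file is still weaker evidence than a test restore, which is why the checklist lists recovery testing as its own item.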
Run this checklist after initial deployment or any major update. You can run it manually or use the automated script below.
# 1. Health endpoint responds
curl -sf -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" http://127.0.0.1:18789/ \
  && echo "✅ Gateway reachable"
# 2. Container runs as non-root
docker compose exec openclaw-gateway id
# Expect: uid=1000(node) gid=1000(node), NOT root
# 3. Firewall rules correct
ufw status verbose
# Expect: default deny incoming, SSH allowed, no 18789 open to public
# 4. Backup cycle works
/usr/local/bin/openclaw-backup.sh && echo "✅ Backup succeeded"
ls -lh /var/backups/openclaw/ | head -3
# 5. Channel connectivity (Discord, etc.)
# Send a test message through your configured channel and confirm delivery
# 6. Volumes have correct ownership
ls -la /home/deploy/.openclaw/
# Expect: owned by 1000:1000, mode 700
Save as `ops/smoke-test.sh` and run after every deployment or update:
#!/usr/bin/env bash
set -euo pipefail
source .env 2>/dev/null || true
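PASS=0; FAIL=0
# $1 = label, $2 = command string. Output is discarded; only the exit status counts.
check() {
  if eval "$2" >/dev/null 2>&1; then
    # Plain assignment: ((PASS++)) returns status 1 when PASS is 0, which aborts under set -e
    echo "✅ $1"; PASS=$((PASS + 1))
  else
    echo "❌ $1"; FAIL=$((FAIL + 1))
  fi
}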
check "Gateway health" \
'curl -sf -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" http://127.0.0.1:18789/'
check "Non-root container" \
'[ "$(docker compose exec -T openclaw-gateway id -u)" = "1000" ]'
check "UFW active" \
'ufw status | grep -q "Status: active"'
check "18789 bound to localhost only" \
  'docker compose port openclaw-gateway 18789 | grep -q "^127\.0\.0\.1:"'
check "Volume ownership" \
'[ "$(stat -c %u /home/deploy/.openclaw 2>/dev/null || stat -f %u /home/deploy/.openclaw)" = "1000" ]'
check "Log rotation configured" \
'docker inspect $(docker compose ps -q openclaw-gateway) 2>/dev/null | grep -q max-size'
echo ""
echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" -eq 0 ] || exit 1
chmod +x ops/smoke-test.sh
./ops/smoke-test.sh
🧒 Child Lens: When the lights go out, you check the breaker box, not rewire the whole house. Start with the most obvious cause.
🔬 First Principles Lens: Most failures are configuration errors, not bugs. Logs tell you what happened; network tools tell you what's reachable. Check both before changing anything.
| Symptom | Diagnosis | Fix |
|---|---|---|
| Container won't start | `docker compose logs -f openclaw-gateway`: look for the crash reason | Fix config/env, then `docker compose up -d` |
| Gateway unreachable | `ss -tlnp \| grep 18789`: is it listening? Check UFW: `ufw status` | Verify bind address in `.env`, check `docker compose ps` |
| Auth failures (401) | Token mismatch between client and server | Compare `$OPENCLAW_GATEWAY_TOKEN` in `.env` vs what client sends. Restart after changes: `docker compose up -d --force-recreate` |
| Backup failures | Check exit code: `echo $?` after manual run. Common: disk full, wrong permissions | `df -h` for space, `ls -la /var/backups/openclaw/` for perms |
| Discord/channel disconnects | Gateway lost websocket connection | `docker compose restart openclaw-gateway`, then check logs for rate limits |
| OOM killed | `dmesg \| grep -i oom` | Increase `mem_limit` in compose or reduce workload |
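For the "Gateway unreachable" row, the useful question is whether the port is listening on loopback, listening on a public address, or not listening at all. A minimal sketch follows that classifies `ss -tln` output, assuming the usual column layout with the local address in the fourth field; `bind_status` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ss -tln` output on stdin and reports how PORT is bound.
# Prints "loopback", "exposed", or "not-listening".
bind_status() {
  local port="$1"
  awk -v port="$port" '
    $1 == "LISTEN" {
      addr = $4
      n = split(addr, parts, ":")
      if (parts[n] != port) next   # the last colon-separated piece is the port
      found = 1
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host !~ /^127\./ && host != "[::1]") exposed = 1
    }
    END {
      if (!found) print "not-listening"
      else if (exposed) print "exposed"
      else print "loopback"
    }'
}

# Usage:
#   ss -tln | bind_status 18789
```

"exposed" here means bound to a non-loopback address. Whether the port is reachable from the internet still depends on the firewall, so check `ufw status` as well.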
Security isn't a destination. It's a practice, like brushing your teeth. You do it regularly, you do it honestly, and you document what you skip and why.
The three most dangerous words in security are "it should work." Check. Verify. Test the backup by restoring from it. Run the health monitor and see if it actually alerts. Try to read your own secrets from a different user account.
Trust, but verify. Then verify again.
Based on Brad Barbin's Hetzner deployment gist and a real security audit conducted February 2026. All scripts referenced here live in the ops/ directory of the OpenClaw workspace.