@marccampbell
Last active March 7, 2026 12:15
Support Bundle AI - Progress Notes Mar 6-7 2026

2026-03-06

Support Bundle AI — Major Progress

What Happened

  • Rebased feat/daytona-provisioning onto latest main (resolved otel version conflicts)
  • Added "Analyze with AI" menu item to SupportBundleRow (gated by ai_support_bundle_analysis feature flag)
  • Wired button to POST /v3/supportbundle/:bundleId/agent/start
  • Renamed tables: support_bundle_analysis → ai_support_bundle_analysis (+ bundle join table)
  • Moved SchemaHero schemas to correct path: migrations/kustomize/schemas/mysql/
  • Switched Creddy auth from vend tokens to OIDC (client_credentials flow)
    • CREDDY_CLIENT_ID + CREDDY_CLIENT_SECRET replace CREDDY_AGENT_TOKEN
    • Token cached with 5min buffer, 401 clears cache
  • Per-analysis Creddy agent creation via POST /v1/admin/agents (4h TTL, github:read scope)
  • Fixed Creddy API paths: /v1/credentials/{backend}, /v1/admin/agents
  • Added --oidc-issuer to creddy server startup
  • Fixed tailscale serve + MagicDNS approach:
    • Creddy server: localhost:8400 + tailscale serve on 443
    • ACL updated: port 443 allowed on tag:agent-credentials
    • Agent sandboxes: bootstrap fixes /etc/resolv.conf (adds 100.100.100.100 first)
  • Bootstrap updated for OIDC (curl-based, no creddy CLI needed)
  • Removed creddy CLI from agent Dockerfile (OIDC via curl instead)
  • Timestamped snapshot names so builds don't block running sandboxes
  • Increased provision timeout to 10min, ExecuteCommand timeout to 5min
  • Disabled cleanup temporarily for debugging
  • Suppressed noisy ES consumer and EP cleanup logs
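
Sketch of the token caching described above (5min buffer, 401 clears cache), in Python for brevity. This is not the vandoor implementation: TokenCache, fetch_token, and invalidate are hypothetical names, and fetch_token stands in for whatever performs the client_credentials request and returns (access_token, expires_in_seconds).

```python
import time
from typing import Callable, Optional, Tuple

# Refresh this many seconds before the stated expiry (the 5min buffer).
EXPIRY_BUFFER_SECONDS = 5 * 60


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        now: Callable[[], float] = time.time,
    ) -> None:
        # fetch_token is a hypothetical callable that does the OIDC
        # client_credentials exchange and returns (token, expires_in_seconds).
        self._fetch_token = fetch_token
        self._now = now
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one if missing or near expiry."""
        near_expiry = self._now() >= self._expires_at - EXPIRY_BUFFER_SECONDS
        if self._token is None or near_expiry:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = self._now() + expires_in
        return self._token

    def invalidate(self) -> None:
        """Drop the cached token; intended to be called when the server answers 401."""
        self._token = None
        self._expires_at = 0.0
```

Intended usage: call get() before each Creddy request, and on a 401 call invalidate() and retry once with a fresh token. Injecting the clock keeps the expiry logic testable without sleeping.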

Current State

  • Creddy OIDC auth works end-to-end from agent sandbox
  • Tailscale connects, MagicDNS resolves after resolv.conf fix
  • GitHub backend just configured on Creddy
  • Need to test full bootstrap completion (bundle download + OpenClaw start)
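
The resolv.conf fix amounts to putting the MagicDNS resolver (100.100.100.100) ahead of the other nameservers. A minimal sketch of that transformation in Python; the real bootstrap is a shell script, and prepend_nameserver is a hypothetical helper that only transforms text and does not write to /etc/resolv.conf.

```python
MAGICDNS_RESOLVER = "100.100.100.100"  # Tailscale's MagicDNS resolver address


def prepend_nameserver(resolv_conf: str, resolver: str = MAGICDNS_RESOLVER) -> str:
    """Return resolv.conf content with `resolver` as the first nameserver line."""
    wanted = f"nameserver {resolver}"
    # Drop any existing entry for the same resolver so the result is idempotent.
    kept = [
        line
        for line in resolv_conf.splitlines()
        if line.split() != ["nameserver", resolver]
    ]
    # Nameservers are tried in file order, so the new entry goes first.
    return "\n".join([wanted] + kept) + "\n"
```

Running it twice gives the same output, which matters if bootstrap is re-run in the same sandbox.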

Key Params in Doppler

  • CREDDY_SERVER_HOST = creddy-server-dev.tail1ed40.ts.net
  • CREDDY_CLIENT_ID = vendor-api agent client ID
  • CREDDY_CLIENT_SECRET = vendor-api agent client secret
  • DAYTONA_API_KEY = Daytona API auth
  • DAYTONA_AGENT_SNAPSHOT = snapshot name (with timestamp)

Late Session (Mar 7 ~3-4am UTC)

What got working:

  • Full Creddy OIDC flow from agent sandbox: Tailscale → MagicDNS → HTTPS to Creddy → JWT → fetch credentials ✅
  • Tailscale auth key from Creddy ✅
  • Per-analysis Creddy agent creation ✅
  • GitHub token from Creddy ✅ (after fixing POST vs GET, GitHub App installation, backend config)
  • Bundle download hits ngrok (S3_EXTERNAL_ENDPOINT) ✅

What's blocked:

  • Bundle download from sandbox fails — dev S3 (MinIO in colima cluster) not properly reachable via ngrok
    • S3_EXTERNAL_ENDPOINT param added, uses param.Get() not os.Getenv()
    • ngrok running at s3-marc.ngrok.dev → s3.localhost:8000 → traefik → MinIO:9000
    • 404 from ngrok — likely Host header mismatch (traefik routes by host)
    • Try ngrok http s3.localhost:8000 --host-header=s3.localhost:8000
    • Or broader fix: S3_BROWSER_ENDPOINT also broken for browser uploads (same root issue)
    • Consider: use R2 for bundle storage in prod, ngrok is dev-only workaround
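
Toy model of the suspected Host header mismatch, to make the hypothesis concrete. This models the likely cause noted above, not a confirmed diagnosis: the routing table and both function names are hypothetical, and real traefik matching is configured per router.

```python
from typing import Dict, Optional

# Hypothetical routing table: routers keyed by the request's Host header.
ROUTES: Dict[str, str] = {"s3.localhost:8000": "MinIO:9000"}


def route(host_header: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the backend for a Host header, or None (a 404) if no router matches."""
    return routes.get(host_header.lower())


def forwarded_host(public_host: str, rewrite_to: Optional[str] = None) -> str:
    """Host header the tunnel sends upstream: the public name unless rewritten."""
    return rewrite_to if rewrite_to is not None else public_host


# Without a rewrite the upstream sees the public ngrok name and finds no route.
print(route(forwarded_host("s3-marc.ngrok.dev")))
# With the Host header rewritten, the request matches the s3.localhost router.
print(route(forwarded_host("s3-marc.ngrok.dev", rewrite_to="s3.localhost:8000")))
```

If this is the cause, rewriting the header at the tunnel (the --host-header option mentioned above) should be enough, with no traefik changes.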

Bootstrap script fixes still needed in snapshot:

  • GitHub credential fetch must use POST not GET (fixed in repo, not yet in snapshot)
  • Need to rebuild snapshot after fixing S3 download path

Cleanup still disabled — sandboxes not deleted on failure (for debugging)

Latest commit: 585e16a0a on feat/daytona-provisioning

Branch

  • feat/daytona-provisioning on replicatedhq/vandoor
  • fix/creddy-allow-443 on replicatedhq/replicated-tailscale

2026-03-07

What Happened

  • Reverted bad commit (a42aa7210) that referenced nonexistent GetS3BrowserClient() — was breaking build
  • Fixed SupportBundleAnalysis.jsx polling bugs:
    • Removed refetchChannel() from polling setTimeout — refetch() bypasses react-query's enabled guard, causing 404 spam on /channel/ when downstreamChannelId is empty
    • Changed polling condition from status !== "uploaded" to status === "pending" — prevents infinite polling on error responses or unexpected statuses
    • Added 5-minute age limit on pending bundle polling — stuck bundles (upload never completed) no longer poll forever
  • Fixed bundle download CSP violation:
    • Download presigned URLs were generated using internal S3 endpoint (s3.default.svc.cluster.local:9000) — unreachable from browser
    • Now uses S3_PRESIGN_ENDPOINT env var (same as upload path) for browser-facing presigned URLs
    • Added http://s3.localhost:8000 to frame-src in telepresence.js CSP
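
The polling fixes reduce to one predicate: poll only while status is "pending" and the bundle is under 5 minutes old. A sketch in Python (the real code lives in SupportBundleAnalysis.jsx). should_poll is a hypothetical name, and measuring age from the bundle's creation time is an assumption; the notes do not say which timestamp is used.

```python
from datetime import datetime, timedelta, timezone

MAX_PENDING_AGE = timedelta(minutes=5)


def should_poll(status: str, created_at: datetime, now: datetime) -> bool:
    """Poll only while the bundle is pending and younger than the age limit."""
    # The old check (status != "uploaded") kept polling on errors and
    # unexpected statuses; an allow-list of one status avoids that.
    if status != "pending":
        return False
    # Assumption: age is measured from created_at.
    return now - created_at < MAX_PENDING_AGE


t0 = datetime(2026, 3, 7, 12, 0, tzinfo=timezone.utc)
print(should_poll("pending", t0, t0 + timedelta(minutes=1)))  # True
print(should_poll("pending", t0, t0 + timedelta(minutes=6)))  # False
```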

Current State

  • Bundle analysis page no longer flickers/polls on uploaded or stuck bundles
  • Bundle downloads work in local dev (CSP + presign endpoint fixed)
  • Agent provisioning: unchanged since the late session (Mar 7 ~3-4am UTC), still blocked on dev S3 reachability from sandbox
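
For reference, the CSP part of the download fix is a single source added to frame-src. A sketch of that edit as a string transformation in Python; add_csp_source is a hypothetical helper and the example policy is made up, since the actual policy in telepresence.js is not reproduced in these notes.

```python
def add_csp_source(policy: str, directive: str, source: str) -> str:
    """Return `policy` with `source` listed in `directive`, adding the directive if absent."""
    parts = [p.strip() for p in policy.split(";") if p.strip()]
    found = False
    for i, part in enumerate(parts):
        tokens = part.split()
        if tokens[0].lower() == directive:
            found = True
            # Skip the append if the source is already listed.
            if source not in tokens[1:]:
                tokens.append(source)
            parts[i] = " ".join(tokens)
    if not found:
        parts.append(f"{directive} {source}")
    return "; ".join(parts)


# Hypothetical starting policy, for illustration only.
policy = "default-src 'self'; frame-src 'self'"
print(add_csp_source(policy, "frame-src", "http://s3.localhost:8000"))
# default-src 'self'; frame-src 'self' http://s3.localhost:8000
```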

Commits on feat/daytona-provisioning

  • ca9f54e4d — Revert GetS3BrowserClient
  • c2318cfc7 — Remove refetchChannel from polling loop
  • b51148696 — Only poll when status is pending
  • ae960c599 — Stop polling stuck bundles after 5min
  • 3b6dad66b — Use S3_PRESIGN_ENDPOINT for download URLs
  • 9cdf21c1e — Add s3.localhost:8000 to CSP frame-src

Resume Hints

Start ngrok (exposes local S3/MinIO to agent sandboxes)

ngrok http s3.localhost:8000 --domain s3-marc.ngrok.dev

Create/recreate Creddy server on Daytona

cd vandoor/scripts/creddy
DAYTONA_API_KEY=<key> CREDDY_SANDBOX_NAME=creddy-server-dev npx tsx create-creddy-server.ts <TAILSCALE_AUTH_KEY>
  • Requires DAYTONA_API_KEY env var
  • Tailscale auth key is the first positional arg
  • CREDDY_SANDBOX_NAME defaults to creddy-server if not set
  • After creation, configure the GitHub backend: creddy backend add github --app-id <ID> --private-key ./app.pem

Build/rebuild agent snapshot

cd vandoor/scripts/support-tooling
./create-agent-snapshot.sh openclaw-agent-dev
  • Appends -YYYYMMDD-HHMMSS timestamp to snapshot name
  • Update DAYTONA_AGENT_SNAPSHOT param in Doppler after rebuild
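
The naming scheme, sketched in Python for anyone generating the DAYTONA_AGENT_SNAPSHOT value outside the script. timestamped_snapshot_name is a hypothetical helper; whether create-agent-snapshot.sh uses UTC or local time is not stated here, so the caller passes the time in.

```python
from datetime import datetime, timezone


def timestamped_snapshot_name(base: str, now: datetime) -> str:
    """Append a -YYYYMMDD-HHMMSS suffix to a snapshot base name."""
    return f"{base}-{now.strftime('%Y%m%d-%H%M%S')}"


example = datetime(2026, 3, 7, 3, 45, 9, tzinfo=timezone.utc)
print(timestamped_snapshot_name("openclaw-agent-dev", example))
# openclaw-agent-dev-20260307-034509
```

Because every build gets a distinct name, a rebuild never has to replace a snapshot that running sandboxes were created from.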

SSH into a Daytona sandbox

TERM=xterm-256color daytona sandbox ssh <sandbox-id>

Key env vars (Doppler / telepresence overlay)

  • DAYTONA_API_KEY — Daytona API auth
  • DAYTONA_AGENT_SNAPSHOT — snapshot name (with timestamp)
  • CREDDY_SERVER_HOST = creddy-server-dev.tail1ed40.ts.net
  • CREDDY_CLIENT_ID / CREDDY_CLIENT_SECRET — vendor-api's Creddy OIDC creds
  • S3_PRESIGN_ENDPOINT — browser/external-facing S3 presign endpoint

Next Steps

  1. Fix dev S3 reachability from agent sandbox (ngrok host-header or R2 for prod)
  2. Rebuild agent snapshot with latest bootstrap fixes
  3. End-to-end test: Analyze with AI → sandbox → bootstrap completes → OpenClaw starts
  4. Enable feature flag for test team
  5. Apply DB migrations
  6. Re-enable sandbox/volume cleanup
  7. Build chat UI in vendor-web