# CI/CD and Deployment When AI Writes the Code

## The Illusion: "Committed = Running"

I committed a critical fix at 5:51 PM. Pushed to GitHub. CI passed. Green checkmark.

At 6:15 PM, the bug was still happening.

Why? The Docker containers were still running the old code.

This post is about a lesson every developer learns eventually, but that hits differently when AI is writing your code: Committed ≠ Running.

And more importantly: "Deployment successful" ≠ "System working."

## The Problem: Stateful Containers

### What I Thought Happened

1. Commit code
2. Push to GitHub
3. GitHub Actions builds new image
4. ECS pulls new image
5. New code runs

### What Actually Happened

1. Commit code ✅
2. Push to GitHub ✅
3. GitHub Actions builds new image ✅
4. ECS keeps running the old image
5. Old code runs ❌

Why: Docker containers are stateful. They don't automatically restart when you commit code. They don't auto-pull new images. They run what they're running until you tell them to stop.

### The Fix I Should Have Known

```bash
# After commit + push
docker-compose restart backend celery_worker celery_beat

# Or in production (ECS)
aws ecs update-service --cluster trading-prod \
  --service backend --force-new-deployment
```

The missing step: restart. Always. Every time.
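One way to make the restart impossible to forget is to fold push and restart into a single helper. This is a hypothetical sketch, not a script from the repo — the branch and service names are illustrative, and `DRY_RUN` just echoes the commands so you can sanity-check the sequence before running it for real.

```shell
#!/bin/sh
# Hypothetical post-commit helper: push, then ALWAYS restart the named
# services. Branch and service names are illustrative.

run() {
  # With DRY_RUN set, print the command instead of executing it.
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

deploy_local() {
  run git push origin develop
  run docker-compose restart "$@"       # the step that gets forgotten
  run docker-compose logs --tail=20 "$@"
}

# Dry-run example: prints the three commands without executing them.
DRY_RUN=1 deploy_local backend celery_worker
```

Wiring the restart into the same command as the push means "committed but not running" stops being a possible state of your workflow.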

## The Deployment Process That Actually Works

After multiple "why isn't my fix working" incidents, I codified this process:

### 1. Commit and Push

```bash
git add backend/app/services/execution.py
git commit -m "fix: Stop-loss monitoring event loop issue"
git push origin develop
```

Standard stuff. This triggers CI.

### 2. Wait for CI to Pass

GitHub Actions runs:

- Unit tests
- Integration tests
- Coverage check (≥80%)
- Linting (black + ruff)
- Type checking (mypy)
- Security scan

Green checkmark = safe to deploy. But not "deployed."

### 3. Restart Affected Services

The critical step everyone skips.

```bash
# Local (docker-compose)
docker-compose restart <service-name>

# Production (ECS)
./scripts/restart_service.sh <service-name>
```

Service mapping (which services to restart):

| Changed files | Restart |
| --- | --- |
| `backend/app/**/*.py` | backend |
| `backend/app/services/*_tasks.py` | celery_worker, celery_beat |
| `backend/app/agents/**/*.py` | celery_worker, celery_beat |
| `frontend/**/*.tsx` | frontend |
| `backend/app/db/models.py` | backend, celery_worker, celery_beat |
| `.env` or env vars | ALL services |

Rule of thumb: when in doubt, restart backend, celery_worker, and celery_beat.
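The mapping above can be encoded so it's executable rather than tribal knowledge. A sketch — the paths and service names mirror this project's layout and would need adapting to yours; note the specific patterns must be matched before the general `backend/app/*` one:

```shell
# Map a changed file path to the services that need a restart.
# Most-specific patterns first; the default follows the rule of thumb.
services_for() {
  case "$1" in
    backend/app/services/*_tasks.py) echo "celery_worker celery_beat" ;;
    backend/app/agents/*)            echo "celery_worker celery_beat" ;;
    backend/app/db/models.py)        echo "backend celery_worker celery_beat" ;;
    backend/app/*)                   echo "backend" ;;
    frontend/*)                      echo "frontend" ;;
    .env|.env.*)                     echo "ALL" ;;
    *)                               echo "backend celery_worker celery_beat" ;;
  esac
}

# usage: git diff --name-only HEAD~1 | while read -r f; do services_for "$f"; done
```

Piping `git diff --name-only` through it after each commit gives you the restart list without having to remember the table.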

### 4. Verify Deployment (The Step That Prevents Disasters)

This is where AI agents fail most often.

The agent will say: "Deployed successfully! The fix is live."

What actually happened: the service restarted. But did the fix work? Check.

```bash
# Check logs for errors
docker-compose logs backend --tail=50 | grep -i error
docker-compose logs celery_worker --tail=50 | grep -i error

# Verify the NEW code path is executing
docker-compose logs celery_worker --tail=100 | grep "stop-loss check"

# Check that the specific fix is working
# Example: if you fixed event loop errors, verify no event loop errors in logs
docker-compose logs celery_worker --tail=200 | grep "Future attached to different loop"
# Should return nothing if fixed
```

The verification policy: after ANY deployment, verify it's working BEFORE reporting success.

1. Deploy the change
2. Check logs for errors within 2-5 minutes
3. Verify the NEW code path is being executed (not a fallback)
4. THEN say "deployed and verified working"
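Steps 2 and 3 of that policy can be scripted. A sketch: the function reads log lines from stdin, so it composes with `docker-compose logs`, and it fails loudly unless the old error signature is gone *and* the new code path has actually logged something. The grep strings are the examples from this post:

```shell
# Fail unless: (a) the error signature we just fixed is absent, and
# (b) the new code path has produced at least one log line.
verify_fix() {
  bad="$1"; good="$2"
  log=$(cat)
  if printf '%s\n' "$log" | grep -q -- "$bad"; then
    echo "FAIL: error signature still present: $bad"
    return 1
  fi
  if ! printf '%s\n' "$log" | grep -q -- "$good"; then
    echo "FAIL: new code path never logged: $good"
    return 1
  fi
  echo "verified"
}

# usage:
# docker-compose logs celery_worker --tail=200 | \
#   verify_fix "Future attached to different loop" "stop-loss check"
```

The second check matters as much as the first: a silent fallback path produces no errors and no evidence, which is exactly the failure mode described above.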

### 5. Monitor for Regressions

Even after verification, check again 30 minutes later:

```bash
# Quick health check
curl https://api.example.com/health
# or
docker-compose exec backend python -c "from app.services.broker import broker_service; import asyncio; print(asyncio.run(broker_service.get_market_clock()))"

# Check metrics (if available)
# - BUY:SELL ratio (should be ~2:1, not 100:0)
# - Task success rates
# - Position closure rate
```
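That 30-minute watch can also be a loop instead of a calendar reminder. A sketch — the URL is a placeholder, and the `CHECK`/`SLEEP` variables are test hooks I've added so the loop can be exercised without a live endpoint:

```shell
# Poll a health endpoint every SLEEP seconds (default 60) for N tries;
# exit nonzero if any check failed, so a regression is impossible to miss.
# CHECK overrides the probe command (defaults to curl) for testing.
watch_health() {
  url="$1"; tries="$2"; fails=0; i=0
  while [ "$i" -lt "$tries" ]; do
    ${CHECK:-curl -fsS} "$url" >/dev/null 2>&1 || fails=$((fails + 1))
    i=$((i + 1))
    if [ "$i" -lt "$tries" ]; then sleep "${SLEEP:-60}"; fi
  done
  echo "failed $fails/$tries checks"
  [ "$fails" -eq 0 ]
}

# usage: watch_health https://api.example.com/health 30
```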

## The Real Mishap: EventBridge Rules Disabled

This one cost me 1.5 hours on a Monday morning.

### What Happened

Sunday night (Jan 5, 2026, 9:00 PM): prepared for Monday trading. Checked ECS services (running), SageMaker endpoints (active), the latest premarket screening run (successful).

Monday morning (Jan 8, 2026, 9:07 AM): 23 minutes before market open, I discovered all EventBridge trading rules were DISABLED.

No premarket screening ran. No execution scheduled. No position monitoring. The system would have sat idle all day.

### Why It Happened

A Jan 7 commit (014e03e) to the Terraform config set enable_live_trading = false by default, for safety during RL model setup.

```hcl
# infrastructure/main.tf
variable "enable_live_trading" {
  default = false  # ← ADDED FOR SAFETY
}

resource "aws_cloudwatch_event_rule" "execute_strategies" {
  name                = "trading-prod-execute-active-strategies"
  schedule_expression = "cron(*/5 9-16 ? * MON-FRI *)"
  is_enabled          = var.enable_live_trading  # ← DISABLES ALL TRADING
}
```

The Terraform apply ran successfully. No errors. Rules disabled.

I never checked EventBridge status in my pre-market checklist.

### The Fix

```bash
# Check rule status
aws events list-rules --name-prefix trading-prod \
  --query 'Rules[].{Name:Name,State:State}' --output table
# Expected: all ENABLED
# Actual: all DISABLED

# Enable via Terraform: in infrastructure/prod.tfvars set
#   enable_live_trading = true
# then apply:
cd infrastructure && terraform apply -var-file=prod.tfvars -auto-approve

# Verify
aws events list-rules --name-prefix trading-prod \
  --query 'Rules[?State==`ENABLED`].Name'
```

Time to fix: 18 minutes (9:07 AM - 9:25 AM).

Time to discover if I hadn't checked: 6.5 hours (a full trading day lost).

## The Lesson: Pre-Market Checklist

Now, every Monday morning (or any day after infrastructure changes):

```markdown
# docs/runbooks/trading-morning-go-no-go.md

## 8:25 AM ET — Confirm scheduled tasks will run

CRITICAL: Verify EventBridge rules are ENABLED

    aws events list-rules --name-prefix trading-prod \
      --query 'Rules[?contains(Name, `premarket`) || contains(Name, `execute`) || contains(Name, `stop-loss`)].{Name:Name,State:State}' \
      --output table

Expected: All rules show State = ENABLED

If ANY rule shows DISABLED, this is a HARD NO-GO
```

The rule: never assume scheduled tasks are enabled. Always check.
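The check can also gate a morning script instead of relying on a human reading a table. A sketch: pipe the `aws events list-rules` output into a function that turns any DISABLED rule into a hard stop. It uses plain grep on the JSON to stay dependency-free (no jq):

```shell
# Read `aws events list-rules` JSON from stdin; NO-GO if any rule
# is DISABLED, with a nonzero exit so callers can abort.
rules_go_no_go() {
  if grep -q '"State": *"DISABLED"'; then
    echo "NO-GO: at least one EventBridge rule is DISABLED"
    return 1
  fi
  echo "GO"
}

# usage:
# aws events list-rules --name-prefix trading-prod | rules_go_no_go || exit 1
```

Wired into a cron job at 8:25 AM, this turns "I never checked EventBridge status" into an alert instead of a 9:07 AM surprise.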

## When the AI Agent Says "How's Trading?"

This scenario has burned me multiple times.

### The Wrong Pattern

User: "How is trading going today?"

AI Agent: "Great! Portfolio is up 0.5%, 12 trades executed, 8 winners, 4 losers. S&P is flat so we're outperforming."

Reality: the RL model failed at 9:45 AM. All trades after that used the fallback heuristic. The agent reported portfolio stats without checking system health.

### The Right Pattern

User: "How is trading going today?"

AI Agent:

1. FIRST: check worker logs for errors (last 30 min)

   ```bash
   docker-compose logs celery_worker --since=30m | grep -i error
   ```

2. SECOND: verify expected code paths are executing

   ```bash
   docker-compose logs celery_worker --tail=100 | grep "RL portfolio decision"
   # Should see: "RL model made 5 decisions" (not "RL unavailable, using fallback")
   ```

3. THIRD: report metrics
   - Portfolio P&L
   - Trade count (BUY:SELL ratio should be reasonable)
   - Win rate
   - System health (tasks succeeding, model being used)

NEVER blame "market conditions" or "volatility" without first verifying your system is healthy.

The policy: if you just deployed something, the FIRST thing to check is whether YOUR deployment is working. Not portfolio numbers, not market conditions. YOUR code. Is it running? Is it erroring?

## CI/CD Setup: GitHub Actions

Our CI pipeline enforces quality gates and automates deployments.

### Backend Tests Workflow

```yaml
# .github/workflows/backend-tests.yml
name: Backend Tests

on:
  push:
    branches: [develop, main, 'feature/**']
    paths: ['backend/**']

jobs:
  test:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: backend
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run unit tests
        run: pytest tests/unit/ -v --tb=short --maxfail=5

      - name: Run integration tests
        run: pytest tests/integration/ -v --tb=short --maxfail=3

      - name: Run all tests with coverage
        run: |
          pytest tests/ --cov=app --cov-fail-under=80 \
            --cov-report=xml --cov-report=term-missing -v

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          files: ./backend/coverage.xml
          flags: backend
          fail_ci_if_error: false
```

What this catches: Code that doesn't compile, tests that fail, coverage regressions.

What this doesn't catch: Deployment issues, service restart failures, production-only bugs.

### Linting with Auto-Fix

```yaml
lint:
  runs-on: ubuntu-latest
  steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Auto-fix code formatting
      run: |
        black app/ tests/
        ruff check --fix app/ tests/

    - name: Commit auto-fixes
      if: github.ref == 'refs/heads/develop'
      run: |
        git config user.name "github-actions[bot]"
        git config user.email "github-actions[bot]@users.noreply.github.com"
        git add .
        if ! git diff --staged --quiet; then
          git commit -m "style: Auto-fix linting issues"
          git push
        fi
```

Why auto-commit: developers shouldn't get distracted by style when tests fail. Auto-fix and move on.

### Terraform Deployment

```yaml
# .github/workflows/terraform-deploy.yml (simplified)
name: Terraform Deploy

on:
  push:
    branches: [main]
    paths: ['infrastructure/**']

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        run: terraform plan -var-file=prod.tfvars -out=plan.out

      - name: Terraform Apply
        run: terraform apply plan.out

      - name: Force ECS service update
        run: |
          aws ecs update-service --cluster trading-prod \
            --service backend --force-new-deployment
          aws ecs update-service --cluster trading-prod \
            --service celery-worker --force-new-deployment
```

What this does: Applies infrastructure changes, forces service redeployment.

What this doesn't do: Verify the deployment worked. That's on you.
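Part of that gap can be closed with the data ECS already exposes: `aws ecs describe-services` reports a `rolloutState` per deployment (COMPLETED, IN_PROGRESS, or FAILED). A sketch that extracts it with plain grep so it can be tested offline; the polling loop is left as a comment because it needs live AWS credentials, and the cluster/service names are this project's:

```shell
# Extract the first deployment's rolloutState from
# `aws ecs describe-services` JSON read on stdin.
rollout_state() {
  grep -o '"rolloutState": *"[A-Z_]*"' | head -1 \
    | grep -o '[A-Z_]*"$' | tr -d '"'
}

# usage: poll after --force-new-deployment until the rollout completes
# until aws ecs describe-services --cluster trading-prod --services backend \
#     | rollout_state | grep -qx COMPLETED; do sleep 15; done
```

Even then, COMPLETED only means the new tasks are running and passing health checks — the "verify the NEW code path is executing" step is still yours.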

## Runbooks: When CI Doesn't Catch It

CI catches code-level issues. Runbooks catch operational issues.

### Example: Trading Morning Go/No-Go

```markdown
# docs/runbooks/trading-morning-go-no-go.md

## Timeline

### 8:00 AM ET — Baseline health
- CloudWatch: No alarms paging
- ECS services: All healthy (backend, celery-worker, celery-beat)

### 8:25 AM ET — Confirm scheduled tasks
- EventBridge rules: ENABLED
- Latest premarket run: Successful (check logs)

### 8:45 AM ET — Verify SageMaker endpoints
- RL endpoint: InService
- Prescreening endpoint: InService (if enabled)

### 9:25 AM ET — Final GO/NO-GO decision
- API health endpoint: 200 OK
- Worker health: No errors in last 30 min
- Buying power: > $0 (no stale orders blocking)

If all checks pass: **GO**
If any check fails: **NO-GO** (investigate)
```

### Example: Hotfix Deployment

```markdown
# docs/runbooks/hotfix-deployment.md

1. Create a hotfix branch from main
2. Make the fix, test locally
3. Push, wait for CI
4. Deploy to production:

       ./scripts/deploy_hotfix.sh <service-name>

5. Verify deployment:
   - Check logs for errors (5 min)
   - Verify the fix is working (a specific check for the bug)
   - Monitor for regressions (30 min)
6. If verification fails, roll back immediately:

       ./scripts/rollback_service.sh <service-name>
```

## The Deployment Checklist

Every deployment, every time:

- [ ] Commit and push
- [ ] CI passes (green checkmark)
- [ ] Identify affected services
- [ ] Restart affected services
- [ ] Check logs for errors (2-5 min)
- [ ] Verify NEW code path is executing
- [ ] Check metrics (BUY:SELL ratio, task success)
- [ ] Monitor for regressions (30 min)

**Only then** can you say "deployed and verified working."

## Key Takeaways

1. **Committed ≠ Running**: Always restart affected services after code changes.

2. **"Deployment successful" ≠ "System working"**: Verify with logs and metrics.

3. **Check outcomes, not status**: "Task scheduled" ≠ "task succeeding." Check worker logs.

4. **Pre-flight checklists**: Verify system health before critical operations (trading day, deployments).

5. **CI catches code issues, runbooks catch operational issues**: Both are necessary.

6. **When asked "how's it going?"**: Check system health first, then report metrics.

7. **Auto-fix style issues**: Don't make developers deal with formatting when tests fail.

## Your Turn: Create Your Deployment Checklist

Take your most critical deployment (production API, cron job, ML model) and write a checklist:

1. What needs to restart?
2. How do you verify it worked?
3. What metrics should you check?
4. What's the rollback procedure?

Commit it. Follow it. Every time.

In the next post, I'll tell you about the ML model that was "live" for 6 days but never made a single production decision—and how to prevent it.

---

*This is Post 5 of an 8-part series on building a full-stack AI trading application with LLM coding agents. Next: The ML Model Was 'Live' for 6 Days—It Never Made a Single Decision.*