
@oleander
Created January 10, 2026 15:31
Kubernetes Rails Deployment Specifications - OVH Migration

Design Document: Kubernetes Rails Deployment

Overview

This design describes the architecture for deploying a Rails application on OVH Kubernetes with separate containers for the web server, delayed job workers, and Kafka consumers. The solution uses Docker multi-stage builds to create separate container images for each process type from a single Dockerfile. Each build target contains the appropriate CMD to start its specific process in the foreground with proper signal handling, health checks, and JSON logging for Logz.io integration.

Migration from ECS Build Server Model

Previous Deployment Architecture (Deprecated)

The previous deployment model had a disconnected build and deploy process:

Build Process:

  1. Developer merges code to master
  2. .github/workflows/cd.yml builds playwright target → pushed to GHCR (never used)
  3. Operator manually SSHs to build server in storecove-app-docker directory
  4. Runs ./build-deploy -s true -a true -b master script which:
    • Clones fresh copy of datajust repo
    • Builds using storecove-app-docker/production/Dockerfile
    • Copies assets to S3 CDN
    • Pushes to AWS ECR

Deploy Process:

  1. Script runs aws ecs update-service --force-new-deployment for 3 services
  2. ECS pulls latest image from ECR
  3. Starts new tasks with monolithic container

Problems with this approach:

  • Manual intervention required
  • CI builds wasted (never deployed)
  • Different Dockerfile for production vs. CI
  • Build-from-scratch on every deploy (slow)
  • Assets managed separately on S3

New OVH Kubernetes Architecture

The new model unifies build and deploy into one automated workflow:

Build & Deploy Process:

  1. Developer merges code to master
  2. .github/workflows/deploy.yml automatically triggers
  3. Builds 5 Docker targets from datajust/Dockerfile
  4. Pushes all images to OVH Container Registry
  5. Runs migrations using rails target
  6. Applies Kubernetes manifests
  7. Kubernetes performs rolling update

Benefits:

  • ✅ Fully automated
  • ✅ Same Dockerfile for all environments
  • ✅ Images built once, used everywhere
  • ✅ Assets served from container
  • ✅ Faster builds (layer caching)
  • ✅ No manual SSH required

Deprecated Components

| Component | Status | Replacement |
| --- | --- | --- |
| storecove-app-docker/production/Dockerfile | Deprecated | datajust/Dockerfile with multiple targets |
| storecove-app-docker/production/build-deploy | Deprecated | .github/workflows/deploy.yml |
| .github/workflows/cd.yml GHCR images | Deprecated for prod | .github/workflows/deploy.yml OVH images |
| Manual ECS service restart | Deprecated | Automatic Kubernetes rolling update |
| S3 CDN for assets | Deprecated | Assets served from container |
| Container-level cron (whenever gem) | Deprecated | Kubernetes CronJobs |

Architecture

graph TB
    subgraph "OVH Kubernetes Cluster"
        subgraph "Web Tier"
            WEB1[Rails Server Pod 1]
            WEB2[Rails Server Pod 2]
            WEB3[Rails Server Pod N]
        end
        
        subgraph "Worker Tier"
            DJ1[Worker Primary Pod 1]
            DJ2[Worker Primary Pod N]
            DJ3[Worker Secondary Pod 1]
            DJ4[Worker Secondary Pod N]
        end
        
        subgraph "Kafka Consumer Tier"
            KS[Sending Status Consumer]
            KN[New Document Consumer]
            KR[Received Status Consumer]
        end
        
        subgraph "Scheduled Tasks"
            CRON[Kubernetes CronJobs]
        end
        
        subgraph "Logging"
            FB[Fluent Bit DaemonSet]
        end
        
        ING[Ingress Controller]
        SVC[Kubernetes Service]
    end
    
    DB[(Database)]
    KAFKA[Kafka Brokers]
    LOGZ[Logz.io]
    ROLLBAR[Rollbar]
    
    ING --> SVC
    SVC --> WEB1
    SVC --> WEB2
    SVC --> WEB3
    
    WEB1 --> DB
    DJ1 --> DB
    DJ2 --> DB
    DJ3 --> DB
    DJ4 --> DB
    CRON --> DB
    
    KS --> KAFKA
    KN --> KAFKA
    KR --> KAFKA
    
    FB --> LOGZ
    
    WEB1 -.-> FB
    DJ1 -.-> FB
    KS -.-> FB

Build and Deploy Flow

flowchart TD
    subgraph "Docker Build"
        BASE[Base Stage] --> APP[App Base Stage]
        APP --> RAILS[rails target]
        APP --> WORKER[worker target]
        APP --> KS[kafka-sending-status target]
        APP --> KN[kafka-new-document target]
        APP --> KR[kafka-received-status target]
    end
    
    subgraph "Container Registry"
        RAILS --> IMG_RAILS[storecove-app:rails-latest]
        WORKER --> IMG_WORKER[storecove-app:worker-latest]
        KS --> IMG_KS[storecove-app:kafka-sending-status-latest]
        KN --> IMG_KN[storecove-app:kafka-new-document-latest]
        KR --> IMG_KR[storecove-app:kafka-received-status-latest]
    end
    
    subgraph "Kubernetes Deployments"
        IMG_RAILS --> DEP_RAILS[rails-server Deployment]
        IMG_WORKER --> DEP_WORKER1[worker-primary Deployment]
        IMG_WORKER --> DEP_WORKER2[worker-secondary Deployment]
        IMG_KS --> DEP_KS[kafka-sending-status Deployment]
        IMG_KN --> DEP_KN[kafka-new-document Deployment]
        IMG_KR --> DEP_KR[kafka-received-status Deployment]
    end
    
    subgraph "Kubernetes CronJobs"
        IMG_RAILS --> CRON1[scheduled-task-1 CronJob]
        IMG_RAILS --> CRON2[scheduled-task-2 CronJob]
        IMG_RAILS --> CRONN[scheduled-task-N CronJob]
    end

Components and Interfaces

1. Dockerfile Multi-Stage Build Targets

The Dockerfile uses multi-stage builds to create optimized images for each component type from a shared base.

# syntax=docker/dockerfile:1-labs
# Base stage with all dependencies
FROM ubuntu:focal AS base

ARG BUNDLER_VERSION=2.6.8
ENV BUNDLER_VERSION=${BUNDLER_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ENV BUNDLE_PATH=/cache/bundle
ENV YARN_CACHE_FOLDER=/cache/yarn
ENV BUNDLE_SILENCE_ROOT_WARNING=1

SHELL ["/bin/bash", "-l", "-c"]

# ... (existing base setup: apt packages, RVM, Ruby, Node.js, etc.) ...

WORKDIR /app

# Ruby dependencies stage
FROM base AS ruby-deps
USER app
COPY --chown=app:sudo Gemfile Gemfile.lock ./
RUN bash -lc "bundle install"

# Node dependencies stage
FROM base AS node-deps
USER app
COPY --chown=app:sudo package.json yarn.lock ./
RUN bash -lc "yarn install --frozen-lockfile"

# Application base with all code and assets
FROM base AS app-base
USER app

COPY --chown=app:sudo Gemfile Gemfile.lock ./
COPY --from=ruby-deps /cache/bundle /cache/bundle

COPY --chown=app:sudo package.json yarn.lock ./
COPY --from=node-deps /cache/yarn /cache/yarn

COPY --chown=app:sudo . .

RUN yarn install --frozen-lockfile
# Scope the dummy secret to the precompile step only; a persistent ENV value
# (even "0") would still be present in the final image
RUN bash -lc "SECRET_KEY_BASE_DUMMY=1 bundle exec rails assets:precompile"

# Create log directory
RUN mkdir -p /app/log

# ===== Rails Server Target =====
FROM app-base AS rails
EXPOSE 3000
ENV RAILS_SERVE_STATIC_FILES=true
ENV RAILS_LOG_TO_STDOUT=true
ENV PROCESS_TARGET=server
CMD ["bash", "-lc", "bundle exec rails server -b 0.0.0.0 -p 3000"]

# ===== Delayed Job Worker Target =====
# Pool configuration passed via DELAYED_JOB_POOLS environment variable
# Example: DELAYED_JOB_POOLS="--pool=mail:1 --pool=slack:2"
FROM app-base AS worker
EXPOSE 3001
ENV PROCESS_TARGET=worker
ENV DELAYED_JOB_POOLS=""
ENV DELAYED_JOB_TIMEOUT=280
COPY --chown=app:sudo scripts/health_server.rb /scripts/health_server.rb
CMD ["bash", "-lc", "HEALTH_PORT=3001 ruby /scripts/health_server.rb & exec bundle exec bin/delayed_job run --timeout=${DELAYED_JOB_TIMEOUT} $DELAYED_JOB_POOLS"]

# ===== Kafka Sending Status Consumer =====
FROM app-base AS kafka-sending-status
EXPOSE 3002
ENV PROCESS_TARGET=kafka-sending-status
COPY --chown=app:sudo scripts/health_server.rb /scripts/health_server.rb
CMD ["bash", "-lc", "HEALTH_PORT=3002 ruby /scripts/health_server.rb & exec bundle exec racecar --group-id \"$KAFKA_SENDINGSTATUSUPDATE_CONSUMER_GROUP_ID\" --sasl-username \"$KAFKA_SENDINGSTATUSUPDATE_CONSUMER_USERNAME\" --sasl-password \"$KAFKA_SENDINGSTATUSUPDATE_CONSUMER_PASSWORD\" Kafka::Consumers::SendingActionStatusUpdateConsumer"]

# ===== Kafka New Document Consumer =====
FROM app-base AS kafka-new-document
EXPOSE 3003
ENV PROCESS_TARGET=kafka-new-document
COPY --chown=app:sudo scripts/health_server.rb /scripts/health_server.rb
CMD ["bash", "-lc", "HEALTH_PORT=3003 ruby /scripts/health_server.rb & exec bundle exec racecar --group-id \"$KAFKA_NEWDOCUMENTNOTIFICATION_CONSUMER_GROUP\" --sasl-username \"$KAFKA_NEWDOCUMENTNOTIFICATION_CONSUMER_USERNAME\" --sasl-password \"$KAFKA_NEWDOCUMENTNOTIFICATION_CONSUMER_PASSWORD\" Kafka::Consumers::NewDocumentNotificationConsumer"]

# ===== Kafka Received Status Consumer =====
FROM app-base AS kafka-received-status
EXPOSE 3004
ENV PROCESS_TARGET=kafka-received-status
COPY --chown=app:sudo scripts/health_server.rb /scripts/health_server.rb
CMD ["bash", "-lc", "HEALTH_PORT=3004 ruby /scripts/health_server.rb & exec bundle exec racecar --group-id \"$KAFKA_RECEIVEDDOCUMENTSTATUS_CONSUMER_GROUP\" --sasl-username \"$KAFKA_RECEIVEDDOCUMENTSTATUS_CONSUMER_USERNAME\" --sasl-password \"$KAFKA_RECEIVEDDOCUMENTSTATUS_CONSUMER_PASSWORD\" Kafka::Consumers::ReceivedDocumentStatusConsumer"]

Worker Pool Configuration

The worker Docker target uses a configurable DELAYED_JOB_POOLS environment variable, allowing different Kubernetes deployments to run different queue pools from the same image.

Pool Groups

DELAYED_JOB_POOLS value per deployment:

  • worker-primary: --pool=mail:1 --pool=inboundpeppol,inboundpeppolemail,inboundsftp,inboundublemail,inboundpartneremail:4 --pool=ses_notifications,ses_mail,sar_mail,edi_smtp,edi_as2,ses_mail_in_out:2 --pool=vatcalc_out_out_live,vatcalc_out_out_pilot:1 --pool=analyze_action,invoice_analyzer,slack,apply_action:1 --pool=document_submissions:2
  • worker-secondary: --pool=smp_phoss:8 --pool=aruba_out_out_prod,aruba_out_out_pilot,aruba_out_out_webhooks_pilot,aruba_out_out_webhooks_prod:1 --pool=chargebee_webhook_events,exactsales_webhook_events,storecove_webhook_events:1 --pool=outgoing_webhooks,outgoing_webhooks_sandbox:4 --pool=outgoing_webhooks_asia,outgoing_webhooks_sandbox_asia:4 --pool=exact_worker,snelstart_worker,sftp_worker,as2_worker:1 --pool=received_documents,aruba_in_in_webhooks:1 --pool=storecove_api_self:3 --pool=active_storage_analysis,active_storage_mirror,active_storage_preview,active_storage_purge:1 --pool=kafka_sending_actions_status_update,kafka_received_document_status,kafka_new_document_notification:12 --pool=meta_events,exceptions,aruba_admin:1 --pool=customer_reporting:1 --pool=my_lhdnm_poller:6

This allows:

  • Independent scaling of pool groups
  • Single Docker image for all workers
  • Easy adjustment of pool assignments via K8s manifests
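To make the `--pool=queues:count` convention concrete, here is a hypothetical parser (illustrative only; delayed_job itself consumes the real flags): each `--pool=queue1,queue2:N` token assigns N worker processes to a set of queues.

```ruby
# Illustrative parser for the "--pool=queues:count" convention used above.
# Hypothetical code for this document; delayed_job parses the real flags.
def parse_pools(spec)
  spec.scan(/--pool=([\w,]+):(\d+)/).map do |queues, count|
    { queues: queues.split(','), workers: Integer(count) }
  end
end

pools = parse_pools("--pool=mail:1 --pool=inboundpeppol,inboundsftp:4")
# pools == [{ queues: ["mail"], workers: 1 },
#           { queues: ["inboundpeppol", "inboundsftp"], workers: 4 }]
```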

Puma Configuration

The Rails application uses Puma in single-process, multi-threaded mode (workers commented out in config/puma.rb). This is intentional for the Kubernetes deployment:

  • Horizontal Scaling: Multiple pods provide process-level isolation and fault tolerance
  • Simpler Failure Mode: If a pod crashes, only one replica is affected
  • Resource Predictability: Each pod uses consistent resources (no worker forking)
  • Thread Pool: Each pod uses 5 threads (configurable via RAILS_MAX_THREADS)

Production Configuration:

# config/puma.rb
threads 5, 5  # Default: 5 threads per pod
# workers disabled - scaling via Kubernetes replicas instead

Environment Variables:

  • RAILS_MAX_THREADS - Max threads per pod (default: 5)
  • RAILS_MIN_THREADS - Min threads per pod (default: 5)
  • WEB_CONCURRENCY - Not used (workers disabled)
  • DB_POOL - ActiveRecord connection pool size (should match RAILS_MAX_THREADS)

ActiveRecord Connection Pooling: The ActiveRecord connection pool size should match the Puma thread count to avoid connection exhaustion. In config/database.yml:

production:
  primary:
    adapter: mysql2
    pool: <%= ENV.fetch("DB_POOL") { ENV.fetch("RAILS_MAX_THREADS") { 5 } } %>
    # ... other settings ...

For the Rails server with 5 threads per pod and 2-10 replicas, total connections = 5 threads × 10 pods = 50 connections maximum.
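The same arithmetic applies to any tier; a quick sketch (pod counts assumed from this document) of the cluster-wide worst case:

```ruby
# Each process opens at most DB_POOL connections (matched to thread count),
# so the worst case is pool size x process count, summed over tiers.
def max_db_connections(tiers, pool_size: 5)
  tiers.sum { |_, pods| pool_size * pods }
end

rails_only = max_db_connections({ "rails-server" => 10 })  # 5 threads x 10 pods
# rails_only == 50
```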

Racecar/Kafka Configuration

Racecar consumers must log to STDOUT for Fluent Bit collection:

# config/initializers/racecar.rb (update for Kubernetes)
Racecar.configure do |config|
  # ... existing config ...
  
  # Change from file logging to STDOUT
  config.logfile = STDOUT
  
  # Use Rails logger for consistent JSON formatting
  config.logger = Rails.logger if Rails.logger
  
  # Offset commit configuration for graceful shutdown
  config.offset_commit_interval = 10  # Commit every 10 seconds (default)
  config.offset_commit_threshold = 0  # Or commit after every message for max safety
  
  # ... rest of config ...
end

SIGTERM Handling: Racecar handles SIGTERM gracefully by default, committing offsets before shutdown.

Signal Handling and Process Management

CMD with bash -lc and exec: The design uses CMD ["bash", "-lc", "exec bundle exec <command>"] which:

  1. Starts bash as PID 1
  2. The exec keyword replaces bash with the actual process
  3. The actual process (Puma, delayed_job, racecar) receives SIGTERM directly
  4. All three processes handle SIGTERM gracefully by default:
    • Puma: Stops accepting new connections, completes in-flight requests
    • delayed_job: Completes current job within timeout, or leaves in queue
    • Racecar: Commits offsets and disconnects cleanly

Health Server Background Process: The health server runs as a background process (&), so it does not receive the SIGTERM delivered to the main process. This is acceptable because:

  • When the main process exits (delayed_job or racecar), the container exits
  • Kubernetes detects the container exit and restarts it
  • The health server is supplementary; container exit is the primary failure detection

Container Exit on Process Failure: If the main process crashes:

  1. Container exits with non-zero code
  2. Kubernetes detects exit via container state
  3. Liveness probe subsequently fails
  4. Kubernetes restarts the container per the restart policy
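The graceful-shutdown behavior described above can be demonstrated with a minimal Ruby sketch (a stand-in loop, not the real Puma/delayed_job/racecar handlers): the child traps SIGTERM, drains its work, and exits cleanly.

```ruby
# Minimal stand-in for a process that exits cleanly on SIGTERM.
child = fork do
  shutdown = false
  Signal.trap('TERM') { shutdown = true }   # graceful-shutdown flag
  sleep 0.05 until shutdown                 # stand-in for the work loop
  exit 0                                    # clean exit after draining
end

sleep 0.2                       # give the child time to install the trap
Process.kill('TERM', child)     # what Kubernetes sends on pod termination
_, status = Process.wait2(child)
status.exitstatus               # 0 on a clean shutdown
```

If the child ignored SIGTERM instead, Kubernetes would follow up with SIGKILL after terminationGracePeriodSeconds, which is why the grace periods in the resource table are sized per process type.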

2. Health Check Server (health_server.rb)

A lightweight WEBrick server for worker and Kafka consumer health checks:

#!/usr/bin/env ruby
require 'webrick'
require 'json'
require 'time' # Time#iso8601 is provided by the time stdlib

PROCESS_TARGET = ENV.fetch('PROCESS_TARGET', 'unknown')
HEALTH_PORT = ENV.fetch('HEALTH_PORT', 3001).to_i

# Only load Rails for workers that need DB checks
if PROCESS_TARGET.start_with?('worker')
  require_relative '/app/config/environment'
end

server = WEBrick::HTTPServer.new(Port: HEALTH_PORT, Logger: WEBrick::Log.new("/dev/null"), AccessLog: [])

server.mount_proc '/health' do |req, res|
  begin
    if PROCESS_TARGET.start_with?('worker')
      # Workers check database connectivity
      ActiveRecord::Base.connection.execute("SELECT 1")
    end
    # Kafka consumers just check process is alive (per requirements)
    
    res.status = 200
    res.content_type = 'application/json'
    res.body = { 
      status: 'healthy', 
      process_target: PROCESS_TARGET, 
      pod_name: ENV.fetch('POD_NAME', 'unknown'),
      namespace: ENV.fetch('POD_NAMESPACE', 'default'),
      timestamp: Time.now.iso8601 
    }.to_json
  rescue => e
    res.status = 503
    res.content_type = 'application/json'
    res.body = { status: 'unhealthy', process_target: PROCESS_TARGET, error: e.message, timestamp: Time.now.iso8601 }.to_json
  end
end

server.mount_proc '/ready' do |req, res|
  res.status = 200
  res.content_type = 'application/json'
  res.body = { status: 'ready', process_target: PROCESS_TARGET }.to_json
end

trap('INT') { server.shutdown }
trap('TERM') { server.shutdown }

server.start

3. Rails Health Check Controller

For the web server, health checks are handled by a Rails controller. Note: Liveness checks process health only (no DB), while readiness checks DB connectivity.

# app/controllers/health_controller.rb
class HealthController < ApplicationController
  skip_before_action :authenticate_user!, raise: false
  
  # Liveness: Is the process alive? (Don't check DB - restarting won't help if DB is down)
  def liveness
    render json: { status: 'alive', process_target: ENV.fetch('PROCESS_TARGET', 'server'), timestamp: Time.current.iso8601 }, status: :ok
  end
  
  # Readiness: Can it serve traffic? (Check DB connectivity)
  def readiness
    ActiveRecord::Base.connection.execute("SELECT 1")
    render json: { 
      status: 'ready', 
      process_target: ENV.fetch('PROCESS_TARGET', 'server'), 
      pod_name: ENV.fetch('POD_NAME', 'unknown'),
      namespace: ENV.fetch('POD_NAMESPACE', 'default'),
      timestamp: Time.current.iso8601 
    }, status: :ok
  rescue => e
    render json: { status: 'not_ready', error: e.message }, status: :service_unavailable
  end
end

4. Rails Routes for Health Checks

# config/routes.rb (add these routes)
get '/health/liveness' => 'health#liveness'
get '/health/readiness' => 'health#readiness'

5. JSON Logging Configuration

# Gemfile (add)
gem 'lograge'
# config/environments/production.rb (add)
config.lograge.enabled = true
config.lograge.formatter = Lograge::Formatters::Json.new
config.lograge.custom_options = lambda do |event|
  {
    process_target: ENV.fetch('PROCESS_TARGET', 'server'),
    pod_name: ENV.fetch('POD_NAME', 'unknown'),
    namespace: ENV.fetch('POD_NAMESPACE', 'default')
  }
end
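With this configuration each request becomes one JSON object on stdout, which Fluent Bit ships to Logz.io. A sketch of the line shape (field values hypothetical), with the custom_options fields merged into the standard lograge payload:

```ruby
require 'json'

# Hypothetical example of a single lograge line after custom_options merge.
line = {
  method: 'GET', path: '/health/readiness', status: 200, duration: 12.3,
  process_target: 'server', pod_name: 'rails-server-6f9c-x7z', namespace: 'default'
}.to_json

JSON.parse(line)['process_target']  # "server"
```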

Data Models

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| PROCESS_TARGET | No | Set by target | Process type identifier (server, worker-primary, worker-secondary, kafka-*) |
| HEALTH_PORT | No | 3001-3004 | Port for health check server (workers/kafka) |
| DELAYED_JOB_POOLS | Conditional | "" | Pool arguments for delayed_job (required for worker target) |
| DELAYED_JOB_TIMEOUT | No | 280 | Seconds to wait for job completion on SIGTERM |
| RAILS_ENV | Yes | - | Rails environment |
| RAILS_LOG_TO_STDOUT | No | true | Enable logging to stdout |
| RAILS_SERVE_STATIC_FILES | No | true | Enable static file serving from Puma |
| RAILS_MAX_THREADS | No | 5 | Maximum Puma threads per pod |
| RAILS_MIN_THREADS | No | 5 | Minimum Puma threads per pod |
| DB_POOL | No | 5 | ActiveRecord connection pool size (should match RAILS_MAX_THREADS) |
| DATABASE_URL | Yes | - | Database connection string |
| KAFKA_* | Conditional | - | Kafka credentials (required for kafka-* targets) |
| LOGZIO_TOKEN | Yes | - | Logz.io shipping token (via Secret) |
| POD_NAME | No | unknown | Kubernetes pod name (from downward API) |
| POD_NAMESPACE | No | default | Kubernetes namespace (from downward API) |

Kubernetes Secrets

| Secret Name | Keys | Used By |
| --- | --- | --- |
| storecove-app-db-credentials | DATABASE_HOST, DATABASE_PORT, DATABASE_USERNAME, DATABASE_PASSWORD, DATABASE_NAME | All |
| storecove-app-master-key | RAILS_MASTER_KEY | All |
| storecove-app-aws-credentials | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, AWS_*_BUCKET | All |
| storecove-app-valkey-credentials | VALKEY_HOST, VALKEY_PORT, VALKEY_USERNAME, VALKEY_PASSWORD | Rails, Workers |
| storecove-app-queue-credentials | SQS_*_QUEUE URLs (bounces, complaints, deliveries, partner, peppol, receive, sftp) | Workers |
| storecove-app-kafka-credentials | KAFKA_CONSUMER, KAFKA_PRODUCER | Kafka consumers |
| storecove-app-logzio | LOGZIO_TOKEN | Fluent Bit |
| storecove-app-rollbar | ROLLBAR_ACCESS_TOKEN | GitHub Actions |
| storecove-app-email-credentials | EMAIL_PROVIDER_USERNAME, EMAIL_PROVIDER_PASSWORD | Rails, Workers |
| storecove-app-billing-credentials | CHARGEBEE_API_KEY, CHARGEBEE_SITE, STRIPE_SECRET_KEY | Rails, Workers |
| storecove-app-peppol-credentials | PEPPOL_SHOP_ID, DEFAULT_ACCESSPOINT_* | Rails, Workers |
| storecove-app-webhooks-credentials | WEBHOOKS_ENCRYPT_KEY, WEBHOOKS_ENCRYPT_IV | Rails, Workers |
| storecove-app-intercom-credentials | INTERCOM_APP_ID, INTERCOM_API_SECRET, INTERCOM_API_ACCESS_TOKEN | Rails |
| mysql-ca-cert | ca-cert.pem | All (mounted as volume) |

Health Check Ports

| Component | Health Port | Endpoint | Notes |
| --- | --- | --- | --- |
| Rails Server | 3000 | /health/liveness, /health/readiness | Via Rails controller |
| Worker Primary | 3001 | /health | Via WEBrick health_server.rb |
| Worker Secondary | 3001 | /health | Via WEBrick health_server.rb |
| Kafka Sending Status | 3002 | /health | Via WEBrick health_server.rb |
| Kafka New Document | 3003 | /health | Via WEBrick health_server.rb |
| Kafka Received Status | 3004 | /health | Via WEBrick health_server.rb |

Kubernetes Resource Specifications

| Component | CPU Request | CPU Limit | Memory Request | Memory Limit | Replicas | terminationGracePeriodSeconds |
| --- | --- | --- | --- | --- | --- | --- |
| Rails Server | 500m | 2000m | 1Gi | 4Gi | 2-10 (HPA) | 30 |
| Worker Primary | 250m | 1000m | 512Mi | 2Gi | 2-5 | 300 |
| Worker Secondary | 250m | 2000m | 512Mi | 4Gi | 2-5 | 300 |
| Kafka Consumer (each) | 100m | 500m | 256Mi | 1Gi | 1-3 | 60 |
| CronJob (each) | 100m | 500m | 256Mi | 1Gi | N/A | N/A |
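Assuming every tier runs at its maximum replica count (and counting the three Kafka consumers individually, at 1Gi = 1024Mi), the table implies the following worst-case scheduled requests; a rough capacity-planning sketch:

```ruby
# Worst-case resource requests, values taken from the table above.
specs = {
  "rails-server"     => { cpu_m: 500, mem_mi: 1024, max_pods: 10 },
  "worker-primary"   => { cpu_m: 250, mem_mi: 512,  max_pods: 5 },
  "worker-secondary" => { cpu_m: 250, mem_mi: 512,  max_pods: 5 },
  "kafka-consumers"  => { cpu_m: 100, mem_mi: 256,  max_pods: 9 }, # 3 consumers x 3 replicas
}

cpu_m  = specs.sum { |_, s| s[:cpu_m]  * s[:max_pods] }  # 8400 mCPU
mem_mi = specs.sum { |_, s| s[:mem_mi] * s[:max_pods] }  # 17664 Mi
```

The cluster's node pool must leave headroom above these request totals for CronJobs, Fluent Bit, and system pods.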

Build and Push Strategy (GitHub Actions)

Each Docker build target is built and pushed separately with appropriate tags:

# Build Rails server target
- name: Build and push Rails server
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    target: rails
    tags: |
      ${{ vars.OVH_REGISTRY_URL }}/storecove-app:rails-${{ github.sha }}
      ${{ vars.OVH_REGISTRY_URL }}/storecove-app:rails-latest
    cache-from: type=gha
    cache-to: type=gha,mode=max

# Build Worker target
- name: Build and push Worker
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    target: worker
    tags: |
      ${{ vars.OVH_REGISTRY_URL }}/storecove-app:worker-${{ github.sha }}
      ${{ vars.OVH_REGISTRY_URL }}/storecove-app:worker-latest
    cache-from: type=gha
    cache-to: type=gha,mode=max

# Build Kafka consumer targets
- name: Build and push Kafka Sending Status
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    target: kafka-sending-status
    tags: |
      ${{ vars.OVH_REGISTRY_URL }}/storecove-app:kafka-sending-status-${{ github.sha }}
      ${{ vars.OVH_REGISTRY_URL }}/storecove-app:kafka-sending-status-latest
    cache-from: type=gha
    cache-to: type=gha,mode=max

# Repeat for kafka-new-document and kafka-received-status

Docker BuildKit Optimization

The build strategy uses Docker BuildKit with GitHub Actions cache for faster builds:

Cache Strategy:

  • cache-from: type=gha - Pull cache layers from previous builds
  • cache-to: type=gha,mode=max - Store all layers for future builds
  • Shared layers between targets (base, ruby-deps, node-deps, app-base) are cached once

Build Performance:

  • First build: ~15-20 minutes (all layers)
  • Subsequent builds (code changes only): ~2-5 minutes (app-base rebuilt)
  • Subsequent builds (dependency changes): ~10-12 minutes (ruby-deps/node-deps rebuilt)

Parallel Builds: Consider building targets in parallel using GitHub Actions matrix strategy:

strategy:
  matrix:
    target: [rails, worker, kafka-sending-status, kafka-new-document, kafka-received-status]
steps:
  - uses: docker/build-push-action@v6
    with:
      target: ${{ matrix.target }}
      tags: ${{ vars.OVH_REGISTRY_URL }}/storecove-app:${{ matrix.target }}-${{ github.sha }}

This reduces total build time from ~15 minutes sequential to ~5 minutes parallel (limited by slowest target).

# Run database migrations
- name: Run database migrations
  run: |
    kubectl run migration-${{ github.sha }} \
      --image=${{ vars.OVH_REGISTRY_URL }}/storecove-app:rails-${{ github.sha }} \
      --restart=Never \
      --rm \
      --attach \
      --command -- bash -lc "bundle exec rails db:migrate"

# Apply Kubernetes manifests
- name: Apply Kubernetes manifests
  run: |
    export IMAGE_TAG=${{ github.sha }}
    envsubst < k8s/rails-server.yaml | kubectl apply -f -
    envsubst < k8s/worker-primary.yaml | kubectl apply -f -
    envsubst < k8s/worker-secondary.yaml | kubectl apply -f -
    envsubst < k8s/kafka-sending-status.yaml | kubectl apply -f -
    envsubst < k8s/kafka-new-document.yaml | kubectl apply -f -
    envsubst < k8s/kafka-received-status.yaml | kubectl apply -f -
    kubectl apply -f k8s/cronjobs/
    kubectl apply -f k8s/ingress.yaml

# Notify Rollbar of deployment
- name: Notify Rollbar
  if: success()
  run: |
    curl -X POST https://api.rollbar.com/api/1/deploy/ \
      -H "Content-Type: application/json" \
      -d '{
        "access_token": "${{ secrets.ROLLBAR_ACCESS_TOKEN }}",
        "environment": "production",
        "revision": "${{ github.sha }}",
        "local_username": "${{ github.actor }}",
        "comment": "Deployed via GitHub Actions"
      }'

Kubernetes Manifests

Rails Server Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-server
  labels:
    app: storecove
    component: server
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: storecove
      component: server
  template:
    metadata:
      labels:
        app: storecove
        component: server
    spec:
      terminationGracePeriodSeconds: 30
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      containers:
      - name: rails
        image: ${OVH_REGISTRY_URL}/storecove-app:rails-latest
        imagePullPolicy: Always
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: false
          capabilities:
            drop:
              - ALL
        env:
        - name: RAILS_ENV
          value: "production"
        - name: RAILS_SERVE_STATIC_FILES
          value: "true"
        - name: PROCESS_TARGET
          value: "server"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MYSQL_SSL_CA
          value: "/etc/ssl/mysql/ca-cert.pem"
        envFrom:
        - secretRef:
            name: storecove-app-db-credentials
        - secretRef:
            name: storecove-app-master-key
        - secretRef:
            name: storecove-app-aws-credentials
        - secretRef:
            name: storecove-app-valkey-credentials
        - secretRef:
            name: storecove-app-email-credentials
        - secretRef:
            name: storecove-app-billing-credentials
        - secretRef:
            name: storecove-app-peppol-credentials
        - secretRef:
            name: storecove-app-webhooks-credentials
        - secretRef:
            name: storecove-app-intercom-credentials
        - secretRef:
            name: storecove-app-rollbar-credentials
        volumeMounts:
        - name: mysql-ca
          mountPath: /etc/ssl/mysql
          readOnly: true
        ports:
        - containerPort: 3000
          name: http
        livenessProbe:
          httpGet:
            path: /health/liveness
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 3
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /health/readiness
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 3
          timeoutSeconds: 5
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 4Gi
      volumes:
      - name: mysql-ca
        secret:
          secretName: mysql-ca-cert
---
apiVersion: v1
kind: Service
metadata:
  name: rails-server
spec:
  selector:
    app: storecove
    component: server
  ports:
  - port: 80
    targetPort: 3000

Ingress Configuration

Important: The Ingress NGINX Controller is scheduled for retirement in March 2026.

  • Verify OVH's actual ingress controller type before production deployment
  • If OVH uses nginx-ingress, plan migration to Gateway API by Q2 2026
  • The annotations below assume nginx-ingress; update if OVH uses a different controller

OVH Production Subdomains:

  • app.fr.storecove.com - Main application (2M body size limit)
  • api.fr.storecove.com - API endpoint (100M body size limit)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: storecove-app-ingress
  annotations:
    # Body size limits
    nginx.ingress.kubernetes.io/proxy-body-size: "2m"
    # Timeouts for long-running requests
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    # Security headers
    nginx.ingress.kubernetes.io/server-snippet: |
      more_clear_headers "X-Powered-By";
      more_clear_headers "Server";
    # TLS
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.fr.storecove.com
    secretName: storecove-app-tls
  rules:
  - host: app.fr.storecove.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rails-server
            port:
              number: 80
---
# Separate ingress for API subdomain with larger body size
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: storecove-api-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.fr.storecove.com
    secretName: storecove-api-tls
  rules:
  - host: api.fr.storecove.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rails-server
            port:
              number: 80

Worker Primary Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-primary
  labels:
    app: storecove
    component: worker-primary
spec:
  replicas: 2
  selector:
    matchLabels:
      app: storecove
      component: worker-primary
  template:
    metadata:
      labels:
        app: storecove
        component: worker-primary
    spec:
      terminationGracePeriodSeconds: 300
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      containers:
      - name: worker
        image: ${OVH_REGISTRY_URL}/storecove-app:worker-latest
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: false
          capabilities:
            drop:
              - ALL
        env:
        - name: RAILS_ENV
          value: "production"
        - name: PROCESS_TARGET
          value: "worker-primary"
        - name: DELAYED_JOB_POOLS
          value: "--pool=mail:1 --pool=inboundpeppol,inboundpeppolemail,inboundsftp,inboundublemail,inboundpartneremail:4 --pool=ses_notifications,ses_mail,sar_mail,edi_smtp,edi_as2,ses_mail_in_out:2 --pool=vatcalc_out_out_live,vatcalc_out_out_pilot:1 --pool=analyze_action,invoice_analyzer,slack,apply_action:1 --pool=document_submissions:2"
        - name: DELAYED_JOB_TIMEOUT
          value: "280"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MYSQL_SSL_CA
          value: "/etc/ssl/mysql/ca-cert.pem"
        envFrom:
        - secretRef:
            name: storecove-app-db-credentials
        - secretRef:
            name: storecove-app-master-key
        - secretRef:
            name: storecove-app-aws-credentials
        - secretRef:
            name: storecove-app-valkey-credentials
        - secretRef:
            name: storecove-app-queue-credentials
        - secretRef:
            name: storecove-app-email-credentials
        - secretRef:
            name: storecove-app-billing-credentials
        - secretRef:
            name: storecove-app-peppol-credentials
        - secretRef:
            name: storecove-app-webhooks-credentials
        volumeMounts:
        - name: mysql-ca
          mountPath: /etc/ssl/mysql
          readOnly: true
        ports:
        - containerPort: 3001
          name: health
        livenessProbe:
          httpGet:
            path: /health
            port: 3001
          initialDelaySeconds: 30
          periodSeconds: 30
          failureThreshold: 3
          timeoutSeconds: 10
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 2Gi
      volumes:
      - name: mysql-ca
        secret:
          secretName: mysql-ca-cert

Worker Secondary Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-secondary
  labels:
    app: storecove
    component: worker-secondary
spec:
  replicas: 2
  selector:
    matchLabels:
      app: storecove
      component: worker-secondary
  template:
    metadata:
      labels:
        app: storecove
        component: worker-secondary
    spec:
      terminationGracePeriodSeconds: 300
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      containers:
      - name: worker
        image: ${OVH_REGISTRY_URL}/storecove-app:worker-latest
        imagePullPolicy: Always
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: false
          capabilities:
            drop:
              - ALL
        env:
        - name: RAILS_ENV
          value: "production"
        - name: PROCESS_TARGET
          value: "worker-secondary"
        - name: DELAYED_JOB_POOLS
          value: "--pool=smp_phoss:8 --pool=aruba_out_out_prod,aruba_out_out_pilot,aruba_out_out_webhooks_pilot,aruba_out_out_webhooks_prod:1 --pool=chargebee_webhook_events,exactsales_webhook_events,storecove_webhook_events:1 --pool=outgoing_webhooks,outgoing_webhooks_sandbox:4 --pool=outgoing_webhooks_asia,outgoing_webhooks_sandbox_asia:4 --pool=exact_worker,snelstart_worker,sftp_worker,as2_worker:1 --pool=received_documents,aruba_in_in_webhooks:1 --pool=storecove_api_self:3 --pool=active_storage_analysis,active_storage_mirror,active_storage_preview,active_storage_purge:1 --pool=kafka_sending_actions_status_update,kafka_received_document_status,kafka_new_document_notification:12 --pool=meta_events,exceptions,aruba_admin:1 --pool=customer_reporting:1 --pool=my_lhdnm_poller:6"
        - name: DELAYED_JOB_TIMEOUT
          value: "280"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MYSQL_SSL_CA
          value: "/etc/ssl/mysql/ca-cert.pem"
        envFrom:
        - secretRef:
            name: storecove-app-db-credentials
        - secretRef:
            name: storecove-app-master-key
        - secretRef:
            name: storecove-app-aws-credentials
        - secretRef:
            name: storecove-app-valkey-credentials
        - secretRef:
            name: storecove-app-queue-credentials
        - secretRef:
            name: storecove-app-email-credentials
        - secretRef:
            name: storecove-app-billing-credentials
        - secretRef:
            name: storecove-app-peppol-credentials
        - secretRef:
            name: storecove-app-webhooks-credentials
        volumeMounts:
        - name: mysql-ca
          mountPath: /etc/ssl/mysql
          readOnly: true
        ports:
        - containerPort: 3001
          name: health
        livenessProbe:
          httpGet:
            path: /health
            port: 3001
          initialDelaySeconds: 30
          periodSeconds: 30
          failureThreshold: 3
          timeoutSeconds: 10
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 2000m
            memory: 4Gi
      volumes:
      - name: mysql-ca
        secret:
          secretName: mysql-ca-cert

Kafka Consumer Deployment (Example: Sending Status)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-sending-status
  labels:
    app: storecove
    component: kafka-sending-status
spec:
  replicas: 1
  selector:
    matchLabels:
      app: storecove
      component: kafka-sending-status
  template:
    metadata:
      labels:
        app: storecove
        component: kafka-sending-status
    spec:
      terminationGracePeriodSeconds: 60
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      containers:
      - name: consumer
        image: ${OVH_REGISTRY_URL}/storecove-app:kafka-sending-status-latest
        imagePullPolicy: Always
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: false
          capabilities:
            drop:
              - ALL
        env:
        - name: RAILS_ENV
          value: "production"
        - name: PROCESS_TARGET
          value: "kafka-sending-status"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MYSQL_SSL_CA
          value: "/etc/ssl/mysql/ca-cert.pem"
        envFrom:
        - secretRef:
            name: storecove-app-db-credentials
        - secretRef:
            name: storecove-app-master-key
        - secretRef:
            name: storecove-app-kafka-credentials
        volumeMounts:
        - name: mysql-ca
          mountPath: /etc/ssl/mysql
          readOnly: true
        ports:
        - containerPort: 3002
          name: health
        livenessProbe:
          httpGet:
            path: /health
            port: 3002
          initialDelaySeconds: 30
          periodSeconds: 30
          failureThreshold: 3
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 1Gi
      volumes:
      - name: mysql-ca
        secret:
          secretName: mysql-ca-cert

Note: The kafka-new-document and kafka-received-status deployments follow the same pattern, with their respective ports (3003, 3004) and PROCESS_TARGET values.

Kubernetes CronJobs

Scheduled tasks are implemented as Kubernetes CronJobs using the rails Docker target. Each CronJob runs independently with the full Rails environment.

# Example: Daily reporting task
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-report
  labels:
    app: storecove
    component: cronjob
spec:
  schedule: "0 6 * * *"  # 6 AM UTC daily
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600  # Job must complete within 1 hour
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            runAsGroup: 1000
            fsGroup: 1000
          containers:
          - name: rails
            image: ${OVH_REGISTRY_URL}/storecove-app:rails-latest
            imagePullPolicy: Always
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: false
              capabilities:
                drop:
                  - ALL
            command: ["bash", "-lc", "bundle exec rake reports:daily"]
            env:
            - name: RAILS_ENV
              value: "production"
            - name: PROCESS_TARGET
              value: "cronjob-daily-report"
            - name: MYSQL_SSL_CA
              value: "/etc/ssl/mysql/ca-cert.pem"
            envFrom:
            - secretRef:
                name: storecove-app-db-credentials
            - secretRef:
                name: storecove-app-master-key
            - secretRef:
                name: storecove-app-aws-credentials
            - secretRef:
                name: storecove-app-valkey-credentials
            volumeMounts:
            - name: mysql-ca
              mountPath: /etc/ssl/mysql
              readOnly: true
            resources:
              requests:
                cpu: 100m
                memory: 256Mi
              limits:
                cpu: 500m
                memory: 1Gi
          volumes:
          - name: mysql-ca
            secret:
              secretName: mysql-ca-cert

CronJob Migration from Whenever

Tasks currently defined in config/schedule.rb (whenever gem) must be migrated to individual CronJob manifests:

| Task Description | CronJob Name | Schedule | Command |
|---|---|---|---|
| Customer reports | customer-reports | 0 6 * * * | rake customer_reporting:schedule_reports |
| SaaS org reporting (monthly) | saas-organizations | 30 8 1 * * | rake saas:organizations_global && rake saas:organizations_asia && rake saas:organizations_pacific |
| Peppol end users reporting | peppol-end-users | 0 23 2 * * | rake peppol_reporting:peppol_reporting_end_users |
| Peppol transactions reporting | peppol-transactions | 0 1 3 * * | rake peppol_reporting:peppol_reporting_transactions |
| Peppol SG/IRAS reporting | peppol-sg-monthly | 30 5 1 * * | rake peppol_reporting:identifiers_in_out_sg && rake peppol_reporting:reporting_sg_iras_sla_sandbox && rake peppol_reporting:reporting_sg_iras_sla_live |
| AWS SES bounce rates | aws-ses-bounce-rates | 30 4 * * 1 | rake aws_ses_reporting:bounce_rates_sending && rake aws_ses_reporting:bounce_rates_administrations |
| Kafka sending/clearing updates | kafka-sending-clearing | */10 * * * * | rake kafka:produce_invoice_submission_action_update_requests_sending && rake kafka:produce_invoice_submission_action_update_requests_clearing |
| Kafka new docs hourly | kafka-new-docs-hourly | 0 * * * * | rake kafka:produce_new_documents_request_hourly |
| Kafka new docs daily | kafka-new-docs-daily | 0 0 * * * | rake kafka:produce_new_documents_request_daily |
| Clean delayed jobs queue | clean-delayed-jobs | */5 * * * * | rake railsdb:clean_delayed_jobs_inboundpeppol |
| CorpPass/MyKYC detection | corppass-mykyc-detect | */5 * * * * | rake corppass:detect[sandbox] && rake corppass:detect[live] && rake mykyc:detect[sandbox] && rake mykyc:detect[live] |
| Reconcile Chargebee | reconcile-chargebee | 15 7 * * 6 | rake saas:reconcile_chargebee |
| Check invalid identifiers | identifiers-invalid | 30 7 * * 6 | rake identifiers:invalid |
| SMP reconciliation | smp-reconcile | 0 8 * * 6 | rake smp:reconcile && rake smp:reconcile_sg |
| Email worker | email-worker | 0 * * * * | rails runner "C5::EmailWorker.new.perform" |
| Invoice analyzer | invoice-analyzer | 0 * * * * | rails runner "InvoiceAnalyzerJob.perform_later" |

Total: 16 CronJobs replacing the container-level cron previously managed by the whenever gem.
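As a sketch of the translation, one whenever declaration maps to one CronJob schedule plus command. The every block below is illustrative only, not copied from the current config/schedule.rb:

```ruby
# config/schedule.rb (whenever DSL) -- illustrative entry, not verbatim
every :day, at: "6:00 am" do
  rake "customer_reporting:schedule_reports"
end

# becomes, in the corresponding CronJob manifest:
#   schedule: "0 6 * * *"
#   command: ["bash", "-lc", "bundle exec rake customer_reporting:schedule_reports"]
```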

Fluent Bit RBAC and DaemonSet for Logz.io

Fluent Bit requires RBAC permissions to access Kubernetes metadata for log enrichment.

# ServiceAccount for Fluent Bit
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
# ClusterRole with permissions to read pod metadata
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
rules:
- apiGroups: [""]
  resources:
    - namespaces
    - pods
  verbs: ["get", "list", "watch"]
---
# ClusterRoleBinding to bind the role to the service account
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: logging
---
# DaemonSet for Fluent Bit
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:latest
        securityContext:
          runAsNonRoot: false
          privileged: false
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
              - ALL
        env:
        - name: LOGZIO_TOKEN
          valueFrom:
            secretKeyRef:
              name: storecove-app-logzio
              key: token
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config
          mountPath: /fluent-bit/etc/
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config
        configMap:
          name: fluent-bit-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Parsers_File  parsers.conf

    [INPUT]
        Name              tail
        Path              /var/log/containers/storecove*.log
        Parser            docker
        Tag               kube.*
        Refresh_Interval  5
        Mem_Buf_Limit     5MB

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        K8S-Logging.Parser  On

    [OUTPUT]
        Name            http
        Match           *
        Host            listener.logz.io
        Port            8071
        URI             /?token=${LOGZIO_TOKEN}&type=kubernetes
        Format          json_lines
        tls             On
        tls.verify      On
  
  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system—essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Build Target Produces Correct Process

For any Docker build target (rails, worker, kafka-*), building and running the image SHALL start only the process specified by that target.

Validates: Requirements 1.2, 1.3, 1.4, 1.5, 1.6

Property 2: Foreground Process Execution

For any Docker build target, the started process SHALL be the main process (PID 1 or a direct child of PID 1) and SHALL NOT daemonize.

Validates: Requirement 1.7

Property 3: Health Server Starts Before Main Process

For any worker or Kafka consumer target, the health check server SHALL be listening on its port before the main process starts consuming work.

Validates: Requirement 1.9
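Property 3 can be prototyped with a small health server that binds its port before the worker loop begins. The real entrypoint uses WEBrick per Requirement 3.10; a bare TCPServer keeps this sketch dependency-free, and start_health_server is an illustrative name:

```ruby
require "socket"
require "json"

# Bind the health port BEFORE the main process starts consuming work,
# so Kubernetes probes never race the worker boot (Property 3).
def start_health_server(port)
  server = TCPServer.new("0.0.0.0", port) # raises immediately if the bind fails
  Thread.new do
    loop do
      client = server.accept
      client.gets # consume the request line; this sketch ignores headers
      body = JSON.generate(status: "ok")
      client.write("HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n" \
                   "Content-Length: #{body.bytesize}\r\n\r\n#{body}")
      client.close
    rescue IOError
      break # server socket closed during shutdown
    end
  end
  server
end

# start_health_server(3001)                        # health endpoint is live from here
# exec("bundle", "exec", "bin/delayed_job", "run") # then hand off to the worker
```

Because TCPServer.new raises on a failed bind, a misconfigured health port surfaces at boot rather than as a silent probe failure later.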

Property 4: JSON Log Format Validity

For any log entry output by any process type (server, worker, kafka-*), the log entry SHALL be valid JSON that can be parsed without error.

Validates: Requirements 2.1, 2.2, 2.3

Property 5: Required Log Fields Presence

For any JSON log entry, the entry SHALL contain the fields: timestamp, level, process_target, pod_name, namespace, and message.

Validates: Requirements 2.4, 2.5

Property 6: Sensitive Data Exclusion from Logs

For any log entry, the entry SHALL NOT contain values of environment variables whose names contain PASSWORD, SECRET, KEY, or TOKEN.

Validates: Requirements 2.8, 7.5
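One way to enforce Property 6 is to redact the values of matching environment variables before a line is written; scrub_sensitive is a hypothetical helper, not an existing app method:

```ruby
# Redact values of env vars whose NAMES look sensitive (Property 6).
SENSITIVE_NAME = /PASSWORD|SECRET|KEY|TOKEN/.freeze

def scrub_sensitive(message, env = ENV)
  env.each_with_object(message.dup) do |(name, value), scrubbed|
    next unless name.match?(SENSITIVE_NAME) && value && !value.empty?
    scrubbed.gsub!(value, "[FILTERED]")
  end
end
```

In a real formatter this would wrap the message field just before serialization; matching on variable names rather than value patterns keeps false negatives low, at the cost of occasionally filtering benign strings.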

Property 7: SIGTERM Graceful Shutdown

For any SIGTERM signal sent to a container, the main process SHALL begin graceful shutdown within 1 second.

Validates: Requirement 5.4
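The one-second shutdown latency in Property 7 follows from keeping the trap handler to a single flag write and checking that flag between units of work. A sketch (run_until_term is an illustrative name):

```ruby
# Trap SIGTERM, finish the current unit of work, then drain (Property 7).
def run_until_term
  shutdown = false
  Signal.trap("TERM") { shutdown = true } # handler stays tiny and async-safe

  until shutdown
    sleep 0.05 # stand-in for processing one job / one Kafka batch
  end
  :drained # a real worker finishes in-flight work here, then exits 0
end
```

Doing real work inside the trap handler is unsafe (most of Ruby is off-limits in signal context), which is why the handler only flips a flag and the main loop does the actual draining.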

Property 8: Health Check Response Time

For any health check request, the response SHALL be returned within 5 seconds.

Validates: Requirement 3.11

Property 9: Liveness vs Readiness Separation

For any Rails server, the liveness endpoint SHALL return 200 even when the database is unreachable, while the readiness endpoint SHALL return 503.

Validates: Requirements 3.1, 3.2, 3.3
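Property 9 falls out of keeping the two checks independent: liveness only asserts the process is alive, while readiness additionally exercises a database round-trip. A framework-free sketch, where the helper names and the db_check lambda are illustrative (in Rails the check would be an ActiveRecord connection ping):

```ruby
require "json"

# Liveness: the process is up; dependencies are deliberately ignored.
def liveness_status
  [200, JSON.generate(status: "alive")]
end

# Readiness: additionally require a successful database round-trip.
def readiness_status(db_check)
  db_check.call
  [200, JSON.generate(status: "ready")]
rescue StandardError => e
  [503, JSON.generate(status: "not_ready", error: e.message)]
end
```

Coupling liveness to the database would make Kubernetes restart healthy pods during a DB outage; separating the two means an outage only drains traffic via readiness.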

Error Handling

Build Target Errors

| Error Condition | Behavior | Exit Code |
|---|---|---|
| Missing required env var | Log error with variable name, exit | 1 |
| Process fails to start | Log error with details, exit | 1 |
| Health server fails to bind | Log error, continue (main process may still work) | - |

Health Check Error Responses

| Component | Error Condition | HTTP Status | Response Body |
|---|---|---|---|
| Rails Server (readiness) | DB connection failed | 503 | {"status":"not_ready","error":"..."} |
| Worker | DB connection failed | 503 | {"status":"unhealthy","error":"..."} |
| Worker | Process not running | 503 | {"status":"unhealthy","error":"process not found"} |
| Kafka Consumer | Process crashed | 503 | {"status":"unhealthy","error":"..."} |

Graceful Shutdown Timeouts

| Component | terminationGracePeriodSeconds | Rationale |
|---|---|---|
| Rails Server | 30 | Typical HTTP request timeout |
| Delayed Job Worker | 300 | Jobs may take several minutes |
| Kafka Consumer | 60 | Offset commit and disconnect |

Testing Strategy

Unit Tests

Unit tests verify specific examples and edge cases:

  1. Health Check Tests

    • Test liveness returns 200 when process running
    • Test readiness returns 200 when DB connected
    • Test readiness returns 503 when DB disconnected
  2. Logging Tests

    • Test log output is valid JSON
    • Test required fields are present
    • Test sensitive data is not logged

Property-Based Tests

Property-based tests verify universal properties across many inputs using a property-based testing library (e.g., Rantly or PropCheck for Ruby).

Each property test should run a minimum of 100 iterations.

Property Test 1: JSON Log Validity

  • Generate various log scenarios
  • Verify all output is parseable JSON
  • Feature: kubernetes-rails-deployment, Property 4: JSON Log Format Validity

Property Test 2: Required Fields Presence

  • Generate log entries
  • Verify all contain required fields
  • Feature: kubernetes-rails-deployment, Property 5: Required Log Fields Presence

Property Test 3: Sensitive Data Exclusion

  • Generate log entries with various env vars set
  • Verify no sensitive values appear in logs
  • Feature: kubernetes-rails-deployment, Property 6: Sensitive Data Exclusion from Logs
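Without committing to a specific gem, the pattern behind these tests is a generator loop with a fixed iteration count. The generator and formatter below are self-contained stand-ins for the app's real ones:

```ruby
require "json"

# Property 4, hand-rolled: every generated message must survive a
# format -> parse round-trip as valid JSON.
def json_validity_holds?(iterations: 100)
  iterations.times do
    # random printable message, including quote and backslash characters
    message = Array.new(rand(0..80)) { rand(32..126).chr }.join
    line = JSON.generate(level: "INFO", message: message)
    raise "unparseable log line" unless JSON.parse(line)["message"] == message
  end
  true
end
```

A dedicated PBT library adds shrinking (reducing a failing input to a minimal counterexample), which this hand-rolled loop lacks.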

Integration Tests

Integration tests verify components work together:

  1. Container Build Tests

    • Build each target
    • Verify correct process starts
    • Verify health endpoint responds
  2. Kubernetes Manifest Validation

    • Use kubectl --dry-run to validate manifests
    • Verify all required fields present
    • Verify probe configurations correct
  3. Log Shipping Tests

    • Start container with Fluent Bit
    • Generate logs
    • Verify logs appear in Logz.io (or mock endpoint)

Requirements Document

Introduction

This specification covers the modernization of a Rails application deployment from ECS with a monolithic container approach to a Kubernetes-native architecture. Currently, the application runs delayed_job workers and multiple Racecar Kafka consumers as daemonized background processes alongside the web server. The new architecture will run each process type in its own container with proper health monitoring, centralized logging to Logz.io, and automatic recovery via Kubernetes probes.

Historical Context: The previous deployment model used AWS ECS with a disconnected build process. When code was merged to master, a CD workflow (.github/workflows/cd.yml) built and pushed images to GitHub Container Registry, but these images were never used in production. Instead, production deployments required manually SSH-ing to a build server, which would clone the datajust repository and build a fresh image using a separate Dockerfile located in the storecove-app-docker repository (production/Dockerfile). This image was then pushed to AWS ECR and deployed to ECS.

New Approach: The OVH Kubernetes deployment eliminates this disconnection. Images are built automatically in GitHub Actions when code is merged to master, using the Dockerfile in the datajust repository itself. These same images are immediately deployed to OVH Kubernetes, eliminating manual steps and ensuring consistency between CI and production. The storecove-app-docker repository is deprecated for this deployment model.

Glossary

  • Docker_Build_Target: A named stage in a multi-stage Dockerfile that produces a specific container image
  • Rails_Server: Puma serving the Rails web application
  • Delayed_Jobs_Worker: Background job processor using delayed_job with multiple queue pools
  • Kafka_Consumer: Racecar-based consumer (SendingActionStatusUpdate, NewDocumentNotification, ReceivedDocumentStatus)
  • Health_Check_Endpoint: HTTP endpoint returning process health status for Kubernetes probes
  • Logz_io: External centralized logging service for log aggregation
  • Kubernetes_Deployment: K8s resource defining pod specifications and replica counts
  • Liveness_Probe: Kubernetes health check that restarts unhealthy containers
  • Readiness_Probe: Kubernetes health check that controls traffic routing to the Rails_Server
  • CronJob: Kubernetes resource for running scheduled tasks

Requirements

Requirement 1: Docker Multi-Stage Build Targets

User Story: As a DevOps engineer, I want separate Docker build targets for each process type, so that I can deploy and scale each component independently using the same codebase.

Acceptance Criteria

  1. THE Dockerfile SHALL define a base stage containing all shared dependencies and application code
  2. THE Dockerfile SHALL define a build target named "rails" that starts Puma serving the Rails application in the foreground
  3. THE Dockerfile SHALL define a build target named "worker" that starts delayed_job in the foreground using pools specified by the DELAYED_JOB_POOLS environment variable
  4. THE worker target CMD SHALL expand the DELAYED_JOB_POOLS variable to pass pool arguments to delayed_job (e.g., "--pool=mail:1 --pool=slack:2")
  5. THE Dockerfile SHALL define a build target named "kafka-sending-status" that starts the SendingActionStatusUpdateConsumer in the foreground
  6. THE Dockerfile SHALL define a build target named "kafka-new-document" that starts the NewDocumentNotificationConsumer in the foreground
  7. THE Dockerfile SHALL define a build target named "kafka-received-status" that starts the ReceivedDocumentStatusConsumer in the foreground
  8. EACH build target SHALL run its process in the foreground without daemonizing
  9. EACH build target SHALL trap SIGTERM for graceful shutdown
  10. THE worker and Kafka consumer targets SHALL start a health check server on their respective health ports before starting the main process
  11. THE rails build target SHALL expose port 3000 for the web server and health endpoints
  12. THE worker target SHALL expose port 3001 for health checks
  13. THE kafka-sending-status target SHALL expose port 3002 for health checks
  14. THE kafka-new-document target SHALL expose port 3003 for health checks
  15. THE kafka-received-status target SHALL expose port 3004 for health checks
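Criteria 1 through 5 can be sketched as a multi-stage Dockerfile skeleton. The stage names match the requirement; the base image, puma config path, and delayed_job invocation are assumptions, and the sh -c wrapper is what lets $DELAYED_JOB_POOLS expand into individual --pool arguments (criterion 4), since exec-form CMD performs no shell expansion:

```dockerfile
# Sketch only: stage names follow the requirement, everything else is assumed.
FROM ruby:3.2-slim AS base
WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .

FROM base AS rails
EXPOSE 3000
CMD ["bundle", "exec", "puma", "-C", "config/puma.rb"]

FROM base AS worker
EXPOSE 3001
# sh -c word-splits DELAYED_JOB_POOLS into --pool=... arguments;
# exec keeps delayed_job as PID 1 so it receives SIGTERM directly
CMD ["sh", "-c", "exec bundle exec bin/delayed_job run $DELAYED_JOB_POOLS"]

FROM base AS kafka-sending-status
EXPOSE 3002
CMD ["bundle", "exec", "racecar", "SendingActionStatusUpdateConsumer"]
```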

Requirement 2: Centralized Logging to Logz.io

User Story: As an operations engineer, I want all application logs sent to Logz.io, so that I can monitor and debug issues across all components in one place.

Acceptance Criteria

  1. THE Rails_Server SHALL output logs to stdout in JSON format
  2. THE Delayed_Jobs_Worker SHALL output logs to stdout in JSON format
  3. THE Kafka_Consumer SHALL output logs to stdout in JSON format
  4. WHEN a log entry is generated, THE logging configuration SHALL include timestamp, log level, process_target, pod_name, namespace, and message fields
  5. THE Kubernetes_Deployment SHALL set a PROCESS_TARGET environment variable identifying the component type, which Rails SHALL include in all log entries
  6. THE Kubernetes_Deployment SHALL use a Fluent Bit DaemonSet to forward container stdout to Logz_io
  7. THE Kubernetes_Deployment SHALL provide the Logz.io token via a Kubernetes Secret named storecove-app-logzio
  8. NO Docker build target SHALL log any environment variables containing PASSWORD, SECRET, KEY, or TOKEN

Requirement 3: Health Check Endpoints

User Story: As a platform engineer, I want health check endpoints for each process type, so that Kubernetes can monitor and restart unhealthy containers.

Acceptance Criteria

  1. WHEN the Rails_Server process is running, THE Health_Check_Endpoint at /health/liveness SHALL return HTTP 200 status
  2. WHEN the Rails_Server is healthy AND can connect to the database, THE Health_Check_Endpoint at /health/readiness SHALL return HTTP 200 status
  3. IF the Rails_Server cannot connect to the database, THEN THE Health_Check_Endpoint at /health/readiness SHALL return HTTP 503 status
  4. WHEN the Delayed_Jobs_Worker process is running and can connect to the database, THE Health_Check_Endpoint SHALL return HTTP 200 status
  5. IF the Delayed_Jobs_Worker process is not running or cannot connect to the database, THEN THE Health_Check_Endpoint SHALL return HTTP 503 status
  6. WHEN the Kafka_Consumer process is running and has not crashed, THE Health_Check_Endpoint SHALL return HTTP 200 status
  7. IF the Kafka_Consumer process has crashed or is not running, THEN THE Health_Check_Endpoint SHALL return HTTP 503 status
  8. THE Health_Check_Endpoint for Delayed_Jobs_Worker SHALL listen on port 3001
  9. THE Health_Check_Endpoint for Kafka consumers SHALL listen on ports 3002 (sending-status), 3003 (new-document), and 3004 (received-status)
  10. THE Health_Check_Endpoint for workers and Kafka consumers SHALL be provided by a lightweight WEBrick HTTP server
  11. THE Health_Check_Endpoint SHALL respond within 5 seconds

Requirement 4: Kubernetes Deployment Configuration

User Story: As a DevOps engineer, I want Kubernetes deployment manifests for each component, so that I can deploy and scale them independently on OVH Kubernetes.

Acceptance Criteria

  1. THE Kubernetes_Deployment for Rails_Server SHALL define liveness probe at /health/liveness with initialDelaySeconds 30, periodSeconds 10, and failureThreshold 3
  2. THE Kubernetes_Deployment for Rails_Server SHALL define readiness probe at /health/readiness with initialDelaySeconds 30, periodSeconds 10, and failureThreshold 3
  3. THE Kubernetes_Deployment for Delayed_Jobs_Worker SHALL define liveness probe on port 3001 with initialDelaySeconds 30, periodSeconds 30, and failureThreshold 3
  4. THE Kubernetes_Deployment for each Kafka_Consumer SHALL define liveness probe on its respective health port with initialDelaySeconds 30, periodSeconds 30, and failureThreshold 3
  5. WHEN a Liveness_Probe fails three consecutive times, THE Kubernetes_Deployment SHALL restart the container
  6. THE Kubernetes_Deployment SHALL allow independent replica scaling for each component type
  7. THE Kubernetes_Deployment SHALL use images built from different Docker build targets (rails, worker, kafka-*) from the same Dockerfile
  8. THE Kubernetes_Deployment for Rails_Server SHALL define resource requests of 256Mi memory / 250m CPU, and limits of 2Gi memory / 1000m CPU
  9. THE Kubernetes_Deployment for Delayed_Jobs_Worker SHALL define resource requests of 512Mi memory / 250m CPU, and limits of 4Gi memory / 2000m CPU
  10. THE Kubernetes_Deployment for each Kafka_Consumer SHALL define resource requests of 256Mi memory / 100m CPU, and limits of 1Gi memory / 500m CPU
  11. THE Kubernetes_Deployment SHALL NOT define a Kubernetes Service for worker or Kafka deployments
  12. THE Kubernetes_Deployment for Rails_Server SHALL define a strategy.rollingUpdate with maxUnavailable 0 and maxSurge 1
  13. THE Kubernetes_Deployment SHALL provide Kafka credentials via a Kubernetes Secret named storecove-app-kafka-credentials
  14. THE Kubernetes cluster SHALL define separate Deployment resources for different worker pool groups, each using the same "worker" Docker target with different DELAYED_JOB_POOLS values
  15. THE worker-primary Deployment SHALL configure DELAYED_JOB_POOLS with: mail, inbound processing (peppol, sftp, ubl, partner email), SES/email queues, vatcalc, analyze/invoice/slack/apply actions, and document_submissions pools
  16. THE worker-secondary Deployment SHALL configure DELAYED_JOB_POOLS with: smp_phoss, aruba, webhooks (including asia), integrations (exact, snelstart, sftp, as2), received_documents, storecove_api_self, active_storage, kafka processing, meta_events, customer_reporting, and my_lhdnm_poller pools
  17. EACH worker Deployment SHALL use the same health check port (3001) since only one delayed_job process runs per container

Requirement 5: Graceful Shutdown Handling

User Story: As a platform engineer, I want processes to shut down gracefully when Kubernetes terminates them, so that in-flight work is not lost.

Acceptance Criteria

  1. WHEN the Rails_Server receives SIGTERM, THE Rails_Server SHALL stop accepting new connections and complete in-flight requests before exiting
  2. WHEN the Delayed_Jobs_Worker receives SIGTERM, THE Delayed_Jobs_Worker SHALL complete the current job if it finishes within terminationGracePeriodSeconds, otherwise the job SHALL be left in the queue for retry
  3. WHEN the Kafka_Consumer receives SIGTERM, THE Kafka_Consumer SHALL commit offsets and disconnect cleanly before exiting
  4. EACH Docker build target SHALL trap SIGTERM and handle graceful shutdown
  5. THE Kubernetes_Deployment SHALL configure terminationGracePeriodSeconds of 30 for Rails_Server
  6. THE Kubernetes_Deployment SHALL configure terminationGracePeriodSeconds of 300 for Delayed_Jobs_Worker
  7. THE Kubernetes_Deployment SHALL configure terminationGracePeriodSeconds of 60 for each Kafka_Consumer

Requirement 6: Database Migrations

User Story: As a DevOps engineer, I want database migrations to run safely during deployments, so that schema changes don't cause downtime or data corruption.

Acceptance Criteria

  1. THE deployment pipeline SHALL run rails db:migrate as a GitHub Actions step BEFORE applying Kubernetes deployment manifests
  2. IF the migration step fails, THE deployment pipeline SHALL abort and NOT apply new Kubernetes manifests
  3. THE migration step SHALL use the rails Docker build target image

Requirement 7: Secrets Management

User Story: As a security engineer, I want all sensitive configuration stored in Kubernetes Secrets, so that credentials are not exposed in manifests or logs.

Acceptance Criteria

  1. THE Kubernetes_Deployment SHALL reference database credentials from a Secret named storecove-app-db-credentials
  2. THE Kubernetes_Deployment SHALL reference Kafka credentials from a Secret named storecove-app-kafka-credentials
  3. THE Kubernetes_Deployment SHALL reference Logz.io token from a Secret named storecove-app-logzio
  4. THE Kubernetes_Deployment SHALL reference Rails master key from a Secret named storecove-app-master-key
  5. NO Docker build target SHALL log any environment variables containing PASSWORD, SECRET, KEY, or TOKEN
  6. THE Kubernetes_Deployment SHALL reference AWS credentials from a Secret named storecove-app-aws-credentials
  7. THE Kubernetes_Deployment SHALL reference Valkey credentials from a Secret named storecove-app-valkey-credentials
  8. THE Kubernetes_Deployment SHALL reference SQS/queue credentials from a Secret named storecove-app-queue-credentials
  9. THE Kubernetes_Deployment SHALL reference email provider credentials from a Secret named storecove-app-email-credentials
  10. THE Kubernetes_Deployment SHALL reference billing credentials (Chargebee, Stripe) from a Secret named storecove-app-billing-credentials
  11. THE Kubernetes_Deployment SHALL reference Peppol/access point configuration from a Secret named storecove-app-peppol-credentials
  12. THE Kubernetes_Deployment SHALL reference webhook encryption keys from a Secret named storecove-app-webhooks-credentials
  13. THE Kubernetes_Deployment SHALL reference Rollbar API key from a Secret named storecove-app-rollbar-credentials
  14. THE Kubernetes_Deployment SHALL reference Intercom credentials from a Secret named storecove-app-intercom-credentials

Requirement 8: Continuous Deployment on Master Merge

User Story: As a developer, I want the application to automatically deploy to OVH Kubernetes when changes are merged to master, so that new features reach production without manual build server intervention.

Acceptance Criteria

  1. THE GitHub Actions workflow SHALL trigger automatically on every push to the master branch
  2. THE GitHub Actions workflow SHALL build all Docker targets (rails, worker, kafka-sending-status, kafka-new-document, kafka-received-status) from the Dockerfile in the datajust repository
  3. EACH Docker target SHALL be tagged with both {target}-{git-sha} and {target}-latest tags
  4. THE GitHub Actions workflow SHALL push all built images to the OVH Container Registry
  5. THE images pushed to OVH Container Registry SHALL be the SAME images deployed to Kubernetes (no rebuilding in production)
  6. THE GitHub Actions workflow SHALL run database migrations using the rails target image BEFORE deploying new pods
  7. THE GitHub Actions workflow SHALL apply Kubernetes manifests for all components
  8. IF any build step fails, THE workflow SHALL abort and NOT deploy
  9. IF the migration step fails, THE workflow SHALL abort and NOT apply new manifests
  10. THE workflow SHALL use Docker BuildKit cache to optimize build times
  11. THE workflow SHALL NOT use the build process from storecove-app-docker repository (deprecated)
  12. THE GitHub Actions workflow SHALL notify Rollbar of successful deployments with git SHA, environment, and deployer information

Requirement 9: Repository and Build Process Deprecation

User Story: As a team member, I want clarity on which repositories and build processes are active vs. deprecated, so that I don't accidentally use outdated deployment methods.

Acceptance Criteria

  1. THE deployment workflow SHALL build images from the datajust repository only
  2. THE storecove-app-docker repository SHALL NOT be used for building production images
  3. THE storecove-app-docker/production/build-deploy script SHALL NOT be used for deployments
  4. THE .github/workflows/cd.yml workflow MAY continue to build images for CI/testing purposes, but these SHALL NOT be used for production deployments to OVH
  5. ALL production deployments SHALL use images built by .github/workflows/deploy.yml

Requirement 10: Scheduled Tasks via Kubernetes CronJobs

User Story: As a platform engineer, I want scheduled tasks to run reliably via Kubernetes CronJobs, so that periodic maintenance and reporting jobs execute on time.

Acceptance Criteria

  1. THE scheduled tasks SHALL be implemented using Kubernetes CronJob resources
  2. EACH CronJob SHALL use the "rails" Docker build target as its container image
  3. THE CronJob resources SHALL reference the same secrets as other deployments
  4. THE CronJob resources SHALL define appropriate schedule expressions matching the current whenever configuration
  5. THE CronJob resources SHALL set restartPolicy to "OnFailure"
  6. THE CronJob resources SHALL set concurrencyPolicy to "Forbid" to prevent overlapping runs
  7. THE CronJob container command SHALL execute rake tasks or rails runner commands as needed
  8. THE deployment workflow SHALL apply CronJob manifests alongside Deployment manifests

Requirement 11: Ingress Configuration

User Story: As a platform engineer, I want proper ingress configuration, so that external traffic reaches the application with appropriate limits and routing.

Acceptance Criteria

  1. THE Kubernetes Ingress SHALL route external traffic to the Rails_Server service
  2. THE Kubernetes Ingress SHALL configure routes for app.fr.storecove.com (application host)
  3. THE Kubernetes Ingress SHALL configure separate routes for api.fr.storecove.com (API host)
  4. THE Kubernetes Ingress SHALL configure TLS certificates for both subdomains using cert-manager
  5. THE Kubernetes Ingress for api.fr.storecove.com SHALL configure client-max-body-size of 100M
  6. THE Kubernetes Ingress for app.fr.storecove.com SHALL configure client-max-body-size of 2M
  7. THE Kubernetes Ingress SHALL configure appropriate proxy timeouts for long-running requests (300s)
  8. THE Kubernetes Ingress SHALL be configured via annotations appropriate to the OVH ingress controller

Requirement 12: Static Asset Serving

User Story: As a platform engineer, I want static assets served directly from the Rails container, so that the deployment does not depend on an external CDN.

Acceptance Criteria

  1. THE Rails_Server container SHALL serve static assets directly via Puma (RAILS_SERVE_STATIC_FILES=true)
  2. THE assets SHALL be precompiled during the Docker image build
  3. THE Rails configuration SHALL NOT configure an external asset_host CDN
  4. THE Kubernetes Ingress MAY configure caching headers for /assets paths

Requirement 13: Application Environment Configuration

User Story: As a DevOps engineer, I want all required environment variables configured, so that the application functions correctly.

Acceptance Criteria

  1. THE Kubernetes_Deployment SHALL set RAILS_ENV to "production" (or appropriate environment)
  2. THE Kubernetes_Deployment SHALL set RAILS_LOG_TO_STDOUT to "true"
  3. THE Kubernetes_Deployment SHALL set RAILS_SERVE_STATIC_FILES to "true" for Rails_Server
  4. THE Kubernetes_Deployment SHALL set PROCESS_TARGET to identify each component type (server, worker-primary, worker-secondary, kafka-sending-status, kafka-new-document, kafka-received-status)
  5. THE Kubernetes_Deployment SHALL use Kubernetes Downward API to inject POD_NAME and POD_NAMESPACE environment variables
  6. THE Kubernetes_Deployment for worker targets SHALL set DELAYED_JOB_POOLS with appropriate pool configuration

Implementation Plan: Kubernetes Rails Deployment

Overview

This plan implements the migration from ECS with daemonized background processes to a Kubernetes-native architecture using Docker multi-stage builds. Each component (web server, delayed job workers, Kafka consumers) has its own Docker build target that produces a separate container image. Each image runs its process in the foreground with proper health checks, signal handling, and JSON logging.

Tasks

  • 1. Create Docker multi-stage build targets

    • 1.1 Create base and app-base stages in Dockerfile
      • Define base stage with all shared dependencies (Ruby, Node.js, system packages)
      • Create ruby-deps and node-deps stages for dependency caching
      • Create app-base stage with application code and precompiled assets
      • Requirements: 1.1
    • 1.2 Create rails build target
      • Expose port 3000
      • Set PROCESS_TARGET=server environment variable
      • Set RAILS_SERVE_STATIC_FILES=true environment variable
      • Set RAILS_LOG_TO_STDOUT=true environment variable
      • CMD to start Puma in foreground: bundle exec rails server -b 0.0.0.0 -p 3000
      • Requirements: 1.2, 1.7, 1.11, 12.1, 12.2, 13.2
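A minimal sketch of what the rails target could look like, assuming the `app-base` stage name from task 1.1; the final Dockerfile may differ:

```dockerfile
# Sketch only: stage name and paths are assumptions from this plan.
FROM app-base AS rails
ENV PROCESS_TARGET=server \
    RAILS_SERVE_STATIC_FILES=true \
    RAILS_LOG_TO_STDOUT=true
EXPOSE 3000
# Exec-form CMD: Puma runs as PID 1 in the foreground and receives
# SIGTERM directly from Kubernetes for graceful shutdown.
CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0", "-p", "3000"]
```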
    • 1.3 Create worker build target
      • Expose port 3001
      • Set PROCESS_TARGET=worker environment variable
      • Set DELAYED_JOB_POOLS="" environment variable (will be overridden by K8s deployment)
      • Set DELAYED_JOB_TIMEOUT=280 environment variable
      • Copy health_server.rb to /scripts/
      • CMD to start health server then delayed_job with --timeout and $DELAYED_JOB_POOLS variable expansion
      • Requirements: 1.3, 1.4, 1.7, 1.9, 1.12
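The worker target needs runtime expansion of `$DELAYED_JOB_POOLS`, so its CMD must use shell form. A sketch, assuming `delayed_job run` as the foreground invocation (the exact CLI flags should be checked against the app's `bin/delayed_job`):

```dockerfile
# Sketch only: stage name and the delayed_job CLI shape are assumptions.
FROM app-base AS worker
ENV PROCESS_TARGET=worker \
    DELAYED_JOB_POOLS="" \
    DELAYED_JOB_TIMEOUT=280
COPY scripts/health_server.rb /scripts/health_server.rb
EXPOSE 3001
# Shell form so the variables expand at runtime. The health server runs in
# the background; `exec` replaces the shell so delayed_job becomes PID 1
# and receives SIGTERM from Kubernetes.
CMD ruby /scripts/health_server.rb & \
    exec bundle exec delayed_job run --timeout $DELAYED_JOB_TIMEOUT $DELAYED_JOB_POOLS
```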
    • 1.4 Create Kafka consumer build targets
      • Create kafka-sending-status target (port 3002)
      • Create kafka-new-document target (port 3003)
      • Create kafka-received-status target (port 3004)
      • Each starts health server then racecar consumer in foreground
      • Verify Kafka broker URLs are configured via environment variables or Racecar config file
      • Requirements: 1.5, 1.6, 1.7, 1.8, 1.9, 1.12, 1.13, 1.14
    • 1.5 Remove or simplify entrypoint.sh
      • The entrypoint.sh is no longer needed for process selection (targets have their own CMD)
      • Either remove entirely or simplify to just RVM initialization if needed
      • Requirements: 1.7
    • [ ]* 1.6 Write property test for build target produces correct process
      • Property 1: Build Target Produces Correct Process
      • Validates: Requirements 1.2, 1.3, 1.5, 1.6, 1.7
    • [ ]* 1.7 Write property test for foreground process execution
      • Property 2: Foreground Process Execution
      • Validates: Requirements 1.7
      • Note: Puma, delayed_job, and racecar handle SIGTERM gracefully by default
  • 2. Implement health check infrastructure

    • 2.1 Create WEBrick health check server for workers and Kafka consumers
      • Create scripts/health_server.rb with /health and /ready endpoints
      • Workers: load Rails environment and check database connectivity
      • Kafka consumers: simple process-alive check (no DB)
      • Return JSON responses with status, process_target, pod_name, namespace, and timestamp
      • Configure port via HEALTH_PORT environment variable
      • Requirements: 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11
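A hypothetical sketch of `scripts/health_server.rb`. It uses only the stdlib `TCPServer` (WEBrick behaves the same but ships as a separate gem on Ruby 3+), serves the JSON payload described above, and leaves the worker-specific DB check out, since that requires loading Rails:

```ruby
require "socket"
require "json"
require "time"

# Health payload with the fields required by this spec; values fall back
# to "unknown" when the Downward API env vars are absent (e.g. locally).
def health_payload
  {
    status: "ok",
    process_target: ENV.fetch("PROCESS_TARGET", "unknown"),
    pod_name: ENV.fetch("POD_NAME", "unknown"),
    namespace: ENV.fetch("POD_NAMESPACE", "unknown"),
    timestamp: Time.now.utc.iso8601
  }
end

# Starts a tiny HTTP responder in a background thread and returns the
# server. Every path gets the same 200/JSON answer in this sketch.
def start_health_server(port: Integer(ENV.fetch("HEALTH_PORT", "3001")))
  server = TCPServer.new("0.0.0.0", port)
  Thread.new do
    loop do
      client = server.accept
      loop { break if client.gets.to_s.strip.empty? } # drain request headers
      body = JSON.generate(health_payload)
      client.print("HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n" \
                   "Content-Length: #{body.bytesize}\r\n\r\n#{body}")
      client.close
    rescue StandardError
      nil # keep serving even if one request errors out
    end
  end
  server
end
```

In the real script the main thread would start this server first, then `exec` or block on nothing, matching the "health server before main process" property in task 2.4.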
    • 2.2 Create Rails health controller for web server
      • Add HealthController with liveness and readiness actions
      • Liveness: check process is alive (no DB check)
      • Readiness: check DB connectivity
      • Skip authentication for health endpoints
      • Add routes for /health/liveness and /health/readiness to config/routes.rb
      • Include pod_name and namespace in responses
      • Requirements: 3.1, 3.2, 3.3, 3.11
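A sketch of the controller, assuming names like `HealthController` and a CSRF-only `skip_before_action` (the actual base controller and filters in this app may require different skips):

```ruby
# app/controllers/health_controller.rb (sketch; names are assumptions)
class HealthController < ActionController::Base
  skip_before_action :verify_authenticity_token

  # Liveness: process is up, no DB touch.
  def liveness
    render json: base_payload.merge(status: "alive")
  end

  # Readiness: prove DB connectivity with a trivial query.
  def readiness
    ActiveRecord::Base.connection.execute("SELECT 1")
    render json: base_payload.merge(status: "ready")
  rescue StandardError => e
    render json: base_payload.merge(status: "unready", error: e.class.name), status: 503
  end

  private

  def base_payload
    { pod_name: ENV["POD_NAME"], namespace: ENV["POD_NAMESPACE"] }
  end
end

# config/routes.rb
# get "/health/liveness",  to: "health#liveness"
# get "/health/readiness", to: "health#readiness"
```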
    • [ ]* 2.3 Write unit tests for health check endpoints
      • Test liveness returns 200 when process running
      • Test readiness returns 200 when DB connected
      • Test readiness returns 503 when DB disconnected
      • Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7
    • [ ]* 2.4 Write property test for health server starts before main process
      • Property 3: Health Server Starts Before Main Process
      • Validates: Requirements 1.9
  • 3. Configure JSON logging for all process types

    • 3.1 Configure Rails logger for JSON output
      • Add lograge gem to Gemfile
      • Configure lograge in config/environments/production.rb
      • Include timestamp, level, process_target, pod_name, namespace, and message fields
      • Configure for stdout output
      • Ensure sensitive data is not logged
      • Requirements: 2.1, 2.4, 2.5, 2.8
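Using lograge's documented hooks, the production config could look roughly like this; the exact `custom_options` keys beyond the required fields are assumptions:

```ruby
# config/environments/production.rb (sketch)
config.lograge.enabled = true
config.lograge.formatter = Lograge::Formatters::Json.new
config.lograge.custom_options = lambda do |_event|
  {
    timestamp: Time.now.utc.iso8601,
    process_target: ENV["PROCESS_TARGET"],
    pod_name: ENV["POD_NAME"],
    namespace: ENV["POD_NAMESPACE"]
  }
end
# RAILS_LOG_TO_STDOUT=true already routes Rails.logger to stdout;
# lograge reuses that logger, so no extra sink is needed.
```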
    • 3.2 Configure delayed_job for JSON logging
      • Set up JSON formatter for delayed_job output
      • Ensure logs go to stdout
      • Requirements: 2.2, 2.4
    • 3.3 Configure Racecar/Kafka consumers for JSON logging
      • Update config/initializers/racecar.rb to set config.logfile = STDOUT
      • Configure Racecar to use Rails.logger for consistent JSON formatting
      • Verify offset_commit_interval is set appropriately (default: 10 seconds)
      • Include required fields in log entries
      • Requirements: 2.3, 2.4
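A sketch of the initializer change (the existing file's shape is an assumption; `Racecar.configure` is the library's documented entry point):

```ruby
# config/initializers/racecar.rb (sketch)
Racecar.configure do |config|
  config.logfile = STDOUT              # never write log/racecar.log in a pod
  config.logger  = Rails.logger        # inherit the JSON (lograge) formatting
  config.offset_commit_interval = 10   # seconds; matches the library default
end
```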
    • [ ]* 3.4 Write property test for JSON log format validity
      • Property 4: JSON Log Format Validity
      • Validates: Requirements 2.1, 2.2, 2.3
    • [ ]* 3.5 Write property test for required log fields presence
      • Property 5: Required Log Fields Presence
      • Validates: Requirements 2.4, 2.5
    • [ ]* 3.6 Write property test for sensitive data exclusion
      • Property 6: Sensitive Data Exclusion from Logs
      • Validates: Requirements 2.8, 7.5
  • 4. Checkpoint - Verify Dockerfile and health checks

    • Ensure all Docker targets build successfully
    • Ensure health endpoints respond correctly
    • Ask the user if questions arise.
  • 5. Create Kubernetes deployment manifests

    • 5.1 Create Rails server deployment manifest
      • Use image storecove-app:rails-latest
      • Configure liveness probe on /health/liveness (no DB check) with initialDelaySeconds=30, periodSeconds=10, failureThreshold=3
      • Configure readiness probe on /health/readiness (with DB check) with initialDelaySeconds=30, periodSeconds=10, failureThreshold=3
      • Set resource requests/limits (500m-2000m CPU, 1Gi-4Gi memory)
      • Set terminationGracePeriodSeconds to 30
      • Configure rollingUpdate with maxUnavailable=0, maxSurge=1
      • Add POD_NAME and POD_NAMESPACE env vars from downward API
      • Set RAILS_ENV, RAILS_LOG_TO_STDOUT, RAILS_SERVE_STATIC_FILES, PROCESS_TARGET
      • Reference secrets:
        • storecove-app-db-credentials
        • storecove-app-master-key
        • storecove-app-aws-credentials
        • storecove-app-valkey-credentials
        • storecove-app-email-credentials
        • storecove-app-billing-credentials
        • storecove-app-peppol-credentials
        • storecove-app-webhooks-credentials
        • storecove-app-intercom-credentials
        • storecove-app-rollbar-credentials
      • Requirements: 4.1, 4.2, 4.6, 4.7, 4.8, 4.12, 5.5, 7.1, 7.4, 7.6, 7.7, 7.9, 7.10, 7.11, 7.12, 7.13, 7.14, 13.1, 13.2, 13.3, 13.4, 13.5
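An excerpt of the Deployment showing the probe split and Downward API wiring; registry path, metadata, selectors, and the full secret list are omitted or assumed:

```yaml
# Excerpt only (sketch); image path and secret wiring are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-server
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: rails
          image: registry.example/storecove-app:rails-latest
          ports: [{ containerPort: 3000 }]
          env:
            - name: PROCESS_TARGET
              value: server
            - name: POD_NAME
              valueFrom: { fieldRef: { fieldPath: metadata.name } }
            - name: POD_NAMESPACE
              valueFrom: { fieldRef: { fieldPath: metadata.namespace } }
          envFrom:
            - secretRef: { name: storecove-app-db-credentials }
            - secretRef: { name: storecove-app-master-key }
            # ...remaining storecove-app-* secrets from the list above
          livenessProbe:
            httpGet: { path: /health/liveness, port: 3000 }
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet: { path: /health/readiness, port: 3000 }
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          resources:
            requests: { cpu: 500m, memory: 1Gi }
            limits: { cpu: "2", memory: 4Gi }
```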
    • 5.2 Create Kubernetes Service for Rails server
      • Create Service targeting port 3000
      • No Service for workers or Kafka consumers
      • Requirements: 4.11
    • 5.3 Create delayed job worker deployment manifests
      • Create worker-primary deployment with DELAYED_JOB_POOLS for: mail, inbound processing, SES/email, vatcalc, analyze/invoice/slack/apply, document_submissions
      • Create worker-secondary deployment with DELAYED_JOB_POOLS for: smp_phoss, aruba, webhooks, integrations, received_documents, storecove_api_self, active_storage, kafka, meta_events, customer_reporting, my_lhdnm_poller
      • Both use image storecove-app:worker-latest
      • Configure liveness probe on port 3001 with initialDelaySeconds=30, periodSeconds=30, failureThreshold=3
      • Set resource requests/limits: worker-primary (250m-1000m CPU, 512Mi-2Gi memory), worker-secondary (250m-2000m CPU, 512Mi-4Gi memory)
      • Set terminationGracePeriodSeconds to 300
      • Set PROCESS_TARGET to worker-primary or worker-secondary respectively
      • Reference secrets:
        • storecove-app-db-credentials
        • storecove-app-master-key
        • storecove-app-aws-credentials
        • storecove-app-valkey-credentials
        • storecove-app-queue-credentials
        • storecove-app-email-credentials
        • storecove-app-billing-credentials
        • storecove-app-peppol-credentials
        • storecove-app-webhooks-credentials
      • Requirements: 4.3, 4.6, 4.7, 4.9, 4.11, 4.14, 4.15, 4.16, 4.17, 5.6, 7.1, 7.4, 7.6, 7.7, 7.8, 7.9, 7.10, 7.11, 7.12, 13.4, 13.6
    • 5.4 Create Kafka consumer deployment manifests
      • Create deployment for kafka-sending-status consumer (image: kafka-sending-status-latest, port 3002)
      • Create deployment for kafka-new-document consumer (image: kafka-new-document-latest, port 3003)
      • Create deployment for kafka-received-status consumer (image: kafka-received-status-latest, port 3004)
      • Configure liveness probes with initialDelaySeconds=30, periodSeconds=30, failureThreshold=3
      • Set resource requests/limits (100m-500m CPU, 256Mi-1Gi memory)
      • Set terminationGracePeriodSeconds to 60
      • Reference secrets: storecove-app-db-credentials, storecove-app-master-key, storecove-app-kafka-credentials
      • Requirements: 4.4, 4.5, 4.6, 4.7, 4.10, 4.11, 5.7, 7.1, 7.2, 7.4
    • 5.5 Create Ingress manifests
      • Create main app Ingress for app.fr.storecove.com with 2M body size limit
      • Create API Ingress for api.fr.storecove.com with 100M body size limit
      • Configure TLS with cert-manager for both subdomains (2 certificates: storecove-app-tls, storecove-api-tls)
      • Configure proxy timeouts for long-running requests (300s)
      • Configure security headers (strip X-Powered-By, Server)
      • Add note about nginx-ingress retirement (March 2026) and Gateway API migration path
      • Verify ingress class matches OVH cluster configuration
      • Requirements: 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8
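A sketch of the API Ingress, assuming the ingress-nginx controller (annotation names differ on other controllers, so verify against the OVH cluster's ingress class); the app Ingress would be identical apart from its host and a `2m` body size:

```yaml
# Sketch; issuer name and service name are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: storecove-api
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: 100m
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [api.fr.storecove.com]
      secretName: storecove-api-tls
  rules:
    - host: api.fr.storecove.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: rails-server, port: { number: 3000 } }
```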
    • 5.6 Create Kubernetes CronJob manifests
      • Create 16 CronJob manifests from config/schedule.rb (see design doc for complete mapping)
      • Each uses rails-latest image with imagePullPolicy: Always
      • Set concurrencyPolicy to "Forbid"
      • Set restartPolicy to "OnFailure"
      • Set activeDeadlineSeconds to 3600 (1 hour timeout per job)
      • Add security contexts matching Rails server deployment
      • Reference same secrets as Rails server deployment
      • Add MySQL CA volume mount to each CronJob
      • Define schedule expressions matching whenever configuration
      • Set PROCESS_TARGET to cronjob-{task-name} for each
      • Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8
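One representative CronJob shape; the task name, schedule, and rake task below are placeholders to be filled in from config/schedule.rb:

```yaml
# Sketch; every concrete value here is a placeholder.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronjob-example-task
spec:
  schedule: "0 * * * *"          # copy from the whenever configuration
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: task
              image: registry.example/storecove-app:rails-latest
              imagePullPolicy: Always
              command: ["bundle", "exec", "rake", "example:task"]
              env:
                - name: PROCESS_TARGET
                  value: cronjob-example-task
              # plus the same envFrom secrets and MySQL CA volume mount
              # as the Rails server deployment
```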
    • [ ]* 5.7 Validate Kubernetes manifests with kubectl dry-run
      • Run kubectl apply --dry-run=client on all manifests (deployments, services, ingress, cronjobs)
      • Verify all required fields present
      • Requirements: 4.1, 4.2, 4.3, 4.4, 10.1, 11.1
  • 6. Configure Fluent Bit for Logz.io integration

    • 6.1 Create Fluent Bit DaemonSet manifest
      • Define DaemonSet in logging namespace
      • Mount container logs from host
      • Configure Logz.io output with token from secret storecove-app-logzio
      • Requirements: 2.6, 2.7, 7.3
    • 6.2 Create Fluent Bit ConfigMap
      • Configure tail input for storecove container logs
      • Add Kubernetes filter for metadata enrichment
      • Configure HTTP output to Logz.io
      • Add JSON parser configuration
      • Requirements: 2.6
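The pipeline could be sketched as below; the Logz.io listener host/port and the path pattern are assumptions to adapt per account and region:

```ini
[INPUT]
    Name        tail
    Path        /var/log/containers/storecove-*.log
    Parser      cri
    Tag         kube.*

[FILTER]
    Name        kubernetes
    Match       kube.*
    Merge_Log   On            ; lift the app's JSON log into top-level fields

[OUTPUT]
    Name        http
    Match       kube.*
    Host        listener.logz.io
    Port        8071
    URI         /?token=${LOGZIO_TOKEN}&type=kubernetes
    tls         On
    Format      json_lines
```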
  • 7. Checkpoint - Verify Kubernetes manifests

    • Ensure all manifests are valid
    • Ask the user if questions arise.
  • 8. Update GitHub Actions workflow

    • 8.1 Update deploy.yml to build multiple Docker targets
      • Build and push rails target with tag rails-$SHA and rails-latest
      • Build and push worker target with tag worker-$SHA and worker-latest
      • Build and push kafka-sending-status target
      • Build and push kafka-new-document target
      • Build and push kafka-received-status target
      • Use Docker BuildKit cache for faster builds
      • Consider parallel builds using matrix strategy for CI speed
      • Trigger automatically on push to master branch
      • Requirements: 4.7, 8.1, 8.2, 8.3, 8.4, 8.10
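The matrix build could be sketched as follows; the registry variable and secret wiring are assumptions:

```yaml
# Sketch of the build job in .github/workflows/deploy.yml.
on:
  push:
    branches: [master]
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        target: [rails, worker, kafka-sending-status, kafka-new-document, kafka-received-status]
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      # registry login step omitted; depends on OVH registry credentials
      - uses: docker/build-push-action@v6
        with:
          target: ${{ matrix.target }}
          push: true
          tags: |
            ${{ vars.REGISTRY }}/storecove-app:${{ matrix.target }}-${{ github.sha }}
            ${{ vars.REGISTRY }}/storecove-app:${{ matrix.target }}-latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
```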
    • 8.2 Update deploy.yml for multi-deployment strategy
      • Use existing pause/pin/unpause strategy for migrations (already implemented)
      • Run db:migrate using rails image BEFORE applying K8s manifests
      • Abort deployment if migration or build fails
      • Apply all deployment manifests:
        • rails-server
        • worker-primary
        • worker-secondary
        • kafka-sending-status
        • kafka-new-document
        • kafka-received-status
        • CronJobs (k8s/cronjobs/)
        • Ingress
• Create/update all required secrets (13 application secrets; the MySQL CA certificate makes 14 resources):
        • storecove-app-db-credentials
        • storecove-app-kafka-credentials
        • storecove-app-logzio
        • storecove-app-master-key
        • storecove-app-aws-credentials
        • storecove-app-valkey-credentials
        • storecove-app-queue-credentials
        • storecove-app-email-credentials
        • storecove-app-billing-credentials
        • storecove-app-peppol-credentials
        • storecove-app-webhooks-credentials
        • storecove-app-rollbar-credentials
        • storecove-app-intercom-credentials
      • Notify Rollbar of successful deployment with git SHA, environment, and deployer
      • Ensure same images are deployed (no rebuilding)
      • Requirements: 6.1, 6.2, 6.3, 7.1, 7.2, 7.3, 7.4, 7.6, 7.7, 7.8, 7.9, 7.10, 7.11, 7.12, 7.13, 7.14, 8.5, 8.6, 8.7, 8.8, 8.9, 8.12
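One minimal way to express the migrate-then-apply ordering is a Kubernetes Job gated by `kubectl wait`; the repo's existing pause/pin/unpause strategy may replace this job-based step, and the names and flags below are assumptions:

```yaml
# Sketch of the deploy job's ordering guarantee.
deploy:
  needs: build          # any build failure stops the pipeline here (Req 8.8)
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Run migrations with the rails image
      run: |
        kubectl create job migrate-${{ github.sha }} \
          --image=${{ vars.REGISTRY }}/storecove-app:rails-${{ github.sha }} \
          -- bundle exec rails db:migrate
        kubectl wait --for=condition=complete --timeout=600s \
          job/migrate-${{ github.sha }}
    - name: Apply manifests   # reached only if migration succeeded (Req 8.9)
      run: kubectl apply -f k8s/
```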
    • 8.3 Document deprecation of old build process
      • Add comments noting storecove-app-docker is deprecated
      • Ensure workflow uses datajust/Dockerfile only
      • Requirements: 8.11, 9.1, 9.2, 9.3, 9.4, 9.5
  • 9. Final checkpoint - Integration verification

    • Ensure all Docker targets build successfully
    • Ensure all tests pass
    • Verify all components can be deployed independently
    • Confirm health checks respond correctly
    • Ask the user if questions arise.
  • 10. Documentation and cleanup

    • 10.1 Update deployment documentation
      • Add README noting storecove-app-docker is deprecated for production
      • Document new deployment workflow
      • Document CronJob migration from whenever gem
      • Requirements: 9.1, 9.2, 9.3
    • 10.2 Archive old build scripts
      • Mark storecove-app-docker/production/build-deploy as deprecated
      • Add deprecation notice to old Dockerfile
      • Requirements: 9.2, 9.3, 9.4, 9.5

Notes

  • Tasks marked with * are optional and can be skipped for faster MVP
  • Each task references specific requirements for traceability
  • Checkpoints ensure incremental validation
  • Property tests validate universal correctness properties
  • Docker multi-stage builds allow building separate images from one Dockerfile
  • Each target has its own CMD and EXPOSE, no entrypoint script needed
  • Health check ports: 3000 (server), 3001 (worker), 3002-3004 (Kafka consumers)
  • Liveness probes check process health only; readiness probes check DB connectivity (for Rails server)
  • PROCESS_TARGET is set as ENV in each Dockerfile target, overridden by K8s deployment for workers
  • DELAYED_JOB_POOLS is set empty in Dockerfile, configured per-deployment in K8s manifests
  • DELAYED_JOB_TIMEOUT set to 280 seconds (slightly less than terminationGracePeriodSeconds)
  • The storecove-app-docker repository is deprecated for production builds
  • All production images are built from datajust/Dockerfile via deploy.yml workflow
  • 13 application secrets + 1 MySQL CA certificate required for full deployment
  • 16 CronJobs replace the whenever gem for scheduled tasks
  • Two separate Ingress resources for OVH production subdomains: app.fr.storecove.com (2M), api.fr.storecove.com (100M)
  • Puma runs in single-process mode (workers disabled) - scaling via K8s replicas
  • Racecar must log to STDOUT (update config/initializers/racecar.rb)