@ryan-williams
Last active February 25, 2026 13:02
Quantum-Accelerators/electrai#64 Add GPU CI: e2e tests and benchmarks

Summary

  • Add deterministic e2e training test (tests/e2e_train.py) with platform-specific expected values
  • Add GPU e2e workflow (gpu-e2e.yml) that uses ec2-gha on an EC2 g6.xlarge (NVIDIA L4) and runs on PRs targeting main
  • Add GPU benchmark workflow (gpu-benchmark.yml) with configurable model size, weekly schedule, and manual dispatch
  • Add WandB logging to benchmark: logs training metrics, model config, dataset version, and instance type
  • Add scripts/s3_sync.py: reusable S3 data sync with size filtering and deterministic dataset hashing
  • Add gen-expected.yml for regenerating expected values on GHA runners (macOS, Ubuntu)
  • Add e2e_training_demo.ipynb notebook with training visualization
  • Add .github/workflows/README.md documenting all workflows
  • Rename tests/electrai/ → tests/test_electrai/ to fix import shadowing
  • Add --gradient-checkpoint flag to e2e_train.py for large models on limited VRAM
  • Link the commit SHA in the benchmark summary for traceability
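
The size filtering and deterministic dataset hashing mentioned for scripts/s3_sync.py could look roughly like the sketch below. This is not the script's actual code: `S3Object`, `filter_by_size`, `dataset_hash`, the choice of SHA-256, and the 12-character truncation are all assumptions made for illustration. The idea is that hashing the selected objects in sorted key order gives the same dataset version regardless of the order in which S3 lists them.

```python
import hashlib
from typing import Iterable, List, NamedTuple, Optional


class S3Object(NamedTuple):
    """Hypothetical listing entry: object key, size in bytes, and ETag."""
    key: str
    size: int
    etag: str


def filter_by_size(objects: Iterable[S3Object],
                   max_bytes: Optional[int] = None) -> List[S3Object]:
    """Keep objects at or under max_bytes (no limit when max_bytes is None)."""
    return [o for o in objects if max_bytes is None or o.size <= max_bytes]


def dataset_hash(objects: Iterable[S3Object]) -> str:
    """Hash the selected objects in sorted key order.

    Sorting first makes the digest independent of listing order, so the
    same set of samples always yields the same dataset version string.
    """
    digest = hashlib.sha256()
    for obj in sorted(objects, key=lambda o: o.key):
        digest.update(f"{obj.key}\t{obj.size}\t{obj.etag}\n".encode("utf-8"))
    # Truncation length is an arbitrary choice for a short, loggable ID.
    return digest.hexdigest()[:12]
```

A hash built this way changes whenever a sample is added, removed, or re-uploaded with different content, which is what makes it usable as a dataset version in benchmark logs.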

Passing Runs

Required Setup

Secrets:

  • GH_SA_TOKEN — GitHub PAT for runner registration
  • WANDB_API_KEY — WandB API key (optional, for benchmark logging; currently Ryan's personal key)
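
Because WANDB_API_KEY is optional, the benchmark needs some way to skip logging when the secret is absent. One possible approach, sketched here under assumptions, is to pick the WandB run mode from the environment. The helper name `wandb_init_kwargs` is hypothetical, and the PR may handle this differently; the `project`, `config`, and `mode` arguments are standard `wandb.init` parameters.

```python
from typing import Any, Dict, Mapping


def wandb_init_kwargs(env: Mapping[str, str],
                      project: str,
                      config: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for wandb.init.

    When WANDB_API_KEY is missing or empty, mode is set to "disabled",
    which turns WandB calls into no-ops instead of failing the run.
    """
    mode = "online" if env.get("WANDB_API_KEY") else "disabled"
    return {"project": project, "config": config, "mode": mode}
```

A caller would then run `wandb.init(**wandb_init_kwargs(os.environ, project, config))`, so the same benchmark code works on forks or local machines that have no key configured.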

IAM/OIDC:

  • Trust policy configured in Open-Athena/ops for ec2-gha OIDC authentication

Screenshots

EC2 instances
EC2 tags
GitHub runners
WandB dashboard

Test plan

  • GPU e2e test passes on EC2 g6.xlarge
  • CPU e2e test passes on EC2
  • GPU benchmark completes with production-size model (32ch/16 blocks, 128^3 grids)
  • gen-expected.yml generates correct values on macOS and Ubuntu
  • Expected values verified across all 3 platforms (darwin-arm64, linux, linux-gpu)
  • WandB run logged with correct metadata (project, dataset version, instance type)
  • scripts/s3_sync.py downloads correct samples and generates deterministic dataset hash
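
The platform-specific expected values checked above imply a lookup keyed by platform (darwin-arm64, linux, linux-gpu) followed by a tolerance comparison. The sketch below shows one way that could work; it is not taken from tests/e2e_train.py. The function names, the way the key is derived, and the default tolerance are assumptions.

```python
import math
import platform
from typing import Sequence


def platform_key(system: str, machine: str, has_gpu: bool) -> str:
    """Map a platform description to one of the expected-value keys."""
    if system == "Darwin":
        return f"darwin-{machine}"  # e.g. darwin-arm64 on Apple Silicon
    if system == "Linux":
        return "linux-gpu" if has_gpu else "linux"
    raise ValueError(f"no expected values for platform {system!r}")


def current_platform_key(has_gpu: bool) -> str:
    """Derive the key for the machine running the test."""
    return platform_key(platform.system(), platform.machine(), has_gpu)


def losses_match(actual: Sequence[float],
                 expected: Sequence[float],
                 rel_tol: float = 1e-6) -> bool:
    """Compare per-step losses within a relative tolerance (assumed value)."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=rel_tol) for a, e in zip(actual, expected)
    )
```

Keeping separate expected values per platform, rather than loosening the tolerance until all platforms pass, lets each platform be checked tightly even though floating-point results differ between CPU architectures and GPU kernels.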