ryan-williams/electrai#64.md

## electrai#64.md

      
Quantum-Accelerators/electrai#64 Add GPU CI: e2e tests and benchmarks

Summary


- Add deterministic e2e training test (tests/e2e_train.py) with platform-specific expected values
- Add GPU e2e workflow (gpu-e2e.yml) using ec2-gha on EC2 g6.xlarge (NVIDIA L4); it runs on PRs targeting main
- Add GPU benchmark workflow (gpu-benchmark.yml) with configurable model size, weekly schedule, and manual dispatch
- Add WandB logging to benchmark: logs training metrics, model config, dataset version, and instance type
- Add scripts/s3_sync.py: reusable S3 data sync with size filtering and deterministic dataset hashing
- Add gen-expected.yml for regenerating expected values on GHA runners (macOS, Ubuntu)
- Add e2e_training_demo.ipynb notebook with training visualization
- Add .github/workflows/README.md documenting all workflows
- Rename tests/electrai/ → tests/test_electrai/ to fix import shadowing
- Add --gradient-checkpoint flag to e2e_train.py for large models on limited VRAM
- Include a linked commit SHA in the benchmark summary for traceability
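
To illustrate what "platform-specific expected values" can look like in practice, here is a minimal sketch. It is not taken from tests/e2e_train.py: the function names, the tolerance, and the numbers in the table are all hypothetical. Only the three platform labels (darwin-arm64, linux, linux-gpu) come from this PR.

```python
import math
import platform

# Placeholder numbers for illustration only; NOT the values stored in the repo.
EXPECTED_FINAL_LOSS = {
    "darwin-arm64": 0.125,
    "linux": 0.125,
    "linux-gpu": 0.126,
}


def platform_key(use_gpu=False, system=None, machine=None):
    """Map a machine to one of the three platform labels used in this PR.

    `system` and `machine` default to the current host; they are parameters
    only so the mapping can be exercised on any machine.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    if system == "darwin" and machine == "arm64":
        return "darwin-arm64"
    if system == "linux":
        return "linux-gpu" if use_gpu else "linux"
    raise RuntimeError(f"no expected values recorded for {system}-{machine}")


def check_final_loss(actual, key, expected=EXPECTED_FINAL_LOSS, rel_tol=1e-6):
    """Raise AssertionError if `actual` differs from the recorded value."""
    want = expected[key]
    if not math.isclose(actual, want, rel_tol=rel_tol):
        raise AssertionError(f"{key}: got {actual!r}, expected {want!r}")
```

A test following this pattern would finish a seeded training run and then call something like `check_final_loss(final_loss, platform_key(use_gpu=True))`. Keeping one entry per platform allows small numerical differences between backends without loosening the tolerance for all of them.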

Passing Runs


- GPU E2E Test #27: GPU + CPU baseline, both pass (Feb 24)
- GPU Benchmark #27: 50 S3 samples, WandB logging (Feb 24)

Required Setup

Secrets:

- GH_SA_TOKEN: GitHub PAT for runner registration
- WANDB_API_KEY: WandB API key (optional, for benchmark logging; currently Ryan's personal key)

IAM/OIDC:

- Trust policy configured in Open-Athena/ops for ec2-gha OIDC authentication

Screenshots


- EC2 instances
- EC2 tags
- GitHub runners
- WandB dashboard
    
  
Test plan


- GPU e2e test passes on EC2 g6.xlarge
- CPU e2e test passes on EC2
- GPU benchmark completes with production-size model (32ch/16 blocks, 128^3 grids)
- gen-expected.yml generates correct values on macOS and Ubuntu
- Expected values verified across all 3 platforms (darwin-arm64, linux, linux-gpu)
- WandB run logged with correct metadata (project, dataset version, instance type)
- scripts/s3_sync.py downloads correct samples and generates deterministic dataset hash
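
For the last item, one way a "deterministic dataset hash" with size filtering could work is sketched below. This is an assumption for illustration, not the actual logic of scripts/s3_sync.py: the function names, the (key, size) input shape, and the choice of a truncated SHA-256 are all hypothetical. The listing itself would come from S3 in the real script; here it is a plain list so the sketch runs offline.

```python
import hashlib


def filter_by_size(objects, max_bytes):
    """Keep only (key, size) pairs whose size is at most `max_bytes`."""
    return [(key, size) for key, size in objects if size <= max_bytes]


def dataset_hash(objects, length=12):
    """Return a short hex digest identifying a set of (key, size) pairs.

    Entries are sorted before hashing, so the digest does not depend on
    the order in which the listing was returned.
    """
    digest = hashlib.sha256()
    for key, size in sorted(objects):
        # One line per object; tab and newline keep fields unambiguous
        # as long as keys contain neither character.
        digest.update(f"{key}\t{size}\n".encode("utf-8"))
    return digest.hexdigest()[:length]
```

Hashing the filtered listing rather than file contents keeps the computation cheap, at the cost of not detecting a file whose bytes change while its key and size stay the same. A version label derived this way is the kind of value that could be logged as the dataset version.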