Galaxy supports two approaches for dispatching jobs to Google Cloud Batch: the Direct GCP Batch Runner and the Pulsar GCP Batch Runner. They differ in architecture and in the trade-offs they impose.

Direct GCP Batch Runner:

- Runner class: `galaxy.jobs.runners.gcp_batch:GoogleCloudBatchJobRunner`
- File access: NFS mount from Kubernetes cluster
- Communication: Direct polling of GCP Batch API
- Container model: Single container running the tool

Pulsar GCP Batch Runner:

- Runner class: `galaxy.jobs.runners.pulsar:PulsarGcpBatchJobRunner`
- File access: HTTP transfer to local SSD via Pulsar sidecar
- Communication: RabbitMQ message queue + Galaxy API
- Container model: Two containers (Pulsar sidecar + tool container)
| Aspect | Direct GCP Batch | Pulsar GCP Batch |
|---|---|---|
| Startup overhead | Lower (just mount NFS) | Higher (file staging required) |
| I/O performance | Network-bound (NFS) | Local SSD (375 GB+) |
| Large input files | Better (no transfer) | Slower (must download) |
| I/O-intensive tools | Slower (network latency) | Faster (local disk) |
| Network configuration | Supported (network/subnet params) | Not yet supported |
| Galaxy accessibility | Internal IP (same VPC) | Requires public IP or VPC peering |
| Complexity | Simpler | More complex |
| Firewall requirements | NFS ports (2049, 111) | RabbitMQ (5672) + HTTP (80/443) |
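
The firewall row above can be turned into a quick reachability probe before any jobs are submitted. The following is a minimal sketch, not part of Galaxy or Pulsar: the `REQUIRED_PORTS` table and the helper names are hypothetical, and only the port numbers come from the table. It tests TCP only, so it cannot confirm NFS or rpcbind over UDP.

```python
import socket

# Port numbers copied from the "Firewall requirements" row above.
# The dictionary layout and key names are illustrative assumptions.
REQUIRED_PORTS = {
    "direct": {"nfs": 2049, "rpcbind": 111},
    "pulsar": {"rabbitmq": 5672, "http": 80, "https": 443},
}


def ports_for(runner: str) -> list[int]:
    """Return the sorted list of TCP ports a runner needs opened."""
    return sorted(REQUIRED_PORTS[runner].values())


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Example: probe the NFS server address used in the configuration sample.
    for port in ports_for("direct"):
        print(port, port_is_open("10.0.0.5", port))
```

Run from a VM in the same network as the Batch workers, since reachability from an administrator workstation says little about what the job VMs can see.
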

Use the Direct GCP Batch Runner when:

- Input files are large (no transfer step is needed)
- Tools have moderate I/O requirements
- You want simpler infrastructure
- Galaxy and Batch VMs are in the same VPC
- You need fine-grained network control

Use the Pulsar GCP Batch Runner when:

- Tools are I/O-intensive (they benefit from local SSD)
- Input files are small to medium sized
- You need to run jobs in a different network/project
- Tool execution time dominates over file transfer time
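
The selection criteria above can be read as a simple scoring rule. The sketch below is one possible encoding, offered only as an illustration: the `Workload` fields and `suggest_runner` are hypothetical names, the weights are all equal by assumption, and ties fall to the direct runner because it is the simpler setup.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical job profile; each field mirrors one criterion above."""
    large_inputs: bool
    io_intensive: bool
    same_vpc_as_galaxy: bool
    needs_network_control: bool


def suggest_runner(w: Workload) -> str:
    """Return "direct" or "pulsar" by counting which criteria the workload meets."""
    direct_score = sum([
        w.large_inputs,            # no staging transfer over NFS
        not w.io_intensive,        # moderate I/O tolerates network latency
        w.same_vpc_as_galaxy,      # NFS server reachable on internal IP
        w.needs_network_control,   # network/subnet params are supported
    ])
    pulsar_score = sum([
        w.io_intensive,            # local SSD helps heavy I/O
        not w.large_inputs,        # small inputs are cheap to stage
        not w.same_vpc_as_galaxy,  # jobs run in another network/project
    ])
    # Ties go to the direct runner, which has fewer moving parts.
    return "direct" if direct_score >= pulsar_score else "pulsar"


if __name__ == "__main__":
    print(suggest_runner(Workload(True, False, True, True)))
    print(suggest_runner(Workload(False, True, False, False)))
```

Treat the result as a starting point; a single hard constraint (for example, no NFS server reachable from the Batch VMs) can outweigh the score.
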

Direct GCP Batch Runner limitations:

- Requires an NFS server accessible from the Batch VMs
- Batch VMs must be in the same VPC as the Galaxy cluster; otherwise VPC peering or Cloud Filestore is required for NFS access

Pulsar GCP Batch Runner limitations:

- CRITICAL: Runnable deadlock. The Pulsar sidecar container is missing the `background: true` flag, causing the tool container to never start (requires a fix in pulsar-galaxy-lib `gcp_job_template()`)
- Missing `network` and `subnet` parameters (requires a code fix in pulsar-galaxy-lib)
- Missing automatic `machine_type` computation from `cores`/`mem` (uses a hardcoded default)
- Requires Galaxy to be accessible via public IP (until network params are added)
- `kill()` method not implemented (jobs cannot be cancelled cleanly)
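
To illustrate what an automatic `machine_type` computation from `cores`/`mem` could look like, here is a minimal sketch. It is not the pulsar-galaxy-lib implementation and makes several assumptions: only the n2-standard family is considered (4 GB of memory per vCPU), memory is given in GB, and `pick_machine_type` is a hypothetical helper name.

```python
# vCPU counts offered in the n2-standard family; each shape has 4 GB per vCPU.
N2_STANDARD_VCPUS = (2, 4, 8, 16, 32, 48, 64, 80, 96, 128)
GB_PER_VCPU = 4


def pick_machine_type(cores: int, mem_gb: float) -> str:
    """Return the smallest n2-standard shape covering both cores and memory.

    Assumes mem_gb is expressed in GB; a real integration would need to
    convert from whatever unit the job's resource request uses.
    """
    if cores < 1 or mem_gb <= 0:
        raise ValueError("cores must be >= 1 and mem_gb must be positive")
    for vcpus in N2_STANDARD_VCPUS:
        if vcpus >= cores and vcpus * GB_PER_VCPU >= mem_gb:
            return f"n2-standard-{vcpus}"
    raise ValueError(f"no n2-standard shape fits {cores} cores / {mem_gb} GB")


if __name__ == "__main__":
    print(pick_machine_type(8, 32))   # fits exactly on 8 vCPUs / 32 GB
    print(pick_machine_type(4, 40))   # memory forces a larger shape
```

Memory-heavy jobs would be better served by a highmem family; the single-family table keeps the sketch short.
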

Direct GCP Batch Runner configuration:

    runners:
      gcp_batch:
        load: galaxy.jobs.runners.gcp_batch:GoogleCloudBatchJobRunner
        project_id: my-project
        region: us-east4
        network: default
        subnet: default
        nfs_server: 10.0.0.5
        nfs_path: /export/galaxy

Pulsar GCP Batch Runner configuration:

    runners:
      pulsar_gcp:
        load: galaxy.jobs.runners.pulsar:PulsarGcpBatchJobRunner
        amqp_url: pyamqp://user:pass@rabbitmq-ip:5672//
        galaxy_url: http://galaxy-public-ip
    execution:
      environments:
        pulsar_gcp:
          runner: pulsar_gcp
          project_id: my-project
          region: us-east4
          machine_type: n2-standard-8
          disk_size: 375
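
Because the Pulsar runner currently needs Galaxy to be reachable on a public address, a small pre-flight check on the two URLs can catch a common misconfiguration. The sketch below uses only the Python standard library; `preflight` is a hypothetical helper, and the warning wording is illustrative. It inspects the URL strings only and does not contact either service.

```python
import ipaddress
from urllib.parse import urlsplit


def preflight(amqp_url: str, galaxy_url: str) -> list[str]:
    """Return human-readable warnings for the Pulsar runner URL settings."""
    warnings = []

    # RabbitMQ is expected on 5672 per the firewall requirements.
    if urlsplit(amqp_url).port is None:
        warnings.append("amqp_url has no explicit port; check that 5672 is intended")

    host = urlsplit(galaxy_url).hostname or ""
    try:
        if ipaddress.ip_address(host).is_private:
            warnings.append(
                f"galaxy_url host {host} is a private address; "
                "Batch VMs outside the VPC will not reach it"
            )
    except ValueError:
        # Host is a DNS name rather than an IP literal; nothing to check here.
        pass

    return warnings


if __name__ == "__main__":
    for message in preflight(
        "pyamqp://user:pass@rabbitmq-ip:5672//", "http://10.0.0.5"
    ):
        print("WARNING:", message)
```
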