@ksuderman
Created January 15, 2026 02:14
Compare and contrast the two options available to dispatch Galaxy jobs to Google Batch.

GCP Batch Job Runners Comparison

Galaxy supports two approaches for dispatching jobs to Google Cloud Batch: the Direct GCP Batch Runner and the Pulsar GCP Batch Runner. Each has distinct architectures and trade-offs.

Architecture Overview

Direct GCP Batch Runner (gcp_batch)

  • Runner class: galaxy.jobs.runners.gcp_batch:GoogleCloudBatchJobRunner
  • File access: NFS mount from Kubernetes cluster
  • Communication: Direct polling of GCP Batch API
  • Container model: Single container running the tool
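
As a sketch, this file-access model maps onto GCP Batch's volumes spec roughly as follows; the server address, export path, and tool image are illustrative assumptions, not the runner's exact output:

```yaml
# Hypothetical sketch of a Batch task spec with an NFS volume.
# The tool container reads Galaxy's datasets directly over the
# mount; nothing is staged to the VM.
taskGroups:
  - taskSpec:
      volumes:
        - nfs:
            server: 10.0.0.5            # NFS server reachable from Batch VMs
            remotePath: /export/galaxy
          mountPath: /galaxy/data
      runnables:
        - container:
            imageUri: my-tool-image     # single container running the tool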

Pulsar GCP Batch Runner (pulsar_gcp)

  • Runner class: galaxy.jobs.runners.pulsar:PulsarGcpBatchJobRunner
  • File access: HTTP transfer to local SSD via Pulsar sidecar
  • Communication: RabbitMQ message queue + Galaxy API
  • Container model: Two containers (Pulsar sidecar + tool container)
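
The staging model above can be sketched as a GCP Batch job spec; the image names, paths, and disk device name below are illustrative assumptions, not the exact spec produced by pulsar-galaxy-lib:

```yaml
# Hypothetical sketch of the two-container Batch job. The Pulsar
# sidecar stages files over HTTP onto a local SSD that both
# containers mount as shared scratch space.
allocationPolicy:
  instances:
    - policy:
        disks:
          - deviceName: local-ssd          # assumed device name
            newDisk:
              type: local-ssd
              sizeGb: 375
taskGroups:
  - taskSpec:
      volumes:
        - deviceName: local-ssd
          mountPath: /mnt/disks/pulsar     # shared scratch space
      runnables:
        - container:
            imageUri: galaxy/pulsar:latest # sidecar image (assumed name)
          background: true                 # sidecar keeps running while
                                           # the tool runnable executes
        - container:
            imageUri: my-tool-image        # tool container; reads inputs
                                           # from the local SSD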

Comparison

| Aspect | Direct GCP Batch | Pulsar GCP Batch |
| --- | --- | --- |
| Startup overhead | Lower (only an NFS mount) | Higher (file staging required) |
| I/O performance | Network-bound (NFS) | Local SSD (375 GB+) |
| Large input files | Better (no transfer) | Slower (inputs must be downloaded) |
| I/O-intensive tools | Slower (network latency) | Faster (local disk) |
| Network configuration | Supported (network/subnet params) | Not yet supported |
| Galaxy accessibility | Internal IP (same VPC) | Requires public IP or VPC peering |
| Complexity | Simpler | More complex |
| Firewall requirements | NFS ports (2049, 111) | RabbitMQ (5672) + HTTP (80/443) |

When to Use Each Approach

Use Direct GCP Batch (gcp_batch) when:

  • Input files are large (avoids staging transfers entirely)
  • Tools have moderate I/O requirements
  • You want simpler infrastructure
  • Galaxy and Batch VMs are in the same VPC
  • You need fine-grained network control

Use Pulsar GCP Batch (pulsar_gcp) when:

  • Tools are I/O-intensive (benefit from local SSD)
  • Input files are small to medium sized
  • You need to run jobs in a different network/project
  • Tool execution time dominates over file transfer time

Current Limitations

Direct GCP Batch

  • Requires NFS server accessible from Batch VMs
  • Batch VMs must be in the same VPC as the Galaxy cluster (otherwise VPC peering, or Cloud Filestore for NFS access, is required)

Pulsar GCP Batch

  • CRITICAL: Runnable deadlock - the Pulsar sidecar container is missing the background: true flag, so the tool container never starts (requires a fix in pulsar-galaxy-lib's gcp_job_template())
  • Missing network and subnet parameters (requires a code fix in pulsar-galaxy-lib)
  • No automatic machine_type computation from cores/mem (a hardcoded default is used instead)
  • Requires Galaxy to be reachable via a public IP (until network params are added)
  • kill() is not implemented, so jobs cannot be cancelled cleanly
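
The deadlock noted above comes down to a single missing field: in the GCP Batch API, a runnable marked background: true continues running while subsequent runnables execute. A minimal sketch of the intended spec (the image names are assumptions):

```yaml
runnables:
  - container:
      imageUri: galaxy/pulsar:latest   # Pulsar sidecar (assumed image name)
    background: true                   # currently missing: without it, Batch
                                       # waits for the sidecar to exit, so the
                                       # tool runnable below never starts
  - container:
      imageUri: my-tool-image          # tool container runs once the sidecar
                                       # is in the background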

Configuration Examples

Direct GCP Batch

runners:
  gcp_batch:
    load: galaxy.jobs.runners.gcp_batch:GoogleCloudBatchJobRunner
    project_id: my-project
    region: us-east4
    network: default
    subnet: default
    nfs_server: 10.0.0.5
    nfs_path: /export/galaxy

Pulsar GCP Batch

runners:
  pulsar_gcp:
    load: galaxy.jobs.runners.pulsar:PulsarGcpBatchJobRunner
    amqp_url: pyamqp://user:pass@rabbitmq-ip:5672//
    galaxy_url: http://galaxy-public-ip

execution:
  environments:
    pulsar_gcp:
      runner: pulsar_gcp
      project_id: my-project
      region: us-east4
      machine_type: n2-standard-8
      disk_size: 375