@jackfrancis
Created March 4, 2026 19:33
KubeRay Multicluster analysis

KubeRay Federation Proposal — Analysis & Comparative Landscape

Proposal Summary

Issue #4561 by @yuchen-ecnu proposes adding federation capability to KubeRay so that a single logical RayCluster can span multiple Kubernetes clusters. The core motivation:

  • Fragmented GPUs: Organizations procure GPUs across multiple cloud vendors/AZs. Today these are isolated into separate K8s clusters, preventing a unified Ray cluster.
  • Operational pain: Users must split datasets, deploy multiple small RayClusters, and manually manage them — causing long-tail performance issues and complexity.
  • Virtual Kubelet limitations: The common workaround — aggregating remote capacity via Virtual Kubelet — creates control-plane scalability bottlenecks at scale (e.g., bursting from 10K to 400K+ cores within an hour).

Desired end state: Submit one RayJob to one federated RayCluster, and Ray Data/Serve automatically schedules tasks across workers in any AZ/cloud, with preemption resilience and cross-cluster load balancing.

Key Discussion Thread Insights

| Participant | Position |
| --- | --- |
| @andrewsykim (KubeRay maintainer) | Notes MultiKueue as an alternative but acknowledges it works at the CRD level, not the Ray task/actor level. Sees Ray Data batch inference as the ideal use case, since it is GPU-local and horizontally scalable. |
| @siyuanfoundation (contributor) | Raises the cross-AZ communication overhead concern: the Ray head must be topology-aware to avoid performance degradation. Suggests scoping to non-reshuffling Ray Data and Ray Serve only. Proposes an alternative: a Ray job delegator/proxy that dispatches jobs to separate RayClusters and aggregates status. Also notes that MultiKueue supports pod-level scheduling, so worker pods could be delegated while the head stays in one cluster. |
| @Future-Outlier (KubeRay member) | Asks how the proposal overlaps with SkyPilot. |
| @yuchen-ecnu (author) | Clarifies that the goal is federating resources into a single Ray cluster, not cross-K8s cluster management (which SkyPilot already does). Cites Tencent's prior art from Ray Forward 2025. |

Comparative Analysis: Multi-Cluster AI/ML Solutions

| Solution | Abstraction Level | Single Logical Cluster? | Ray-Aware? | Maturity | Best For |
| --- | --- | --- | --- | --- | --- |
| KubeRay Federation (proposed) | Ray task/actor | ✅ Yes — unified RayCluster | ✅ Native | Proposal stage | Ray Data batch inference, Ray Serve |
| NVIDIA Dynamo | Inference framework | ✅ Yes — disaggregated prefill/decode | ❌ No (own framework) | Production (2025+) | LLM inference optimization |
| SkyPilot | Cluster provisioning | ❌ Separate clusters per cloud | ⚠️ Partial (launches Ray clusters) | Production | Multi-cloud Ray cluster provisioning |
| MultiKueue (Kueue) | K8s job/pod dispatch | ❌ Dispatches whole jobs to clusters | ❌ No | Beta (v0.9+) | Job-level multi-cluster GPU scheduling |
| Karmada | K8s resource federation | ❌ Propagates CRDs/workloads | ❌ No | CNCF Incubating | General K8s multi-cluster federation |
| Volcano Global | Batch scheduling | ❌ Cross-cluster queue & dispatch | ❌ No | Early production | Gang-scheduled training, batch AI |
| Admiralty | Pod scheduling | ❌ Proxy pods in target clusters | ❌ No | Niche/stable | Multi-cluster pod placement |
| Liqo | Network/resource mesh | ⚠️ Transparent pod offloading | ❌ No | CNCF Sandbox | Hybrid/edge cloud bursting |

Deep-Dive Comparison

1. NVIDIA Dynamo

Dynamo solves a related but different problem: optimizing multi-node LLM inference (disaggregated prefill/decode, KV cache routing, MoE rebalancing). It operates within a GPU cluster, not across K8s clusters. However, its Grove API for Kubernetes-native orchestration shows the industry trajectory toward framework-aware, topology-aware scheduling — which is exactly what KubeRay Federation would need to handle cross-AZ latency.

Key distinction: Dynamo is inference-engine-level optimization; KubeRay Federation is cluster-topology-level resource pooling. They're complementary, not competing.

2. SkyPilot

SkyPilot provisions and manages separate Ray clusters across clouds (Shopify's multi-cloud GPU fleet is a notable example). It doesn't create a single unified Ray cluster — each provisioned cluster is independent. The proposal explicitly distinguishes itself from SkyPilot: KubeRay Federation wants one RayCluster with workers distributed across K8s clusters, enabling intra-cluster load balancing via Ray's scheduler.

Key distinction: SkyPilot = multi-cloud cluster provisioning; KubeRay Federation = single-cluster resource unification.

3. MultiKueue

The most architecturally adjacent K8s-native solution. MultiKueue dispatches entire workloads to whichever worker cluster has capacity. As @siyuanfoundation noted, it could be adapted: deploy the RayCluster head in one cluster, and use MultiKueue to dispatch worker pods to remote clusters (since MultiKueue supports pod-level scheduling). This is a pragmatic middle-ground that avoids deep Ray-level changes.

Key distinction: MultiKueue dispatches at the K8s object boundary; KubeRay Federation wants Ray's scheduler to balance tasks across all workers regardless of their physical cluster.
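The head-stays-local, workers-dispatched-by-capacity pattern can be sketched in plain Python. This is a toy simulation with hypothetical cluster names and GPU counts, not MultiKueue's actual admission logic; the point is only the shape of the decision: the head never moves, and each worker group lands wherever capacity exists:

```python
from dataclasses import dataclass, field

@dataclass
class WorkerCluster:
    """A remote K8s cluster that worker pods could be dispatched to."""
    name: str
    free_gpus: int
    placed_groups: list = field(default_factory=list)

def place_worker_groups(clusters, pod_groups):
    """Greedy capacity-based placement: each worker-pod group goes to the
    candidate cluster with the most free GPUs; groups that fit nowhere
    stay pending. The Ray head is assumed to remain in the management
    cluster and is never a placement candidate."""
    placements = {}
    for group, gpus_needed in pod_groups:
        candidates = [c for c in clusters if c.free_gpus >= gpus_needed]
        if not candidates:
            placements[group] = None  # pending: no cluster has capacity
            continue
        target = max(candidates, key=lambda c: c.free_gpus)
        target.free_gpus -= gpus_needed
        target.placed_groups.append(group)
        placements[group] = target.name
    return placements

clusters = [WorkerCluster("aws-us-east", 8), WorkerCluster("gcp-us-central", 4)]
result = place_worker_groups(
    clusters, [("workers-a", 6), ("workers-b", 4), ("workers-c", 8)]
)
print(result)  # workers-a -> aws-us-east, workers-b -> gcp-us-central, workers-c pending
```

A real adaptation would express each group as a Kueue Workload and let MultiKueue's admission checks make this decision; the sketch only illustrates why the head's placement and the workers' placement are separable concerns.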

4. Karmada + Volcano Global

Karmada federates K8s resources across clusters via PropagationPolicies. Volcano Global adds AI-specific scheduling (gang scheduling, queue fairness) atop Karmada. Together they could propagate RayCluster worker groups to different clusters. However, like MultiKueue, this operates at the K8s level — Ray wouldn't "know" about the topology, so cross-AZ data shuffling could silently degrade performance.

Key distinction: General-purpose K8s federation vs. Ray-topology-aware federation.

Critical Technical Challenges for the Proposal

  1. Cross-AZ network latency: As @siyuanfoundation flagged, Ray's head node must be topology-aware. Without this, object transfers and task scheduling will blindly route across AZs, potentially destroying performance for anything involving data movement. The proposal should explicitly scope to workloads with minimal cross-node communication (batch inference with local GPU execution, stateless serving).

  2. Ray GCS and head node as a single point of failure: The head node — and the Global Control Service (GCS) it hosts — lives in one cluster. A cross-cluster network partition could orphan all remote workers simultaneously.

  3. Networking: Worker pods in remote clusters must reach the head node's GCS and Ray object store. This requires cross-cluster networking (service mesh, VPN, or public endpoints), which adds latency and security surface area.

  4. Autoscaler integration: KubeRay's autoscaler currently talks to a single K8s API server. Federation requires it to create/delete worker pods across multiple clusters, meaning multi-cluster API credentials and reconciliation.
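Challenge 1 is at its core a placement-policy question. The following is a minimal sketch of the kind of zone-aware decision the head would need, in pure Python with hypothetical worker names and zones; Ray itself would more likely express this through custom resources or scheduling strategies rather than a standalone function:

```python
def pick_worker(workers, task_zone, needs_data_locality):
    """Toy locality-aware placement, assuming each worker advertises its
    zone (e.g. via a pod label surfaced as a Ray custom resource).
    Data-heavy tasks insist on a same-zone worker and otherwise queue;
    stateless tasks may fall back to a remote zone."""
    same_zone = [w for w in workers if w["zone"] == task_zone and w["free_gpus"] > 0]
    if same_zone:
        return max(same_zone, key=lambda w: w["free_gpus"])["name"]
    if needs_data_locality:
        return None  # queue rather than shuffle data across AZs
    remote = [w for w in workers if w["free_gpus"] > 0]
    return max(remote, key=lambda w: w["free_gpus"])["name"] if remote else None

workers = [
    {"name": "w1", "zone": "us-east-1a", "free_gpus": 0},
    {"name": "w2", "zone": "us-west-2b", "free_gpus": 4},
]
print(pick_worker(workers, "us-east-1a", needs_data_locality=True))   # None: wait for local capacity
print(pick_worker(workers, "us-east-1a", needs_data_locality=False))  # w2: stateless task may cross AZs
```

The asymmetry in the two calls is the scoping argument in miniature: batch inference and stateless serving tolerate the fallback branch, while shuffle-heavy Ray Data workloads do not.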

Recommendations / Assessment

The proposal addresses a real and growing pain point — GPU resource fragmentation across clusters is one of the top operational challenges for large-scale AI teams. However:

  • Start narrow: The community feedback correctly suggests scoping to Ray Data (no shuffle) and Ray Serve first. These workloads are embarrassingly parallel / stateless and tolerate cross-AZ latency.
  • Consider the proxy/delegator alternative: @siyuanfoundation's suggestion of a job delegator that dispatches to separate RayClusters and aggregates results may deliver 80% of the value with 20% of the complexity — no Ray-level changes needed, just a KubeRay-level orchestration layer.
  • Leverage MultiKueue for worker pod placement: Rather than building full federation from scratch, using MultiKueue to place worker pods in remote clusters (while the head stays in one cluster) could be a pragmatic first step that's composable with the existing ecosystem.
  • NVIDIA Dynamo is not a competitor here but its topology-aware routing patterns (especially the Grove API) are worth studying as a design reference for how to make the Ray head aware of worker locality.
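The delegator/proxy alternative can be sketched as follows. This is a toy with hypothetical cluster names and shard paths; a real implementation would create one RayJob custom resource per cluster and watch each job's status rather than calling a local function:

```python
from concurrent.futures import ThreadPoolExecutor

def submit_shard(cluster, shard):
    """Stand-in for a per-cluster RayJob submission. In practice this
    would apply a RayJob CR against that cluster's API server and poll
    its jobStatus; here it returns a canned success."""
    return {"cluster": cluster, "shard": shard, "status": "SUCCEEDED"}

def delegate_job(clusters, dataset_shards):
    """Job delegator sketch: split the input across independent
    RayClusters, submit one job per cluster in parallel, and aggregate
    the per-cluster statuses into a single logical job result."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda cs: submit_shard(*cs),
                                zip(clusters, dataset_shards)))
    overall = ("SUCCEEDED"
               if all(r["status"] == "SUCCEEDED" for r in results)
               else "FAILED")
    return {"overall": overall, "per_cluster": results}

job = delegate_job(["cluster-a", "cluster-b"],
                   ["s3://data/part-0", "s3://data/part-1"])
print(job["overall"])  # SUCCEEDED
```

Note what the sketch gives up relative to full federation: the split into shards happens once, up front, at the delegator — there is no runtime load balancing between clusters, which is exactly the 80/20 trade the delegator approach accepts.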

The proposal fills a unique gap — none of the existing solutions provide a single logical Ray cluster spanning K8s boundaries. Whether that's built as deep Ray+KubeRay integration or as a lighter-weight orchestration layer above multiple RayClusters is the key architectural decision the community needs to make.
