@joshgord · Created February 14, 2026 21:49
DataLoader Chaining: The loadAndDispatch Bottleneck — Analysis and Migration Plan

Problem Summary

Gusto's GraphQL resolvers make extensive use of a pattern called loadAndDispatch() that defeats DataLoader batching, causing every chained DataLoader call to dispatch with a batch size of 1. This pattern was identified as the primary reason DataLoaderHelper is the #1 allocation source in production (48,910 samples, 11.6% of all allocations).


How the Bottleneck Was Identified

Allocation Profiling (2026-02-13)

An async-profiler allocation capture (-e alloc, 30 seconds, 463K samples) on a production gusto instance revealed that DataLoaderHelper dominated all allocation sources:

| Source | Samples | % Total |
|---|---|---|
| DataLoaderHelper | 48,910 | 11.6% |
| PacsAuthzInstrumentation | 36,108 | 8.6% |
| ContextDataFetcherDecorator | 28,550 | 6.8% |
| HashMap.resize | 25,021 | 5.4% |

Root Cause Analysis

A codebase-wide search found 119 call sites across 42 files using the loadAndDispatch pattern:

```java
// From ImageDataLoader.java — the pattern used everywhere
public static CompletableFuture<List<ImageResult>> loadAndDispatch(
    DataFetchingEnvironment env, ImageDataLoaderKey key) {
  var loader = get(env);
  var future = loader.load(key);
  loader.dispatch();  // <-- immediately dispatches with batch size of 1
  return future;
}
```

Every DataLoader that supports batching (VideoCore, Images, ECL, Collections, Persons, etc.) is being called with batch size 1 when invoked through this pattern — meaning the downstream gRPC services receive N individual requests instead of 1 batched request with N keys.
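The effect is easy to demonstrate with a minimal stand-in for a DataLoader (a hypothetical MiniLoader sketch, not the org.dataloader API): dispatching after every load() produces batches of 1, while queueing all loads before a single dispatch() produces one batch of N.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Minimal stand-in for a DataLoader (hypothetical sketch, not the org.dataloader API)
// that records the size of each dispatched batch.
class MiniLoader<K, V> {
    private final Function<List<K>, List<V>> batchFn;
    private final List<K> pendingKeys = new ArrayList<>();
    private final List<CompletableFuture<V>> pendingFutures = new ArrayList<>();
    final List<Integer> dispatchedBatchSizes = new ArrayList<>();

    MiniLoader(Function<List<K>, List<V>> batchFn) { this.batchFn = batchFn; }

    CompletableFuture<V> load(K key) {
        pendingKeys.add(key);
        CompletableFuture<V> future = new CompletableFuture<>();
        pendingFutures.add(future);
        return future;
    }

    void dispatch() {
        if (pendingKeys.isEmpty()) return;
        dispatchedBatchSizes.add(pendingKeys.size());  // each dispatch = one "gRPC call"
        List<V> results = batchFn.apply(new ArrayList<>(pendingKeys));
        for (int i = 0; i < results.size(); i++) pendingFutures.get(i).complete(results.get(i));
        pendingKeys.clear();
        pendingFutures.clear();
    }
}

public class BatchDemo {
    public static void main(String[] args) {
        Function<List<Integer>, List<String>> fetch = keys -> keys.stream().map(k -> "v" + k).toList();

        // loadAndDispatch pattern: dispatch after every load -> three batches of 1
        MiniLoader<Integer, String> eager = new MiniLoader<>(fetch);
        for (int k = 1; k <= 3; k++) { eager.load(k); eager.dispatch(); }
        System.out.println(eager.dispatchedBatchSizes);   // [1, 1, 1]

        // Framework-style: queue all loads, then dispatch once -> one batch of 3
        MiniLoader<Integer, String> deferred = new MiniLoader<>(fetch);
        for (int k = 1; k <= 3; k++) deferred.load(k);
        deferred.dispatch();
        System.out.println(deferred.dispatchedBatchSizes); // [3]
    }
}
```

Both variants produce the same values for callers; only the number of downstream calls differs.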


Why loadAndDispatch Exists

In graphql-java (pre-v25), DataLoader dispatch happens at the end of each field resolution level. When a DataFetcher chains into another DataLoader via .thenCompose(), the second load() call happens after the framework has already dispatched that level. Without manual dispatch, the request hangs forever:

```java
// ShowDataFetcher.currentEpisode — real example from gusto
return EvidenceLoader.loadCurrentEpisode(env, unifiedEntityId)
    // This runs AFTER framework dispatch — framework won't dispatch again
    .thenCompose(episodeId -> VideoCoreDataLoader.loadAndDispatch(env, episodeId));
    //                                           ^^^^^^^^^^^^^^^^
    //                     manual dispatch forces batch-of-1 to avoid hanging
```

The comment in the codebase explains this directly:

> "Unfortunately, we currently must manually dispatch a dataloader when it is composed after another Future. Otherwise, the thread will hang indefinitely waiting for something to manually dispatch it."


Performance Overhead

Allocation Cost

Each loadAndDispatch call triggers a full DataLoaderHelper.dispatch() cycle including:

  • Creating new CompletableFuture chains per dispatch
  • Invoking the MappedBatchLoader.load() with a single key
  • gRPC call setup, serialization, and response handling per individual request

With 119 call sites and many executing multiple times per GraphQL request (e.g., EvidenceLoader has 26 references), this multiplies to dozens of dispatch cycles per request, most with batch size 1.

Downstream RPS Amplification

DataLoaders that support batching (e.g., ECLFetchEvidenceDataLoader, VideoCoreDataLoader, ImageDataLoader) are designed to batch N keys into a single gRPC call. The loadAndDispatch pattern defeats this — a page with 40 videos generates 40 individual gRPC calls instead of 1 batched call.

Canary Validation (PR #4023)

A dev canary using graphql-java 25 beta with chained DataLoaders (applied only to live event prefetching) showed:

  • Statistically significant p50 latency improvement — even with chaining applied to just one small pocket of the app
  • Clear reduction in downstream gRPC RPS to EvidenceControlLayerService due to better batching
  • Flat or slightly improved CPU — fewer, larger batch calls are more efficient than many individual calls

Proposed Solutions

Option A: graphql-java 25 Chained DataLoaders (Recommended)

graphql-java 25 introduces native support for chained DataLoaders that eliminates the hanging problem entirely.

How it works: The engine automatically detects when a DataLoader load() is called inside a .thenCompose() chain and schedules dispatch appropriately, allowing multiple chained loads to batch together.

```java
// With graphql-java 25 — no manual dispatch needed, batching works
return EvidenceLoader.loadCurrentEpisode(env, unifiedEntityId)
    .thenCompose(episodeId -> VideoCoreDataLoader.load(env, episodeId));
    //                                           ^^^^
    //                     plain load() — framework handles dispatch + batching
```

Enabling:

```java
// Configures chaining on the per-request GraphQLContext. This call chain
// configures the context; it does not return a GraphQL instance to assign.
GraphQL.unusualConfiguration(graphqlContext)
    .dataloaderConfig()
    .enableDataLoaderChaining(true);
```

Status: PR #4023 validated this approach on a dev canary with positive results. Currently applied only to live event prefetching.

Reference: graphql-java Chained DataLoaders docs

Dependency: graphql-java 25 is obtained through DGS framework, which is tied to SBN. The official path is SBN4 (targeting early 2026). Forcing the dependency independently via resolutionStrategy.force is possible but unsupported by the platform team until SBN4 GA.
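For reference, forcing the dependency ahead of SBN4 would look roughly like the following Gradle fragment (a hedged sketch; the beta version string is a placeholder, and this path is unsupported by the platform team until SBN4 GA):

```groovy
configurations.all {
    resolutionStrategy.force("com.graphql-java:graphql-java:<25.x-beta>")
}
```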

Slack thread: #dna-gusto-dev discussion

Option B: DGS Ticker Mode

DGS provides a ScheduledDataLoaderRegistry that polls on a configurable interval and dispatches any DataLoaders with pending keys.

```yaml
dgs:
  graphql:
    dataloader:
      ticker-mode-enabled: true
      schedule-duration: 10ms  # default
```

Pros:

  • Simple 1-line config change
  • Works with current graphql-java version (no dependency upgrade)
  • Chained loads get auto-dispatched within ~10ms
  • Multiple loads queued around the same time batch together

Cons:

  • Adds up to 10ms latency per chained load (the polling interval)
  • Less optimal batching than graphql-java 25's native chaining (timer-based vs graph-aware)
  • Existing loadAndDispatch calls still fire manual dispatches before the ticker

Option C: Selective Restructuring (Incremental)

Restructure individual hot-path resolvers to avoid chaining where possible — e.g., pre-fetch all needed data at the same field level so loads batch naturally.
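As a hedged illustration of the restructuring idea (hypothetical loaders, plain CompletableFuture): when the second key genuinely depends on the first result, chaining is unavoidable, but when both keys are known up front the loads can be issued together at the same level so they fall into the same dispatch window.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical loaders standing in for DataLoader-backed calls; the point is
// the shape of the composition, not the loading mechanics.
public class RestructureSketch {
    static CompletableFuture<String> loadA(String key) {
        return CompletableFuture.completedFuture("A:" + key);
    }

    static CompletableFuture<String> loadB(String key) {
        return CompletableFuture.completedFuture("B:" + key);
    }

    public static void main(String[] args) {
        // Chained: loadB's key depends on loadA's result, so the second load
        // runs after the first completes (the case that forces manual dispatch).
        CompletableFuture<String> chained =
            loadA("x").thenCompose(RestructureSketch::loadB);

        // Restructured: both keys are known up front, so both loads are issued
        // at the same level and can share one dispatch/batch window.
        CompletableFuture<String> together =
            loadA("x").thenCombine(loadB("y"), (a, b) -> a + "+" + b);

        System.out.println(chained.join());   // B:A:x
        System.out.println(together.join());  // A:x+B:y
    }
}
```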

Pros: No dependency changes, targeted fixes

Cons: Labor-intensive, doesn't fix the systemic problem, may not be possible for all call patterns


Recommended Plan

| Phase | Action | Dependency | Risk |
|---|---|---|---|
| 1 | Sign up as early tester for SBN4 | Platform team | None |
| 2 | Expand PR #4023 to chain DataLoaders across the entire app (not just live prefetching) | graphql-java 25 beta | Medium — unsupported until SBN4 GA |
| 3 | Dev canary the full change; validate batching improvements via Atlas gRPC metrics | Canary infrastructure | Low |
| 4 | Remove loadAndDispatch methods from all DataLoader classes | Phase 3 validation | Low — mechanical refactor |
| 5 | Adopt graphql-java 25 GA when SBN4 ships | SBN4 GA | None |

Key Call Sites to Migrate (highest impact)

| DataLoader | loadAndDispatch References | Downstream Service |
|---|---|---|
| EvidenceLoader | 26 | ECL (evidence control layer) |
| VideoCoreDataLoader | 10+ | oasis VideoCore gRPC |
| ImageDataLoader | 4 | Images gRPC |
| ECLHydratePageDataLoader | 3 | ECL hydration gRPC |
| GameDataLoader | 5 | Games gRPC |
| CollectionDataLoader | 4 | Collections gRPC |
| LiveEventsDataLoader | 3 | Live events gRPC |
| PersonDataLoader | 2 | Person gRPC |
| All others | ~60 combined | Various |