Help me plan a new "collaborative parallel execution" feature in dagger. The general idea:
- Dagger functions can be annotated with a new directive @parallel (+parallel in Go etc) which indicates to the engine that this function can be executed in parallel
- The dimensions of parallelism are controlled by arguments. Any argument to a parallel function that is a list is interpreted by the engine as a dimension of parallelism. The contract is that when a client invokes a parallel function, the engine may split that "frontend" invocation into N "backend" invocations, with each invocation receiving a slice of the frontend list argument. The function implementor is responsible for guaranteeing correct behavior of the function regardless of the splitting.
- Only return types that can be reliably merged in any order (I don't know the fancy terminology) are allowed. Example: Directory is allowed because you can always merge N directories together. Changeset is allowed. Service could conceivably be merged into a composite proxy service (let's worry about that later). etc.
Your plan:
- Research the dagger codebase so you understand what I'm talking about. Start with docs/current_docs. Then maybe skim core/ and engine/ (server-side) then cmd/dagger (client side)
- Look at toolchains/go . Understand that my primary test case is adding support for test filtering & splitting for go and other test runners.
Ask lots of questions. This is exploratory
[Explored codebase - docs, core/engine architecture, Go toolchain, cmd/dagger client]
Summary of Key Architecture:
- Function Execution Flow:
  - CLI parses arguments → builds GraphQL query → sends to engine
  - `ModuleFunction.Call()` in `core/modfunc.go` handles execution
  - Arguments go through `ConvertToSDKInput` → SDK executes → `ConvertFromSDKResult`
  - List arguments are passed as-is - no splitting currently exists
- Directives System:
  - Go uses comment directives: `+optional`, `+default="value"`, `+cache="session"`, `+check`
  - These are parsed during codegen and become GraphQL schema metadata
  - Adding `+parallel` would follow this pattern (see the sketch after this summary)
- Types with Merge Potential:
  - `Directory` - can be merged (overlay semantics)
  - `File` - potentially mergeable depending on semantics
  - `Service` - you mentioned proxy composition
  - `Container` - unclear how to merge
  - Primitives - depends (strings could concat, numbers could sum?)
- Go Toolchain Test Pattern:
  - `Test(ctx, run, skip, failfast, parallel, timeout, count, pkgs []string)`
  - `pkgs` is already a list argument - perfect candidate for splitting
  - Internal parallelism exists via `util/parallel` package
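For reference, a minimal sketch of how these comment directives read in a Go module function (illustrative code, not taken from the codebase; `dag` is the generated module client):

```go
// Build is a hypothetical module function. The comment lines above the
// argument are directives, parsed at codegen into GraphQL schema metadata.
func (m *MyModule) Build(
	ctx context.Context,
	// +optional
	// +default="linux/amd64"
	platform string,
) (*dagger.Container, error) {
	return dag.Container().From("golang:latest"), nil
}
```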
Clarifying Questions:
- Where does splitting happen?
- How are multiple list arguments handled?
- Who defines the merge operation?
- Error semantics?
- Caching implications?
- Idempotency contract?
- Changeset type?
- Practical test splitting scenario
- Changeset type already exists: `core/changeset.go`. Just one example.
- For test splitting, the convention is for the function to return void, and we rely on errors to signal "check failed". So merging is a no-op. Let's discard the issue of merging altogether for now.
- Do you want to know how I plan to introduce a telemetry-driven loop, so that a scheduler can instrument test splitting based on actual test run times, similar to how buildkite and circleci implement test splitting, but more generalizable & programmable (and with better local-first support)? That is a major goal of this.
Yes, please tell me about the telemetry-driven scheduling loop.
Dagger is fully instrumented by otel, out of the box. Each dagger function call emits a span. When a dagger function calls the dagger system API, those calls are functions too, and are traced accordingly. Even better, when using the system API to execute a container (`Container.withExec`, `Container.asService`), if the executed tool emits otel telemetry, dagger will intercept it and emit it in the context of dagger's trace. This includes test telemetry :-)
So, the plan is to hook into this otel infrastructure, so that the developer of a parallel dagger function can easily emit custom spans (or arrange for its test runner to emit custom spans - same result) in such a way that the dagger engine CAN MAP THESE SPANS BACK TO VALUES IN THE LIST ARGUMENT USED FOR FILTERING / SPLITTING. This allows for the loop we talked about. The mapping could be as simple as a standardized otel attribute. Example:
```go
// +parallel
func (p *Go) Test(ctx context.Context, pkgs []string, ...) error {
	for _, pkg := range pkgs {
		// Tag this unit of work so the engine can map its span back to
		// the corresponding value of the "pkgs" list argument.
		task := dag.Parallel.NewTask().WithArg("pkgs", []string{pkg})
		taskCtx := task.Start(ctx) // per-task context, so spans don't nest across iterations
		_, err := dag.Container().
			From("golang:latest").
			WithDirectory("/app", p.Source).
			WithWorkdir(pkg).
			WithExec([]string{"go", "test"}).
			Sync(taskCtx)
		task.End(taskCtx) // always close the span, even on failure
		if err != nil {
			return err
		}
	}
	return nil
}
```
[Understood the telemetry architecture and feedback loop]
Follow-up Questions:
- `dag.Parallel.NewTask()` API - Is this a new core API?
- Telemetry Storage - Where does historical timing persist?
- Scheduler Location - Client-side, engine-side, or hybrid?
- First Run (Cold Start) - What happens with no historical data?
- Worker Distribution - Intra-engine or multi-engine?
- Nested Parallelism handling?
Also, in your example, the function still iterates sequentially inside. Is the idea that the engine can optionally split the call itself into N parallel calls?
- don't worry about it. It's a hybrid, for otel-specific reasons.
- We have a cloud service that collects otel + schedules and runs jobs. Locally, the engine will do something simpler.
- Engine-side. CLI stays dumb.
- split evenly / parallelism number is the unknown
- engine decides based on resources. Local-first is a pillar
- Irrelevant. A job is a job
[Discussed configuration options for parallelism number - CLI flag, config file, env var, etc.]
I prefer to avoid a function-level default. Once something is in code, you're stuck with it forever. And we value extreme portability.
For now let's make it not configurable at all. There is a more general "scale-out" feature already in development (see 'dagger check --scale-out') which optionally hooks up to a cloud-hosted scheduler. My thinking is that once +parallel is implemented, this scale-out system would seamlessly handle parallelism of these functions.
And, when running locally without this scale-out feature, I suppose parallelism would simply be set to 1, since there is no need to ramp it up further - the engine already knows how to run dagger functions concurrently on all available cores. We could simply do parallelism=2, or perhaps another heuristic such as: maximum of 10 "items" per slice.
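To make the "maximum of 10 items per slice" heuristic concrete, a minimal sketch (hypothetical helper, not existing engine code):

```go
// partition caps each bucket at maxItemsPerSlice items, so effective
// parallelism grows with workload size (uses the built-in min, Go 1.21+).
func partition[T any](items []T, maxItemsPerSlice int) [][]T {
	var buckets [][]T
	for start := 0; start < len(items); start += maxItemsPerSlice {
		end := min(start+maxItemsPerSlice, len(items))
		buckets = append(buckets, items[start:end])
	}
	return buckets
}
```

For example, 25 packages with a cap of 10 yields 3 backend invocations of sizes 10, 10, and 5.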
[Explored scale-out infrastructure - --scale-out flag, CloudEngineClient(), tryScaleOut() in checks.go]
Integration point clear:
- `+parallel` functions would use the same `CloudEngineClient()` pattern
- Local mode: simple heuristic
- Scale-out mode: scheduler decides
Implementation Areas:
- Directive parsing
- Schema/metadata
- Return type validation
- Engine splitting logic
- Result merging
- Telemetry API
- Scale-out integration
Questions on return types and multiple list arguments...
- void + directory enough for now
- good question. let's discuss.
- Go toolchain for now. Plus experimental PRs for jest and pytest. Create a Claude skill for adding directives (reduces per-SDK boilerplate); local-only first + stubs for scale-out
- Take inspiration from PR #9327 and util/parallel/
[Explored PR #9327 Status API and util/parallel patterns]
Multiple list arguments discussion - Options: Cartesian, Zipped, First list wins, Explicit +split
Recommendation: Explicit +split annotation
what if +parallel moved to the argument, and no annotation on the function?
That's cleaner. The annotation lives exactly where the semantics are:
```go
func Test(
	// +parallel
	pkgs []string,
	platforms []string,
) error
```
Implications:
- No function-level annotation
- Self-documenting
- Multiple `+parallel` args = Cartesian (opt-in, sketched below)
- Return type restriction still applies
- Schema representation on `FunctionArg`
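To illustrate the opt-in Cartesian semantics, a sketch assuming both `pkgs` and `platforms` carry `+parallel` (the `invocation` type is invented for illustration):

```go
// invocation is a hypothetical backend work unit: each (pkg, platform)
// pair in the Cartesian product becomes one backend invocation, shown
// here with singleton slices.
type invocation struct {
	Pkgs      []string
	Platforms []string
}

func cartesianSplit(pkgs, platforms []string) []invocation {
	var out []invocation
	for _, pkg := range pkgs {
		for _, plat := range platforms {
			out = append(out, invocation{
				Pkgs:      []string{pkg},
				Platforms: []string{plat},
			})
		}
	}
	return out
}
```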
- +parallel on non-list -> error
- +parallel on nested list -> maybe a solution to the "nested tasks" problem? eg. testing packages + individual tests within each package
- +parallel combined with +optional -> easy, that's an empty list
- I have one more consideration after you've processed the above
[Discussed nested list [][]string semantics - each inner list is a selector/path]
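For example, the selector/path reading of `[][]string` could look like this (values invented for illustration):

```go
// Each inner list is a selector path of increasing specificity: a
// one-element path selects a whole package, a two-element path selects
// a single test within it.
tests := [][]string{
	{"./core"},                // every test in ./core
	{"./engine", "TestSplit"}, // only TestSplit in ./engine
}
```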
What's the additional consideration?
Additional consideration: test filtering. Can this parallel facility be also used as a foundation for filtering?
Yes! The +parallel arg is essentially declaring "this is the workload dimension." Once you have that:
- Splitting
- Telemetry
- Filtering
- Listing
Options for listing available values:
- Convention-based companion function
- Explicit annotation
- Single function with mode flag
I don't love any of these options. One other option: no magic facility for the engine to call the companion list function. Instead, the function developer is responsible for calling it:
```go
func (g *Go) TestSpecific(
	ctx context.Context,
	// +parallel
	tests [][]string,
) error {
	// all the logic
}

func (g *Go) TestAll(ctx context.Context) error {
	allTests := listAllTests(g.Source)
	return g.TestSpecific(ctx, allTests)
}
```
This is explicit and composable. The pattern is:
- `Tests()` - List available
- `TestSpecific(tests)` with `+parallel` - Run specific (parallelizable)
- `TestAll()` - Convenience: discover + run all
But there are 2 problems:
- `TestAll()` calls `TestSpecific()` as a native Go call, not the Dagger API - parallel won't work. Needs "self-call" feature.
- Which should be a `+check`? Implications for `dagger check` filtering?
Let's focus on problem 1 first
Options:
- A: Two separate entry points, accept the gap
- B: Single function with optional arg
- C: Engine auto-populates from companion function
- D: Wait for self-call
What if we took a completely different approach? The Dagger API supports "fan-out calls" on a list of objects. What if dagger function devs could customize that fan-out behavior, with an implementation that receives an array instead of a single receiver?
[Explored batch-aware methods on types]
Options for batch interface:
- Convention: separate batch function
- Directive + context accessor
- Type-level directive
Comparison table of +parallel on arg vs batch method on type
Option C: how does a "batch-eligible method" receive the list of TestCases?
Options:
- Via context accessor
- Optional batch argument (engine-injected)
- Receiver carries batch info
- Two methods, engine picks
What if instead of overloading TestCase to handle a list of itself, we defined a new type that wrapped [TestCase], a sort of "smart list" type which could implement a special interface?
[Explored smart wrapper type with Splittable interface]
```go
type TestSuite struct {
	Cases []*TestCase
}

func (s *TestSuite) Items() []*TestCase { return s.Cases }
func (s *TestSuite) WithItems(cases []*TestCase) *TestSuite { ... }
```
Why this is nice:
- TestCase stays simple
- Explicit collection type
- Interface-based
- No self-call needed
The "interface" would be an implicit contract with the dagger engine, not an actual Go interface (because of the nature of the Dagger API). Its only purpose is to wrap a collection type, with the possibility of "intercepting" fan-out calls. The basic tools we have are pragmas.
[Explored pragma-based options]
- Option A: Pragma on collection field
- Option B: Pragma linking wrapper to item type
- Option C: Pragma on methods
The Interception Question: Explicit wrapper construction vs engine auto-wrapping?
Proposal:
```go
type TestFiles struct {
	Workspace *dagger.Directory
	Files     []string // +parallel
}

// +parallel
func (tf *TestFiles) Test(ctx context.Context) error {
	_, err := dag.Container().
		From("jest").
		WithDirectory(".", tf.Workspace).
		WithExec([]string{"sh", "-c", strings.Join(tf.Files, " ")}).
		Sync(ctx)
	return err
}
```
- Parallel function means: "this function might benefit from parallel execution"
- Parallel field means: "this field can safely be split in buckets for parallel execution"
This is clean. The two pragmas work together:
Engine Behavior:
- Function has `+parallel` → check receiver type
- `TestFiles.Files` has `+parallel` → splittable
- Partition `Files` into N subsets
- Create N `TestFiles` instances
- Dispatch `Test()` on each in parallel
- Aggregate results
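Expressed as plain Go for illustration (hypothetical engine-side logic, not real engine internals; uses `golang.org/x/sync/errgroup`, and `buckets` could come from a heuristic like the partition sketch earlier):

```go
// runSplit performs the partition/create/dispatch/aggregate steps above:
// one TestFiles clone per bucket, Test() dispatched concurrently,
// errors aggregated.
func runSplit(ctx context.Context, tf *TestFiles, buckets [][]string) error {
	eg, ctx := errgroup.WithContext(ctx)
	for _, files := range buckets {
		clone := &TestFiles{Workspace: tf.Workspace, Files: files}
		eg.Go(func() error { return clone.Test(ctx) })
	}
	return eg.Wait()
}
```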
Questions:
- Multiple `+parallel` fields?
- Non-parallel methods on same type?
- Telemetry?
- Empty result after split?
- good question. let's explore
- correct. thoughts?
- General idea: any number of otel spans may be emitted with parallel-related attributes for "cost accounting"
- This won't happen because all function calls are scheduled dynamically
1. Multiple +parallel fields:
- Cartesian product?
- Error?
- Explicit grouping?
Current leaning: Start with one +parallel field per type.
2. +parallel field without +parallel function:
Field = capability ("safe to split")
Function = intent ("would benefit")
Both required for engine to parallelize.
3. Telemetry: Fine-grained cost accounting - even when batched, per-item timing captured via span attributes.
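For instance, per-item cost accounting could look like this (the attribute key `dagger.parallel.item` is invented for illustration; uses the standard OpenTelemetry Go API via `go.opentelemetry.io/otel`, `.../otel/attribute`, and `.../otel/trace`):

```go
// Emit one span per work item, even when items run batched inside a
// single function call, so per-item timing maps back to the +parallel value.
tracer := otel.Tracer("go-toolchain")
for _, file := range tf.Files {
	_, span := tracer.Start(ctx, "test "+file,
		trace.WithAttributes(attribute.String("dagger.parallel.item", file)))
	// ... run the per-file work ...
	span.End()
}
```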
Summarize the current state in a markdown, so I can catch up my colleagues on different options we explored
[Generated comprehensive summary document]
Can you paste this to a gist so I can continue on mobile? Also, can I access any of this from the Claude web app?
Gist: https://gist.github.com/shykes/5d70658b90cdfb00e426d354f6ab6d8d
Re: Claude web app - This conversation is in Claude Code (CLI), which has separate history from claude.ai. Options:
- Use the gist
- Claude Code on mobile via SSH
- Export full conversation
3
[End of conversation]