User: So we don't really need the ml_loop in a thread anymore, the main manager thread can just spawn the pooled work directly
GitHub Copilot:
Done.
Updated:
- lib.rs
- Cargo.toml
- README.md
What changed:
- Removed the separate ML dispatcher thread and channel.
- Added a shared `MlRuntime` with:
  - a `rayon::ThreadPool`
  - the shared compiled cache
  - the manager sender
- The manager thread now submits compute/compile work directly to the pool via `MlRuntime`.
- Removed the old crossbeam dependency.
- Removed the now-unused `zeroed_outputs()` helper after the refactor.
- Updated the README note to match the new architecture.
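As a rough sketch, `MlRuntime` had roughly this shape (hypothetical field names and stand-in types; the real definition is in lib.rs):

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the real types in lib.rs.
type GraphId = u64;
struct CompiledGraph;
struct WebNNMsg;

// One pool, one compiled-graph cache, one way to message the manager.
struct MlRuntime {
    pool: rayon::ThreadPool,
    compiled: Arc<Mutex<HashMap<GraphId, CompiledGraph>>>,
    manager_tx: Sender<WebNNMsg>,
}
```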
Validation:
`./mach check` passed.
Made changes.
User: Remove this MLRuntime abstraction thing. What you need is just for the manager to own a self sender, and a thread pool, as well as the compiled map, and then have a single spawn_work method that takes a function, and then at call point either pass the compile or compute function.
Then the methods of context that maybe need to spawn work should take a reference to the manager.
There is no need to complicate things further with an arc ml runtime thing; all the work on the manager thread happens synchronously in response to receiving a message (you could actually have used not an Arc, but an Rc, but no reference counting is needed at all).
Before making changes, check that my instructions make sense and tell me
GitHub Copilot: Yes, mostly.
The simplification makes sense for lib.rs:
- `WebNNManager` owning:
  - the self-sender
  - the `rayon::ThreadPool`
  - the shared compiled map
- a single `spawn_work()` helper
- removing the separate `MlRuntime` layer
That part is sound.
One caveat: “have Context methods take a reference to the manager” is slightly tricky in Rust. In `handle_message()`, code often does:
- borrow `self.contexts.get_mut(&ctx_id)` to get `&mut Context`
- and would then also need `&self` / `&WebNNManager`

That can create a borrow conflict, because the `Context` lives inside `self`.
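For concreteness, a minimal sketch (hypothetical names, not the actual lib.rs code) of the shape the borrow checker rejects:

```rust
use std::collections::HashMap;

type ContextId = u64;
struct Context;

impl Context {
    fn handle_dispatch(&mut self, _manager: &WebNNManager) {}
}

struct WebNNManager {
    contexts: HashMap<ContextId, Context>,
}

impl WebNNManager {
    fn handle_message(&mut self, ctx_id: ContextId) {
        // Mutable borrow of self, through self.contexts...
        let ctx = self.contexts.get_mut(&ctx_id).unwrap();
        // ...while also passing a shared borrow of self: rejected,
        // because `ctx` lives inside `self`.
        ctx.handle_dispatch(self); // ERROR: cannot borrow `*self` here
    }
}
```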
So the design is valid, but that part likely needs one of these shapes instead:
- keep spawning on `WebNNManager` methods, and let `Context` return enough data for the manager to spawn the work, or
- pass only the manager-owned work resources into `Context` methods, not `&WebNNManager` itself, or
- temporarily remove the `Context` from the map, operate on it, then insert it back
So: yes, your direction makes sense, with that borrowing caveat.
User: Ok thanks, proceed and follow your own second suggestion with "pass only the manager-owned work resources into..." part
GitHub Copilot:
Done.
Refactored lib.rs to follow the “pass only manager-owned work resources” direction:
- Removed `MlRuntime`. `WebNNManager` now directly owns:
  - `manager_tx`
  - the `rayon::ThreadPool`
  - `compiled_map`
- Added a single `spawn_work()` method on `WebNNManager`.
- Added `WorkItem` for compute/compile jobs.
- `Context` no longer owns any runtime state. `Context` methods now build and return work items: `handle_dispatch()`, `get_or_compile()`, `enqueue_or_run()`, `run_now()`, `handle_compute_result()`, `process_queue()`.
- `handle_message()` now:
  - mutates the `Context`
  - collects `WorkItem`s
  - submits them afterward through `self.spawn_work(...)`
Also updated README.md to reflect the manager-owned pool/cache design.
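A minimal sketch (hypothetical types and signatures) of that “mutate, collect, then spawn” shape, and why it sidesteps the borrow conflict:

```rust
use std::collections::HashMap;

type ContextId = u64;

// Hypothetical minimal shapes; the real definitions live in lib.rs.
enum WorkItem {
    Compute(ContextId),
    Compile(ContextId),
}

struct Context {
    id: ContextId,
}

impl Context {
    // Context only *describes* deferred work; it never spawns.
    fn handle_dispatch(&mut self) -> Option<WorkItem> {
        Some(WorkItem::Compute(self.id))
    }
}

struct WebNNManager {
    contexts: HashMap<ContextId, Context>,
}

impl WebNNManager {
    fn spawn_work(&self, item: WorkItem) {
        // In the real code this submits a closure to the rayon pool.
        let _ = item;
    }

    fn handle_message(&mut self, ctx_id: ContextId) {
        let mut work = Vec::new();
        {
            // The mutable borrow of self.contexts ends with this scope...
            let ctx = self.contexts.get_mut(&ctx_id).unwrap();
            work.extend(ctx.handle_dispatch());
        }
        // ...so submitting afterward only needs &self: no conflict.
        for item in work {
            self.spawn_work(item);
        }
    }
}
```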
Validation:
`./mach check` passed.
Made changes.
User: Ok so what you have done can be improved on:
- There will ever only be a single pending work item, so use an option, not a vec.
- There can only be pending work as a result of two types of incoming messages, so remove the logic from all handlers, and remove the vec.
- I think the "run work only if not blocked" behavior is still maintained, but making it dependent on the Vec being empty is extremely hard to read.
- It's weird that WebNNMsg::Compile uses two methods (get_or_compile and then enqueue_or_run), whereas for dispatch the queuing is nested inside handle_dispatch.
What to do (among other fixes for the above):
Just pass the pool, the compiled map, and the manager tx clone, to the context handler methods that require it, and then store that stuff in the PendingOp variants where work can be spawned. Put an Rc (not Arc) around the thread pool for that.
Before making changes, give me a critique of those changes.
GitHub Copilot:
Critique:
- `Option<WorkItem>` instead of `Vec<WorkItem>` is a good change. The timeline can execute many synchronous ops while draining, but it can yield at most one async item before `queue_blocked` flips back to `true`. The current `Vec` shape obscures that invariant.
- Removing work-item plumbing from handlers that can never produce async work is also good. Right now the repeated “collect vec, then spawn” pattern adds noise to every arm.
- Unifying `Dispatch` and `Compile` handling is also good. The current `Compile` path is visibly split across `get_or_compile()` and `enqueue_or_run()`, while `Dispatch` already hides that sequencing. That inconsistency makes the control flow harder to follow.
- The `Vec<WorkItem>` emptiness check is not a good readability boundary. It encodes control flow indirectly. An explicit “did this produce pending work?” return is clearer.
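A minimal sketch (hypothetical types) of the invariant behind the first point — draining executes synchronous ops inline, but the first async op re-blocks the queue, so at most one work item can come out:

```rust
use std::collections::VecDeque;

// Hypothetical stand-ins for the real timeline types.
enum PendingOp {
    Sync(u32),
    Async(u32),
}
struct WorkItem(u32);

struct Timeline {
    queue: VecDeque<PendingOp>,
    queue_blocked: bool,
}

impl Timeline {
    fn process_queue(&mut self) -> Option<WorkItem> {
        while let Some(op) = self.queue.pop_front() {
            match op {
                PendingOp::Sync(_) => { /* execute inline */ }
                PendingOp::Async(id) => {
                    // The first async op ends the drain: the queue is
                    // blocked again until its result comes back.
                    self.queue_blocked = true;
                    return Some(WorkItem(id));
                }
            }
        }
        None
    }
}
```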
Main concern with the proposed redesign:
- Passing the pool, compiled map, and sender into `Context` methods is reasonable.
- Storing them inside `PendingOp` is the part that is weaker.
Why that part is weaker:
- `PendingOp` is currently a pure description of deferred timeline work. If it starts carrying runtime handles, it becomes partly an execution object.
- That makes queued entries heavier and less declarative.
- It also duplicates cloned runtime handles into the queue, even though those resources are manager-owned and stable.
- It mixes “what operation is pending” with “how to execute it”.
On Rc specifically:
- `Rc<rayon::ThreadPool>` is fine only because it stays on the manager thread and queued ops are not sent across threads.
- It should not be used inside the worker closure itself.
- Still, the thread pool does not need ownership in queued ops. A borrowed execution context at replay time is cleaner than embedding `Rc` into each `PendingOp`.
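A short sketch of why the second point is already compiler-enforced: `ThreadPool::spawn` requires the closure to be `Send + 'static`, and `Rc` is `!Send`, so the `Rc` cannot be moved into the worker body by accident:

```rust
use std::rc::Rc;

fn spawn_pooled_work(pool: &Rc<rayon::ThreadPool>) {
    // The Rc stays on the manager thread; only the closure crosses
    // threads, and it may capture only Send + 'static data. Moving
    // the Rc itself into the closure would fail to compile.
    pool.spawn(move || {
        println!("running pooled work");
    });
}
```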
Best refinement of your idea:
- Use an `Option<PendingWork>` / `Option<WorkItem>` return.
- Restrict async-producing logic to the two relevant paths.
- Make `run_now()` / `process_queue()` take a small execution context parameter containing the pool, compiled map, and sender.
- Keep `PendingOp` data-only; do not store runtime handles in it.
That keeps the clearer control flow you want, without coupling timeline state to manager execution resources.
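A minimal sketch (hypothetical names and stand-in types) of that borrow-only execution-context parameter:

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

// Hypothetical stand-ins; the real types live in lib.rs.
type GraphId = u64;
struct CompiledGraph;
struct WebNNMsg;

// Borrowed, manager-owned execution resources, passed only to the
// methods that can actually reach a spawn point.
struct ExecutionCtx<'a> {
    pool: &'a rayon::ThreadPool,
    compiled_map: &'a HashMap<GraphId, CompiledGraph>,
    manager_tx: &'a Sender<WebNNMsg>,
}

struct Context; // timeline state elided

impl Context {
    fn process_queue(&mut self, exec: &ExecutionCtx) {
        // PendingOp entries stay data-only; the handles arrive here.
        let tx = exec.manager_tx.clone();
        exec.pool.spawn(move || {
            // run the pooled work, then report back to the manager
            let _ = tx.send(WebNNMsg);
        });
    }
}
```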
So: most of the requested direction is an improvement, but embedding runtime handles into `PendingOp` is the part that should be avoided.
User: Make run_now() / process_queue() take a small execution context parameter containing the pool, compiled map, and sender.
That's good, and basically what I said at the very beginning I think, no?
GitHub Copilot: Yes.
The main difference is just one boundary choice:
- your original direction: pass the execution resources into the context-side handling path
- the refinement: pass them into `run_now()` / `process_queue()` explicitly, instead of storing them inside `PendingOp`
So the core idea was yours: keep the manager-owned execution resources out of persistent `Context` state, and thread them through the places that may spawn work.
The part I was pushing back on was embedding those handles into `PendingOp`. Passing a small execution context into `run_now()` / `process_queue()` keeps the same architecture, but cleaner.
User: At the start I wrote: Then the methods of context that maybe need to spawn work should take a reference to the manager.
Which is indeed not the same thing, so we went sort of full circle now with a better outcome.
Confirm
GitHub Copilot: Confirmed.
Your original statement was broader: context methods that may spawn work should take access to manager-owned execution state.
The better final form is the narrower version of that same idea:
- not `&WebNNManager` directly
- not storing runtime handles inside `Context`
- not storing runtime handles inside `PendingOp`
- instead, pass a small execution context into the context-side methods that may reach spawning points, especially `run_now()` and `process_queue()`
So yes: this is effectively full circle, but with a cleaner boundary than passing the whole manager.
User: Ok proceed
GitHub Copilot:
Done.
Changes:
- Replaced the manager/context `Vec<WorkItem>` plumbing with an execution context passed through queue-draining code in lib.rs.
- `run_now()` and `process_queue()` now take that execution context and spawn pooled work directly from the context timeline in lib.rs.
- Unified compile handling so `Compile` now queues a full `PendingOp::Compile(ctx_id, graph_id, graph_info)` instead of splitting work creation across two paths in lib.rs and lib.rs.
- Moved the pool into `Rc<ThreadPool>` and introduced a narrow `ExecutionCtx` wrapper in lib.rs.
- Updated the component note in README.md.
Validation:
`./mach check` passed.
Made changes.
User: So I don't like how this execution context still spills into many places where it is never required.
So, I think for this we need to simply always store an execution context on each context, or actually, just remove execution context, and add its members directly to context. Then the context can just use the stuff it needs at the lowest level, leaving other paths unburdened.
GitHub Copilot:
Done.
What changed:
- Removed `ExecutionCtx` entirely from lib.rs.
- Added `pool`, `compiled_map`, and `manager_tx` directly to `Context` in lib.rs.
- `Context::new()` now receives cloned manager-owned runtime handles when a context is created in lib.rs and lib.rs.
- `run_now()` and `process_queue()` no longer take an execution parameter; they use `self.spawn_work()` internally at the actual async boundary in lib.rs.
- Manager message handling no longer threads execution state through unrelated paths in lib.rs.
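A minimal sketch (hypothetical field names) of the final shape — `Context` owns cheap clones of the manager's runtime handles and spawns only at the lowest level:

```rust
use std::rc::Rc;
use std::sync::mpsc::Sender;

struct WebNNMsg; // hypothetical stand-in for the real message type

struct Context {
    pool: Rc<rayon::ThreadPool>,
    manager_tx: Sender<WebNNMsg>,
    // ...timeline state, compiled-map handle, etc.
}

impl Context {
    // The single spawn point; higher-level paths stay unburdened.
    fn spawn_work(&self, job: impl FnOnce() + Send + 'static) {
        self.pool.spawn(job);
    }

    fn run_now(&mut self) {
        let tx = self.manager_tx.clone();
        self.spawn_work(move || {
            // run the compute, then notify the manager thread
            let _ = tx.send(WebNNMsg);
        });
    }
}
```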
Documentation:
- Updated the design note in README.md.
Validation:
`./mach check` passed.
Made changes.
User: so what's your take on the latest change I requested, in the light of the road travelled so far?
GitHub Copilot: It is the better endpoint.
In light of the path so far:
- The earlier `WorkItem`-return design was correct but too indirect.
- The `ExecutionCtx` version fixed that indirection, but spread an execution-only parameter through many methods that do not conceptually care about execution.
- The latest change moves that burden down to the true execution boundary, which is what the code wanted all along.
Why this version is stronger:
- `Context` already owns the timeline state, so letting it also own the resources needed to advance that timeline is coherent.
- `run_now()` and `process_queue()` are simpler again.
- `Dispatch` and `Compile` are now structurally aligned.
- The manager match arms are lighter and easier to read.
- The “spawn only at the lowest level” rule is now reflected directly in the code.
Tradeoff:
- `Context` now contains some manager-owned runtime handles, so it is less purely “state-only” than before.
But given this codebase and this control flow, that tradeoff is worth it. The handles are stable, cheap to clone, and directly relevant to timeline advancement. So compared with the previous steps, this change is cleaner, more local, and closer to the model you were pushing toward from the start.