
@gterzian
Created March 10, 2026 16:56

User: So we don't really need the ml_loop in a thread anymore; the main manager thread can just spawn the pooled work directly

GitHub Copilot:

Done.

Updated:

  • lib.rs
  • Cargo.toml
  • README.md

What changed:

  • Removed the separate ML dispatcher thread and channel.
  • Added a shared MlRuntime with:
    • a rayon::ThreadPool
    • the shared compiled cache
    • the manager sender
  • The manager thread now submits compute/compile work directly to the pool via MlRuntime.
  • Removed the old crossbeam dependency.
  • Removed the now-unused zeroed_outputs() helper after the refactor.
  • Updated the README note to match the new architecture.

Validation:

  • ./mach check passed.

Made changes.

User: Remove this MlRuntime abstraction. What you need is just for the manager to own a self-sender, a thread pool, and the compiled map, and then to have a single spawn_work method that takes a function; at the call point, pass either the compile or the compute function.

Then the methods of context that maybe need to spawn work should take a reference to the manager.

There is no need to complicate things further with an Arc-wrapped ml runtime; all the work on the manager thread happens synchronously in response to receiving a message (you could actually have used an Rc rather than an Arc, but no reference counting is needed at all).

Before making changes, check that my instructions make sense and tell me

GitHub Copilot: Yes, mostly.

The simplification makes sense for lib.rs:

  • WebNNManager owning:
    • the self-sender
    • the rayon::ThreadPool
    • the shared compiled map
  • a single spawn_work() helper
  • removing the separate MlRuntime layer

That part is sound.

One caveat: “have Context methods take a reference to the manager” is slightly tricky in Rust. In handle_message(), code often does:

  • borrow self.contexts.get_mut(&ctx_id) to get &mut Context
  • and would then also need &self / &WebNNManager

That can create a borrow conflict, because the Context lives inside self.

So the design is valid, but that part likely needs one of these shapes instead:

  • keep spawning on WebNNManager methods, and let Context return enough data for the manager to spawn the work, or
  • pass only the manager-owned work resources into Context methods, not &WebNNManager itself, or
  • temporarily remove the Context from the map, operate on it, then insert it back

So: yes, your direction makes sense, with that borrowing caveat.

User: Ok thanks, proceed and follow your own second suggestion with "pass only the manager-owned work resources into..." part

GitHub Copilot:

Done.

Refactored lib.rs to follow the “pass only manager-owned work resources” direction:

  • Removed MlRuntime.
  • WebNNManager now directly owns:
    • manager_tx
    • rayon::ThreadPool
    • compiled_map
  • Added a single spawn_work() method on WebNNManager.
  • Added WorkItem for compute/compile jobs.
  • Context no longer owns any runtime state.
  • Context methods now build and return work items:
    • handle_dispatch()
    • get_or_compile()
    • enqueue_or_run()
    • run_now()
    • handle_compute_result()
    • process_queue()
  • handle_message() now:
    • mutates the Context
    • collects WorkItems
    • submits them afterward through self.spawn_work(...)

Also updated README.md to reflect the manager-owned pool/cache design.

Validation:

  • ./mach check passed.

Made changes.

User: Ok, so what you have done can be improved on:

  • There will only ever be a single pending work item, so use an Option, not a Vec.
  • There can only be pending work as a result of two types of incoming messages, so remove the logic from all other handlers, and remove the Vec.
  • I think the "run work only if not blocked" invariant is still maintained, but making it dependent on the Vec being empty is extremely hard to read.
  • It's weird that WebNNMsg::Compile uses two methods (get_or_compile and then enqueue_or_run), whereas for dispatch the queuing is nested inside handle_dispatch.

What to do (among other fixes for the above):

Just pass the pool, the compiled map, and the manager tx clone, to the context handler methods that require it, and then store that stuff in the PendingOp variants where work can be spawned. Put an Rc (not Arc) around the thread pool for that.

Before making changes, give me a critique of those changes.

GitHub Copilot:

Critique:

  • Option<WorkItem> instead of Vec<WorkItem> is a good change. The timeline can execute many synchronous ops while draining, but it can yield at most one async item before queue_blocked flips back to true. The current Vec shape obscures that invariant.

  • Removing work-item plumbing from handlers that can never produce async work is also good. Right now the repeated “collect vec, then spawn” pattern adds noise to every arm.

  • Unifying Dispatch and Compile handling is also good. The current Compile path is visibly split across get_or_compile() and enqueue_or_run(), while Dispatch already hides that sequencing. That inconsistency makes the control flow harder to follow.

  • The Vec<WorkItem> emptiness check is not a good readability boundary. It encodes control flow indirectly. An explicit “did this produce pending work?” return is clearer.

Main concern with the proposed redesign:

  • Passing the pool, compiled map, and sender into Context methods is reasonable.
  • Storing them inside PendingOp is the part that is weaker.

Why that part is weaker:

  • PendingOp is currently a pure description of deferred timeline work. If it starts carrying runtime handles, it becomes partly an execution object.
  • That makes queued entries heavier and less declarative.
  • It also duplicates cloned runtime handles into the queue, even though those resources are manager-owned and stable.
  • It mixes “what operation is pending” with “how to execute it”.

On Rc specifically:

  • Rc<rayon::ThreadPool> is fine only because it stays on the manager thread and queued ops are not sent across threads.
  • It should not be used inside the worker closure itself.
  • Still, the thread pool does not need ownership in queued ops. A borrowed execution context at replay time is cleaner than embedding Rc into each PendingOp.

Best refinement of your idea:

  • Use an Option<PendingWork> / Option<WorkItem> return.
  • Restrict async-producing logic to the two relevant paths.
  • Make run_now() / process_queue() take a small execution context parameter containing the pool, compiled map, and sender.
  • Keep PendingOp data-only; do not store runtime handles in it.

That keeps the clearer control flow you want, without coupling timeline state to manager execution resources.

So: most of the requested direction is an improvement, but embedding runtime handles into PendingOp is the part that should be avoided.

User: Make run_now() / process_queue() take a small execution context parameter containing the pool, compiled map, and sender.

That's good, and basically what I said at the very beginning, I think, no?

GitHub Copilot: Yes.

The main difference is just one boundary choice:

  • your original direction: pass the execution resources into the context-side handling path
  • the refinement: pass them into run_now() / process_queue() explicitly, instead of storing them inside PendingOp

So the core idea was yours: keep the manager-owned execution resources out of persistent manager state on Context, and thread them through the places that may spawn work.

The part I was pushing back on was embedding those handles into PendingOp. Passing a small execution context into run_now() / process_queue() keeps the same architecture, but cleaner.

User: At the start I wrote: Then the methods of context that maybe need to spawn work should take a reference to the manager.

Which is indeed not the same thing, so we went sort of full circle now with a better outcome.

Confirm

GitHub Copilot: Confirmed.

Your original statement was broader: context methods that may spawn work should take access to manager-owned execution state.

The better final form is the narrower version of that same idea:

  • not &WebNNManager directly
  • not storing runtime handles inside Context
  • not storing runtime handles inside PendingOp
  • instead, pass a small execution context into the context-side methods that may reach spawning points, especially run_now() and process_queue()

So yes: this is effectively full circle, but with a cleaner boundary than passing the whole manager.

User: Ok proceed

GitHub Copilot:

Done.

Changes:

  • Replaced the manager/context Vec<WorkItem> plumbing with an execution context passed through queue-draining code in lib.rs.
  • run_now() and process_queue() now take that execution context and spawn pooled work directly from the context timeline in lib.rs.
  • Unified compile handling so Compile now queues a full PendingOp::Compile(ctx_id, graph_id, graph_info) instead of splitting work creation across two paths in lib.rs.
  • Moved the pool into Rc<ThreadPool> and introduced a narrow ExecutionCtx wrapper in lib.rs.
  • Updated the component note in README.md.

Validation:

  • ./mach check passed.

Made changes.

User: So I don't like how this execution context still spills into many places where it is never required.

So, I think for this we need to simply always store an execution context on each context, or actually, just remove the execution context and add its members directly to Context. Then the context can just use the stuff it needs at the lowest level, leaving other paths unburdened.

GitHub Copilot:

Done.

What changed:

  • Removed ExecutionCtx entirely from lib.rs.
  • Added pool, compiled_map, and manager_tx directly to Context in lib.rs.
  • Context::new() now receives cloned manager-owned runtime handles when a context is created in lib.rs.
  • run_now() and process_queue() no longer take an execution parameter; they use self.spawn_work() internally at the actual async boundary in lib.rs.
  • Manager message handling no longer threads execution state through unrelated paths in lib.rs.

Documentation:

  • Updated the design note in README.md.

Validation:

  • ./mach check passed.

Made changes.

User: so what's your take on the latest change I requested, in the light of the road travelled so far?

GitHub Copilot: It is the better endpoint.

In light of the path so far:

  • The earlier WorkItem-return design was correct but too indirect.
  • The ExecutionCtx version fixed that indirection, but spread an execution-only parameter through many methods that do not conceptually care about execution.
  • The latest change moves that burden down to the true execution boundary, which is what the code wanted all along.

Why this version is stronger:

  • Context already owns the timeline state, so letting it also own the resources needed to advance that timeline is coherent.
  • run_now() and process_queue() are simpler again.
  • Dispatch and Compile are now structurally aligned.
  • The manager match arms are lighter and easier to read.
  • The “spawn only at the lowest level” rule is now reflected directly in the code.

Tradeoff:

  • Context now contains some manager-owned runtime handles, so it is less purely “state-only” than before.

But given this codebase and this control flow, that tradeoff is worth it. The handles are stable, cheap to clone, and directly relevant to timeline advancement. So compared with the previous steps, this change is cleaner, more local, and closer to the model you were pushing toward from the start.
