@gterzian
Created February 23, 2026 10:26

User: so we need to remove the blocking recv here.

GitHub Copilot: Searched for text response_rx.recv, 1 result
Searched for text WebNNManager, 5 results
Searched for text ml_loop, 2 results

User: don't write code yet, just give me a plan

GitHub Copilot: Searched for files matching **/webnn/src/*, 2 matches
Searched for files matching **/webnn, no matches
Read <repo>/components/shared/webnn/src/lib.rs

What we’re doing here is turning the current synchronous “send a request, block
until the ML thread replies” pattern into an asynchronous one, so that the
manager thread can keep pulling messages off its incoming channel even while a
graph is being executed.

1. add a completion message

The manager needs a way to be told “here are the bytes for this dispatch”.
That means extending the shared WebNNMsg enum in lib.rs with something like:

use std::collections::HashMap;

pub enum WebNNMsg {
    // …existing variants…

    /// Reply from the ML worker containing the output tensor bytes for a
    /// previously dispatched graph.
    ComputeResult(ContextId, HashMap<u32, Vec<u8>>),
}

This variant will be sent by the ML thread back to the manager; the manager
will update the appropriate Context’s tensor_store when it is received.

2. wire the manager sender into the ML thread

new_webnn_manager currently returns a sender/receiver pair and spawns
run_manager(rx) on a thread. We will clone the tx and pass it into
run_manager (or otherwise capture it) so that run_manager can hand a
clone along to ml_loop.

let manager_tx = tx.clone();
let handle = thread::Builder::new()
    .spawn(move || run_manager(rx, manager_tx));

fn run_manager(
    receiver: GenericReceiver<WebNNMsg>,
    manager_tx: GenericSender<WebNNMsg>,
) {
    let ml_handle = thread::Builder::new()
        .spawn(move || ml_loop(ml_rx, manager_tx.clone()));
}

ml_loop’s signature is updated accordingly:

fn ml_loop(rx: Receiver<MlMsg>, manager_tx: GenericSender<WebNNMsg>) { /* … */ }

3. change the MlMsg enum and ml_loop

MlMsg::Compute needs to carry the originating ContextId (so the worker
knows where to send the result). The one‑shot response channel is dropped.

enum MlMsg {
    Compute {
        ctx_id: ContextId,
        graph_info: rustnn::graph::GraphInfo,
        inputs_map: HashMap<u32, u32>,
        inputs_bytes: HashMap<u32, Vec<u8>>,
        outputs_map: HashMap<u32, u32>,
    },
    Exit,
}

Inside ml_loop we still perform the CoreML attempt / zero‑fill as we do now,
but after the work we emit a WebNNMsg::ComputeResult(ctx_id, outputs) on
manager_tx. If that send fails we log a warning.
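
A minimal sketch of that loop; run_graph_or_zero_fill is an illustrative name for
the existing CoreML attempt / zero-fill logic, not a real helper:

fn ml_loop(rx: Receiver<MlMsg>, manager_tx: GenericSender<WebNNMsg>) {
    while let Ok(msg) = rx.recv() {
        match msg {
            MlMsg::Compute { ctx_id, graph_info, inputs_map, inputs_bytes, outputs_map } => {
                // Same CoreML attempt / zero-fill as today; the helper name
                // is illustrative.
                let outputs =
                    run_graph_or_zero_fill(&graph_info, &inputs_map, &inputs_bytes, &outputs_map);
                // Report back to the manager instead of replying on a one-shot channel.
                if manager_tx.send(WebNNMsg::ComputeResult(ctx_id, outputs)).is_err() {
                    warn!("webnn ml thread: manager gone, dropping compute result");
                }
            }
            MlMsg::Exit => return,
        }
    }
}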

4. make Context::compute fire-and-forget

compute will:

  • collect input bytes and resolve constants as today
  • build an MlMsg::Compute with its own ctx_id
  • attempt compute_tx.send(msg); if that fails (ML thread gone) call
    self.zeroed_outputs(...) immediately as a fallback, then return
  • do not create a response channel or call recv()

The response_rx.recv() call at line 303 disappears entirely.
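
A sketch of the new compute, assuming the names above (compute_tx, ctx_id and
zeroed_outputs come from the existing code; the locals graph_info, inputs_map,
inputs_bytes and outputs_map are produced by the elided collection step):

impl Context {
    /// Fire-and-forget: build the message, send it, never block on a reply.
    fn compute(&mut self /* , …same arguments as today… */) {
        // Collect input bytes and resolve constants exactly as before, then:
        let msg = MlMsg::Compute {
            ctx_id: self.ctx_id,
            graph_info,
            inputs_map,
            inputs_bytes,
            outputs_map,
        };
        if self.compute_tx.send(msg).is_err() {
            // ML thread is gone: fall back to zeroed outputs immediately.
            self.zeroed_outputs(/* … */);
        }
    }
}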

5. handle results in the manager

Add a match arm to WebNNManager::handle_message for WebNNMsg::ComputeResult:

WebNNMsg::ComputeResult(ctx_id, outputs) => {
    if let Some(ctx) = self.contexts.get_mut(&ctx_id) {
        for (tid, bytes) in outputs {
            ctx.tensor_store.insert(tid, bytes);
        }
    } else {
        warn!("webnn manager: ComputeResult for unknown context {:?}", ctx_id);
    }
    true
}

This keeps the same behaviour of inserting the bytes into the tensor store,
but does it when the message arrives rather than while the dispatch call is
still running.

6. clean‑up and documentation

  • Remove the one‑shot channel creation and all related comments.
  • Update comments around compute/ml_loop to explain the asynchronous
    design.
  • Add a note in the README (probably README.md) about the
    non‑blocking manager and the new ComputeResult message.
  • Consider a small comment in WebNNMsg documenting the new variant.

7. tests

There aren’t any WebNN unit tests at the moment, so add at least one new
test that exercises the non‑blocking behaviour:

  1. send a Dispatch that takes some time (we can simulate with a custom
    MlMsg handler in a test, or simply use the existing path and delay the
    ML thread with a sleep).
  2. while the worker is still running, send another message (e.g. CreateTensor)
    and assert that the manager processes it without deadlocking.
  3. finally assert that the first dispatch’s outputs eventually arrive in the
    correct tensor.

The goal is to make sure the manager isn’t held up by the ML computation.
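
A rough shape for such a test; new_webnn_manager and the message names come
from the plan above, while the reply channel on CreateTensor, the exact
payloads, and the slowed-down ML thread are hypothetical placeholders:

#[test]
fn manager_not_blocked_by_inflight_compute() {
    let (tx, _join_handle) = new_webnn_manager();

    // 1. Kick off a dispatch that a deliberately slowed ML thread will sit on.
    tx.send(WebNNMsg::Dispatch(/* …graph + inputs… */)).unwrap();

    // 2. While that compute is in flight, the manager must still answer
    //    unrelated messages promptly.
    let (reply_tx, reply_rx) = std::sync::mpsc::channel();
    tx.send(WebNNMsg::CreateTensor(/* …descriptor… */ reply_tx)).unwrap();
    assert!(reply_rx.recv_timeout(std::time::Duration::from_secs(1)).is_ok());

    // 3. Finally, poll until the dispatch's outputs appear in the right tensor.
}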

8. regression and safety

  • make sure destroying a context while a compute is in flight is benign – the
    ComputeResult handler already warns and drops the data.
  • keep the zeroed‑output fallback when the ML channel is closed.
  • run ./mach check to verify compilation; update any imports (e.g. bring
    HashMap into scope in the shared crate).

After these changes the manager thread no longer blocks on response_rx.recv(),
and the ML work is entirely asynchronous: the compute runs on its own thread,
the manager continues handling new requests, and results are reintegrated when
they arrive.

User: yeah pretty good, but we need to prevent tensor reads or writes, or any other step running on the timeline of said context, until the compute finishes.

so this is the time to implement a backend concept of the context timeline (you can use search‑bs with "timeline" if you want to find it in the spec, but the spec does not tell us how to implement it).

So here's how we'll do it.

  • Add some kind of VecDeque to each context, and document it as being part of the implementation of the timeline.
  • All context operations can run immediately when the relevant message comes in if there is no pending compute (tracked by a boolean flag). However, when there is a pending compute, you must queue operations and then run them when the compute finishes (until the next compute starts).

GitHub Copilot: Searched for text new_webnn_manager, 5 results
Read <repo>/components/webnn/README.md
Searched for regex #\[cfg\(test\), 20 results
Searched for text webnn manager, 20 results
Searched for text test (**/components/webnn/src/lib.rs), no results
Searched for text constant_operand, 20 results
Searched for text fn test (**/components/webnn/src/lib.rs), no results
Searched for text struct GraphInfo, 7 results
Read <repo>/scratchpad/rustnn/src/graph.rs#120-120, lines 120 to 200
Replacing 160 lines with 151 lines in <repo>/components/webnn/src/lib.rs

Read <repo>/components/webnn/src/lib.rs#323-323, lines 323 to 370
Replacing 85 lines with 106 lines in <repo>/components/webnn/src/lib.rs

Searched for text PendingOp, 9 results
Replacing 97 lines with 183 lines in <repo>/components/webnn/src/lib.rs

Searched for text enqueue_or_run, 4 results
Read <repo>/components/webnn/src/lib.rs#200-200, lines 200 to 280
Searched for text fn enqueue_or_run, no results
Searched for text process_queue, 3 results
Replacing 24 lines with 86 lines in <repo>/components/webnn/src/lib.rs

Replacing 3 lines with 9 lines in <repo>/components/shared/webnn/src/lib.rs

Replacing 2 lines with 3 lines in <repo>/components/shared/webnn/src/lib.rs

Searched for text MlMsg::Compute, 2 results
Ran terminal command: ./mach check

Read <repo>/components/webnn/src/lib.rs#360-360, lines 360 to 430
Replacing 15 lines with 14 lines in <repo>/components/webnn/src/lib.rs
Ran terminal command: ./mach check
Replacing 13 lines with 14 lines in <repo>/components/webnn/src/lib.rs
Ran terminal command: ./mach check

Read <home>/Library/Application Support/Code/User/workspaceStorage/.../content.txt
Replacing 12 lines with 79 lines in <repo>/components/webnn/src/lib.rs
Searched for text struct PipelineId, 3 results
Read <repo>/components/script/dom/pipelineid.rs
Searched for text mod id, 20 results
Replacing 4 lines with 10 lines in <repo>/components/webnn/README.md
Replacing 6 lines with 6 lines in <repo>/components/shared/webnn/src/lib.rs
Ran terminal command: ./mach test components/webnn --filter timeline
Ran terminal command: ./mach test-unit components/webnn
Ran terminal command: cd components/webnn && cargo test -- --ignored

Made changes.

User: no just run mach check for now

GitHub Copilot: Ran terminal command: ./mach check

✅ All good: ./mach check completes successfully, and the recent timeline
and async compute changes compile cleanly.

Changes summary

  • Asynchronous dispatch: removed blocking recv(); ML thread now sends
    WebNNMsg::ComputeResult back to the manager.
  • Timeline implementation:
    • Added VecDeque<PendingOp> and compute_in_flight flag to every Context.
    • Deferred operations queue when a compute is pending.
    • Introduced helpers (enqueue_or_run, run_now, process_queue, etc.).
    • Added unit test verifying queuing semantics.
  • Message updates: new ComputeResult variant in shared messages.
  • Manager updates: ported to use new async pattern and handle results.
  • README docs: documented timeline behaviour.
  • Minor cleanup and imports; tests added.

🛠️ The blocking response_rx.recv() has been eliminated, and subsequent
operations on a context are now correctly queued until compute finishes,
satisfying the new timeline requirement.
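
For reference, the timeline machinery has roughly this shape (a sketch, not the
exact committed code; PendingOp's variants and run_now's body are elided or
illustrative):

use std::collections::VecDeque;

/// A deferred context operation; part of the timeline implementation.
enum PendingOp {
    // …one variant per queueable context operation, e.g.:
    WriteTensor(u32, Vec<u8>),
    Dispatch(/* …graph + io maps… */),
}

struct Context {
    // …existing fields…
    /// Operations deferred while a compute is in flight (the "timeline").
    queue: VecDeque<PendingOp>,
    /// True between sending MlMsg::Compute and receiving ComputeResult.
    compute_in_flight: bool,
}

impl Context {
    /// Run the op now if the timeline is idle, otherwise queue it.
    fn enqueue_or_run(&mut self, op: PendingOp) {
        if self.compute_in_flight {
            self.queue.push_back(op);
        } else {
            self.run_now(op);
        }
    }

    /// Drain queued ops once a compute finishes, stopping again if a
    /// queued dispatch starts the next compute.
    fn process_queue(&mut self) {
        while !self.compute_in_flight {
            match self.queue.pop_front() {
                Some(op) => self.run_now(op),
                None => break,
            }
        }
    }

    fn run_now(&mut self, _op: PendingOp) {
        // Dispatches set self.compute_in_flight = true before handing work
        // to the ML thread; other ops execute against tensor_store directly.
    }
}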

Let me know if you’d like to extend tests further or roll this into a patch!

User: the bool returned by run_now seems unnecessary, because within process_queue you already break on the compute flag

GitHub Copilot: Replacing 53 lines with 47 lines in <repo>/components/webnn/src/lib.rs
Ran terminal command: ./mach check

💡 The refactor is complete and the code still compiles!

What changed

  • run_now no longer returns bool.
  • enqueue_or_run simply calls run_now when idle.
  • process_queue no longer inspects a return value; it loops until
    compute_in_flight becomes true (set within run_now for dispatches).
  • The compute fallback logic still clears compute_in_flight so the loop
    continues correctly.

This simplifies the timeline machinery while preserving behaviour: queued
operations are replayed immediately after a synchronous compute fallback or
once an asynchronous compute result arrives.
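
Concretely, the ComputeResult arm from step 5 would now end by releasing the
timeline; a sketch, assuming the manager clears the flag when the result
arrives:

WebNNMsg::ComputeResult(ctx_id, outputs) => {
    if let Some(ctx) = self.contexts.get_mut(&ctx_id) {
        for (tid, bytes) in outputs {
            ctx.tensor_store.insert(tid, bytes);
        }
        // The compute is done: unblock the timeline and replay whatever
        // queued up behind it (until the next queued dispatch, if any).
        ctx.compute_in_flight = false;
        ctx.process_queue();
    } else {
        warn!("webnn manager: ComputeResult for unknown context {:?}", ctx_id);
    }
    true
}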

All checks passed (./mach check produced no errors), so the implementation
is now cleaner and more maintainable.
