User: so we need to remove the blocking recv here.
GitHub Copilot: Searched for text response_rx.recv, 1 result
Searched for text WebNNManager, 5 results
Searched for text ml_loop, 2 results
User: don't write code yet, just give me a plan
GitHub Copilot: Searched for files matching **/webnn/src/*, 2 matches
Searched for files matching **/webnn, no matches
Read <repo>/components/shared/webnn/src/lib.rs
What we’re doing here is turning the current synchronous “send a request, block until the
ml‑thread replies” pattern into an asynchronous one so that the manager thread
can keep pulling messages off its incoming channel even while a graph is
being executed.
The manager needs a way to be told “here are the bytes for this dispatch”.
That means extending the shared `WebNNMsg` enum in lib.rs with something like:

```rust
use std::collections::HashMap;

// …existing variants…

/// Reply from the ML worker containing the output tensor bytes for a previously
/// dispatched graph.
ComputeResult(ContextId, HashMap<u32, Vec<u8>>),
```

This variant will be sent by the ML thread back to the manager; the manager
will update the appropriate Context's `tensor_store` when it is received.
`new_webnn_manager` currently returns a sender/receiver pair and spawns
`run_manager(rx)` on a thread. We will clone the `tx` and pass it into
`run_manager` (or otherwise capture it) so that `run_manager` can hand a
clone along to `ml_loop`.

```rust
let manager_tx = tx.clone();
let handle = thread::Builder::new()
    .spawn(move || run_manager(rx, manager_tx))
    …;
```
```rust
fn run_manager(receiver: GenericReceiver<WebNNMsg>,
               manager_tx: GenericSender<WebNNMsg>) {
    …
    let ml_handle = thread::Builder::new()
        .spawn(move || ml_loop(ml_rx, manager_tx.clone()))
        …;
    …
}
```

`ml_loop`'s signature is updated accordingly:

```rust
fn ml_loop(rx: Receiver<MlMsg>, manager_tx: GenericSender<WebNNMsg>) { … }
```

`MlMsg::Compute` needs to carry the originating `ContextId` (so the worker
knows where to send the result). The one‑shot response channel is dropped.
```rust
enum MlMsg {
    Compute {
        ctx_id: ContextId,
        graph_info: rustnn::graph::GraphInfo,
        inputs_map: HashMap<u32, u32>,
        inputs_bytes: HashMap<u32, Vec<u8>>,
        outputs_map: HashMap<u32, u32>,
    },
    Exit,
}
```

Inside `ml_loop` we still perform the CoreML attempt / zero‑fill as we do now,
but after the work we emit a `WebNNMsg::ComputeResult(ctx_id, outputs)` on
`manager_tx`. If that send fails we log a warning.
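To illustrate the worker side of this plan, here is a minimal, self-contained sketch using `std::sync::mpsc`. The type names mirror the plan but are simplified stand-ins (the real `MlMsg::Compute` carries more fields, and the real code would use `warn!` rather than `eprintln!`):

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Simplified stand-ins for the real shared types.
type ContextId = u32;

enum MlMsg {
    Compute { ctx_id: ContextId, inputs_bytes: HashMap<u32, Vec<u8>> },
    Exit,
}

enum WebNNMsg {
    ComputeResult(ContextId, HashMap<u32, Vec<u8>>),
}

/// Worker loop: do the compute, then send the result back to the
/// manager's own channel instead of replying on a one-shot channel.
fn ml_loop(rx: Receiver<MlMsg>, manager_tx: Sender<WebNNMsg>) {
    for msg in rx {
        match msg {
            MlMsg::Compute { ctx_id, inputs_bytes } => {
                // Stand-in for the CoreML attempt / zero-fill: produce
                // zeroed buffers of the same sizes as the inputs.
                let outputs: HashMap<u32, Vec<u8>> = inputs_bytes
                    .into_iter()
                    .map(|(id, bytes)| (id, vec![0u8; bytes.len()]))
                    .collect();
                if manager_tx.send(WebNNMsg::ComputeResult(ctx_id, outputs)).is_err() {
                    eprintln!("webnn worker: manager channel closed, dropping result");
                }
            }
            MlMsg::Exit => break,
        }
    }
}
```

The key point is that the worker only ever pushes onto `manager_tx`; it never holds a reply channel that the manager would have to block on.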
`compute` will:

- collect input bytes and resolve constants as today
- build an `MlMsg::Compute` with its own `ctx_id`
- attempt `compute_tx.send(msg)`; if that fails (ML thread gone), call `self.zeroed_outputs(...)` immediately as a fallback, then return
- not create a response channel or call `recv()`
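A free-function sketch of that dispatch path, with hypothetical simplified types (the real `compute` is a method and builds the full `MlMsg::Compute`; `output_sizes` here stands in for whatever the real `zeroed_outputs` consults):

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

type ContextId = u32;

enum MlMsg {
    Compute { ctx_id: ContextId, inputs_bytes: HashMap<u32, Vec<u8>> },
}

/// Non-blocking dispatch: hand the work to the ML thread and return
/// immediately. Returns Some(zeroed outputs) only when the ML thread is
/// gone; otherwise the real result arrives later as a ComputeResult.
fn compute(
    compute_tx: &Sender<MlMsg>,
    ctx_id: ContextId,
    inputs_bytes: HashMap<u32, Vec<u8>>,
    output_sizes: &HashMap<u32, usize>,
) -> Option<HashMap<u32, Vec<u8>>> {
    let msg = MlMsg::Compute { ctx_id, inputs_bytes };
    match compute_tx.send(msg) {
        // No response channel, no recv(): the result is delivered
        // asynchronously via the manager's own message loop.
        Ok(()) => None,
        // ML thread gone: fall back to zeroed outputs immediately.
        Err(_) => Some(
            output_sizes
                .iter()
                .map(|(&id, &len)| (id, vec![0u8; len]))
                .collect(),
        ),
    }
}
```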
The `response_rx.recv()` call at line 303 will disappear entirely.
Add a match arm to `WebNNManager::handle_message` for `WebNNMsg::ComputeResult`:

```rust
WebNNMsg::ComputeResult(ctx_id, outputs) => {
    if let Some(ctx) = self.contexts.get_mut(&ctx_id) {
        for (tid, bytes) in outputs { ctx.tensor_store.insert(tid, bytes); }
    } else {
        warn!("webnn manager: ComputeResult for unknown context {:?}", ctx_id);
    }
    true
}
```

This keeps the same behaviour of inserting the bytes into the tensor store,
but does it when the message arrives rather than while the dispatch call is
still running.
but does it when the message arrives rather than while the dispatch call is
still running.
- Remove the one‑shot channel creation and all related comments.
- Update comments around `compute`/`ml_loop` to explain the asynchronous design.
- Add a note in the README (probably README.md) about the non‑blocking manager and the new `ComputeResult` message.
- Consider a small comment in `WebNNMsg` documenting the new variant.
There aren’t any WebNN unit tests at the moment, so add at least one new
test that exercises the non‑blocking behaviour:
- send a `Dispatch` that takes some time (we can simulate with a custom `MlMsg` handler in a test, or simply use the existing path and delay the ML thread with a `sleep`)
- while the worker is still running, send another message (e.g. `CreateTensor`) and assert that the manager processes it without deadlocking
- finally, assert that the first dispatch’s outputs eventually arrive in the correct tensor
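The shape of such a test can be sketched in miniature with plain channels and a deliberately slow worker. Everything here is hypothetical scaffolding (the `Event` enum stands in for the real messages); it only demonstrates that the manager's channel keeps flowing while a compute is in flight:

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

// Hypothetical mini-manager events standing in for the real messages.
#[derive(Debug, PartialEq)]
enum Event {
    TensorCreated(u32),
    ComputeDone(u32),
}

/// Simulate a slow compute and a concurrent CreateTensor, returning the
/// order in which the manager observed the two events.
fn run_sketch() -> Vec<Event> {
    let (mgr_tx, mgr_rx) = channel();

    // Slow "ML worker": pretends to compute, then reports back.
    let worker_tx = mgr_tx.clone();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50)); // simulated long compute
        worker_tx.send(Event::ComputeDone(1)).unwrap();
    });

    // While the worker is busy, the manager can still receive CreateTensor.
    mgr_tx.send(Event::TensorCreated(7)).unwrap();

    let mut seen = Vec::new();
    for _ in 0..2 {
        seen.push(mgr_rx.recv().unwrap());
    }
    seen
}
```

A real test would assert on `tensor_store` contents rather than an event log, but the ordering check captures the non-blocking property.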
The goal is to make sure the manager isn’t held up by the ML computation.
- make sure destroying a context while a compute is in flight is benign – the `ComputeResult` handler already warns and drops the data
- keep the zeroed‑output fallback when the ML channel is closed
- run `./mach check` to verify compilation; update any imports (e.g. bring `HashMap` into scope in the shared crate)
After these changes the manager thread no longer blocks on `response_rx.recv()`,
and the ML work is entirely asynchronous: the compute worker can be
pre‑empted, the manager can continue handling new requests, and results are
reintegrated when they arrive.
User: yeah pretty good, but we need to prevent tensor reads or writes, or any other step running on the timeline of said context, until the compute finishes.
so this is the time to implement a backend concept of the context timeline (you can use search‑bs with "timeline" if you want to find it in the spec, but the spec does not tell us how to implement it).
So here's how we'll do it.
- Add some kind of `VecDeque` to each context, and document it as being part of the implementation of the timeline.
- all context operations can run immediately when the relevant message comes in if there is no pending compute (a boolean flag). However, when there is a pending compute, you must queue operations and then run them when the compute finishes (until the next compute starts).
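The queueing discipline described above can be sketched self-containedly. The names (`PendingOp`, `enqueue_or_run`, `process_queue`, `compute_in_flight`) follow the plan, but the types are simplified stand-ins: a `PendingOp` here is just a closure over a log of executed steps, and a queued dispatch re-raising `compute_in_flight` mid-drain is not modelled:

```rust
use std::collections::VecDeque;

// Hypothetical stand-in: a deferred operation acting on some context state.
type PendingOp = Box<dyn FnOnce(&mut Vec<String>)>;

/// Per-context timeline state. The queue and flag together implement the
/// context timeline: operations defer while a compute is in flight.
struct Context {
    pending_ops: VecDeque<PendingOp>,
    compute_in_flight: bool,
    log: Vec<String>,
}

impl Context {
    fn new() -> Self {
        Context {
            pending_ops: VecDeque::new(),
            compute_in_flight: false,
            log: Vec::new(),
        }
    }

    /// Run immediately when the timeline is idle, otherwise defer.
    fn enqueue_or_run(&mut self, op: PendingOp) {
        if self.compute_in_flight {
            self.pending_ops.push_back(op);
        } else {
            op(&mut self.log);
        }
    }

    /// Called when a ComputeResult arrives: clear the flag and drain
    /// deferred ops until the queue empties. (In the real code a queued
    /// dispatch would set `compute_in_flight` again and stop the drain.)
    fn process_queue(&mut self) {
        self.compute_in_flight = false;
        while !self.compute_in_flight {
            match self.pending_ops.pop_front() {
                Some(op) => op(&mut self.log),
                None => break,
            }
        }
    }
}
```

Note the drain loop re-checks `compute_in_flight` on every iteration, which is what lets a replayed dispatch pause the timeline again until its own result arrives.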
GitHub Copilot: Searched for text new_webnn_manager, 5 results
Read <repo>/components/webnn/README.md
Searched for regex #\[cfg\(test\), 20 results
Searched for text webnn manager, 20 results
Searched for text test (**/components/webnn/src/lib.rs), no results
Searched for text constant_operand, 20 results
Searched for text fn test (**/components/webnn/src/lib.rs), no results
Searched for text struct GraphInfo, 7 results
Read <repo>/scratchpad/rustnn/src/graph.rs#120-120, lines 120 to 200
Replacing 160 lines with 151 lines in <repo>/components/webnn/src/lib.rs
…
Read <repo>/components/webnn/src/lib.rs#323-323, lines 323 to 370
Replacing 85 lines with 106 lines in <repo>/components/webnn/src/lib.rs
…
Searched for text PendingOp, 9 results
Replacing 97 lines with 183 lines in <repo>/components/webnn/src/lib.rs
…
Searched for text enqueue_or_run, 4 results
Read <repo>/components/webnn/src/lib.rs#200-200, lines 200 to 280
Searched for text fn enqueue_or_run, no results
Searched for text process_queue, 3 results
Replacing 24 lines with 86 lines in <repo>/components/webnn/src/lib.rs
…
Replacing 3 lines with 9 lines in <repo>/components/shared/webnn/src/lib.rs
…
Replacing 2 lines with 3 lines in <repo>/components/shared/webnn/src/lib.rs
…
Searched for text MlMsg::Compute, 2 results
Ran terminal command: ./mach check
Read <repo>/components/webnn/src/lib.rs#360-360, lines 360 to 430
Replacing 15 lines with 14 lines in <repo>/components/webnn/src/lib.rs
Ran terminal command: ./mach check
Replacing 13 lines with 14 lines in <repo>/components/webnn/src/lib.rs
Ran terminal command: ./mach check
Read <home>/Library/Application Support/Code/User/workspaceStorage/.../content.txt
Replacing 12 lines with 79 lines in <repo>/components/webnn/src/lib.rs
Searched for text struct PipelineId, 3 results
Read <repo>/components/script/dom/pipelineid.rs
Searched for text mod id, 20 results
Replacing 4 lines with 10 lines in <repo>/components/webnn/README.md
Replacing 6 lines with 6 lines in <repo>/components/shared/webnn/src/lib.rs
Ran terminal command: ./mach test components/webnn --filter timeline
Ran terminal command: ./mach test-unit components/webnn
Ran terminal command: cd components/webnn && cargo test -- --ignored
Made changes.
User: no just run mach check for now
GitHub Copilot: Ran terminal command: ./mach check
✅ All good—./mach check completes successfully, and the recent timeline
and async compute changes compile cleanly.
- Asynchronous dispatch: removed the blocking `recv()`; the ML thread now sends `WebNNMsg::ComputeResult` back to the manager.
- Timeline implementation:
  - Added `VecDeque<PendingOp>` and a `compute_in_flight` flag to every `Context`.
  - Deferred operations queue when a compute is pending.
  - Introduced helpers (`enqueue_or_run`, `run_now`, `process_queue`, etc.).
  - Added a unit test verifying queuing semantics.
- Message updates: new `ComputeResult` variant in shared messages.
- Manager updates: ported to the new async pattern and handling of results.
- README docs: documented timeline behaviour.
- Minor cleanup and imports; tests added.
🛠️ The blocking `response_rx.recv()` has been eliminated, and subsequent
operations on a context are now correctly queued until the compute finishes,
satisfying the new timeline requirement.
Let me know if you’d like to extend tests further or roll this into a patch!
User: the bool returned by run_now seems unnecessary, because within process_queue you already break on the compute flag
GitHub Copilot: Replacing 53 lines with 47 lines in <repo>/components/webnn/src/lib.rs
Ran terminal command: ./mach check
💡 The refactor is complete and the code still compiles!
- `run_now` no longer returns `bool`.
- `enqueue_or_run` simply calls `run_now` when idle.
- `process_queue` no longer inspects a return value; it loops until `compute_in_flight` becomes true (set within `run_now` for dispatches).
- The compute fallback logic still clears `compute_in_flight` so the loop continues correctly.
This simplifies the timeline machinery while preserving behaviour: queued
operations are replayed immediately after a synchronous compute fallback or
once an asynchronous compute result arrives.
All checks passed (./mach check produced no errors), so the implementation
is now cleaner and more maintainable.