Justin Chu (justinchuby)
@justinchuby
justinchuby / plan.md
Created January 28, 2026 00:54
Shape inference plan

# Symbolic Shape Inference for ONNX IR

## Problem Statement

Design and implement a symbolic shape inference capability for the ONNX IR that propagates shapes through the graph while preserving symbolic dimension relationships (e.g., `batch`, `seq_len`, `N+1`).

## Current State Analysis

### Existing Infrastructure

- `SymbolicDim` class (`_core.py:1241`): immutable symbolic dimension holding a string name or `None`
- `Shape` class (`_core.py:1330`): supports mixed static/dynamic dimensions, freezing, and denotations
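To make the goal concrete, here is a minimal, self-contained sketch of symbolic dimension propagation through a 2-D MatMul. The class names mirror the concepts above (`SymbolicDim`, `Shape`) but this is **not** the actual `onnx_ir` API; it only illustrates that symbolic dims survive inference by name.

```python
class SymbolicDim:
    """An immutable symbolic dimension identified by name (or None for unknown)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, SymbolicDim) and self.name == other.name

    def __repr__(self):
        return self.name or "?"


def matmul_shape(a, b):
    """Infer the output shape of a 2-D MatMul, keeping symbolic dims intact."""
    (m, k1), (k2, n) = a, b
    # Static-static mismatches are errors; symbolic inner dims are assumed
    # equal when they carry the same name.
    if isinstance(k1, int) and isinstance(k2, int) and k1 != k2:
        raise ValueError(f"inner dims differ: {k1} vs {k2}")
    return (m, n)


batch = SymbolicDim("batch")
print(matmul_shape((batch, 512), (512, 768)))  # (batch, 768)
```

A real implementation would additionally unify unnamed dims and track derived expressions such as `N+1`; this sketch only shows the pass-through case.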
"""
Perch v2 Bird Audio Classification Model - PyTorch Implementation
This module provides a pure PyTorch implementation of the Perch v2 model,
converted from the original ONNX model. The model processes 10-second audio
clips at 16kHz (160,000 samples) and produces bird species embeddings and
classification logits.
Architecture:
- Learned frontend with convolution-based STFT and mel filterbank
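The "convolution-based STFT" mentioned above can be illustrated with a small NumPy sketch: a DFT is a fixed linear map applied to each hop-strided frame, so it can be expressed as a strided convolution whose kernels are the (potentially learnable) DFT basis. This is an illustration of the idea only, not the Perch v2 code.

```python
import numpy as np


def conv_stft(signal, n_fft=8, hop=4):
    """Compute an STFT by applying DFT-basis 'kernels' to hop-strided frames.

    In PyTorch this would be a Conv1d with stride=hop whose weights are
    initialized to these cos/sin kernels (and could then be learned).
    """
    n = np.arange(n_fft)
    k = n.reshape(-1, 1)
    real_k = np.cos(-2 * np.pi * k * n / n_fft)  # (n_fft, n_fft) real basis
    imag_k = np.sin(-2 * np.pi * k * n / n_fft)  # (n_fft, n_fft) imag basis
    # Frame the signal with a hop-strided sliding window.
    frames = np.stack(
        [signal[i:i + n_fft] for i in range(0, len(signal) - n_fft + 1, hop)]
    )
    # Each frame times the basis == the DFT of that frame.
    return frames @ real_k.T + 1j * (frames @ imag_k.T)
```

For any input, `conv_stft(x)` matches `np.fft.fft` applied to the same frames, which is why the frontend can start as an exact STFT and still be trained end to end.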
import onnx_ir as ir
import onnx_ir.passes.common
import onnxscript

m = ir.load("perch_v2_opt3.onnx")
for node in m.graph:
    if node.op_type == "MatMul":
        print(node)
        if node.inputs[0].producer().op_type == "Reshape":
@justinchuby
justinchuby / ort_error.txt
Created November 5, 2025 02:27
ORT error
Loading ONNX model from: perch_v2_opt2.onnx
2025-11-04 18:25:33.374 python[72171:12211235] 2025-11-04 18:25:33.374513 [W:onnxruntime:, graph.cc:4885 CleanUnusedInitializersAndNodeArgs] Removing initializer 'arith.constant470'. It is not used by any node and should be removed from the model.
2025-11-04 18:25:33.414 python[72171:12211235] 2025-11-04 18:25:33.414751 [E:onnxruntime:, inference_session.cc:2544 operator()] Exception during initialization: /Users/runner/work/1/s/onnxruntime/core/framework/execution_frame.h:112 const OrtValue &onnxruntime::IExecutionFrame::GetMLValue(int) const ort_value_index >= 0 && static_cast<size_t>(ort_value_index) < all_values_size_ was false.
Traceback (most recent call last):
  File "/Users/justinc/Documents/GitHub/tensorflow-onnx/examples/compare_onnx_tflite.py", line 599, in <module>
    exit(main())
    ^^^^^^
  File "/Users/justinc/Documents/GitHub/tensorflow-onnx/examples/compare_onnx_tflite.py", line 541, in main
    onnx_session = load_onnx_model(args.onnx)
@justinchuby
justinchuby / compare_onnx_tflite.py
Created November 2, 2025 17:25
Script to compare tflite and ONNX models
#!/usr/bin/env python3
# SPDX-License-Identifier: Apache-2.0
"""
Script to compare the results of an ONNX model with a TFLite model given the same input.
Optionally also compare with Tract runtime for ONNX.
Created by Copilot.
Usage:
python compare_onnx_tflite.py --onnx model.onnx --tflite model.tflite
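As a sketch of what such a comparison typically boils down to (the script's actual helpers are not shown in this preview), the core check is an element-wise tolerance comparison between the two runtimes' outputs; the function name and tolerances here are illustrative assumptions:

```python
import numpy as np


def compare_outputs(a, b, rtol=1e-3, atol=1e-5):
    """Compare two runtime outputs and report the maximum deviation."""
    a, b = np.asarray(a, dtype=np.float64), np.asarray(b, dtype=np.float64)
    abs_diff = np.abs(a - b)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "allclose": bool(np.allclose(a, b, rtol=rtol, atol=atol)),
    }
```

Reporting `max_abs_diff` alongside the boolean verdict is useful in practice: it distinguishes a genuine numerical divergence from a tolerance set slightly too tight.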
@justinchuby
justinchuby / qwen3_onnx.py
Last active October 28, 2025 23:42
Qwen3EmbeddingONNXExporter
# transformers==4.52.4
# pytorch nightly
from __future__ import annotations
import ast
import logging
import typing
from pathlib import Path
import onnx_ir as ir
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
2025-10-17 17:18:33.408 -07:00 [INF] Command:ModelInit Status:Success Direct:True Time:0ms
2025-10-17 17:18:33.861 -07:00 [INF] UserAgent:foundry-local-CLI/0.7.120+3b92ed4014 Command:ListDownloadedModels Status:Success Direct:True Time:5ms
2025-10-17 17:18:33.861 -07:00 [INF] Command:ModelDownload Status:Skipped Direct:False Time:1698ms
2025-10-17 17:18:33.861 -07:00 [INF] Command:ServiceStart Status:Skipped Direct:False Time:0ms
2025-10-17 17:18:34.234 -07:00 [INF] Command:ServiceList Status:Success Direct:False Time:372ms
2025-10-17 17:18:34.234 -07:00 [INF] Loading model: http://127.0.0.1:56051/openai/load/gemma-3-1b-it_ort?ttl=600
2025-10-17 17:18:34.234 -07:00 [INF] UserAgent:foundry-local-CLI/0.7.120+3b92ed4014 Command:ListLoadedModels Status:Success Direct:True Time:0ms
2025-10-17 17:18:34.685 -07:00 [INF] Loading model:gemma-3-1b-it_ort
2025-10-17 17:18:40.704 -07:00 [INF] Finish loading model:gemma-3-1b-it_ort elapsed time:00:00:06.0190199
2025-10-17 17:18:40.704 -07:00 [INF] UserAgent:foundry-lo
import torch
import onnx_ir as ir


class ControlFlowModel(torch.nn.Module):
    def forward(self, x):
        def times_2(x):
            return x * 2

        def neg(x):
            return -x
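The nested `times_2`/`neg` functions above look like branch callables of the kind `torch.cond` takes (an assumption; the preview is cut off before they are used). A torch-free sketch of that select-one-branch semantics:

```python
def cond(pred, true_fn, false_fn, operands):
    """Functional sketch of cond-style control flow: both branches accept the
    same operands and must return structurally matching outputs; exactly one
    branch runs, which is what makes the construct exportable as a graph."""
    fn = true_fn if pred else false_fn
    return fn(*operands)


def times_2(x):
    return x * 2


def neg(x):
    return -x


print(cond(True, times_2, neg, (3,)))   # 6
print(cond(False, times_2, neg, (3,)))  # -3
```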
@justinchuby
justinchuby / export_hf.py
Last active October 14, 2025 04:33
Export HF model to ONNX
"""Export to ONNX.
transformers_version == "4.52.0"
"""
import onnx_diagnostic.tasks.text_generation
import torch
from transformers import AutoConfig, AutoModel
import onnxscript