Split of torch-2.10.0+rocm7.1-cp313-cp313-manylinux_2_28_x86_64.whl using rocm_kpack.tools.split_python_wheels.
| Metric | Value |
|---|---|
| Input wheel (.whl) | 5.1 GB |
| Host wheel (.whl) | 431 MB |
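To put the table above in perspective, the host wheel retains only a small fraction of the original payload. A quick sanity check of the numbers (treating 1 GB as 1024 MB, which is an assumption about how the sizes were reported):

```python
# Size reduction from splitting the ROCm torch wheel (figures from the table above).
input_mb = 5.1 * 1024   # 5.1 GB input wheel, in MB (assuming binary GB)
host_mb = 431           # 431 MB host wheel

reduction = 1 - host_mb / input_mb
print(f"Host wheel is {reduction:.1%} smaller than the input wheel")
```

Either way the GB is interpreted (1000 or 1024 MB), the host wheel is roughly 92% smaller, which is the point of the split.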
Source: CI ASAN Run #21463906609
Artifacts Index: https://therock-ci-artifacts.s3.amazonaws.com/21463906609-linux/index-gfx94X-dcgpu-asan.html
From TheRock repo root:
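The repo-root command itself is not captured here. Independent of it, the artifacts index link above follows a run-id/target URL pattern; a small helper reconstructing it (the function name and pattern are inferred from the single link above, not an official API):

```python
# Hypothetical helper: rebuild the CI artifacts index URL from a run id and
# target name. Pattern inferred from the link above; not an official API.
BASE = "https://therock-ci-artifacts.s3.amazonaws.com"

def artifact_index_url(run_id: int, target: str) -> str:
    return f"{BASE}/{run_id}-linux/index-{target}.html"

print(artifact_index_url(21463906609, "gfx94X-dcgpu-asan"))
```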
```mlir
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1) -> (d1, d0)>
#map2 = affine_map<(d0, d1) -> (d0, d1)>
#map3 = affine_map<(d0, d1) -> (d1)>
#map4 = affine_map<(d0, d1, d2) -> (d2, d0)>
#map5 = affine_map<(d0, d1, d2) -> (d1, d2)>
#map6 = affine_map<(d0, d1, d2) -> (d0, d2)>
#map7 = affine_map<(d0, d1, d2) -> (d2, d0, d1)>
#map8 = affine_map<(d0, d1, d2) -> (d1, d2, d0)>
#map9 = affine_map<(d0, d1, d2) -> (d2)>
```
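As rough intuition for these indexing maps: the pure permutation maps act like axis transposes on a dense array, while projection maps drop a dimension. A NumPy sketch (illustrative only, not how linalg evaluates them):

```python
import numpy as np

# Affine maps like #map1 (d0, d1) -> (d1, d0) and #map7 (d0, d1, d2) -> (d2, d0, d1)
# are pure dimension permutations; on a dense array they act like transposes.
a = np.arange(6).reshape(2, 3)

# #map1: (d0, d1) -> (d1, d0) — swap the two axes.
map1 = a.transpose(1, 0)

b = np.arange(24).reshape(2, 3, 4)

# #map7: (d0, d1, d2) -> (d2, d0, d1) — a cyclic permutation of three axes.
map7 = b.transpose(2, 0, 1)

# Projection maps like #map3 (d0, d1) -> (d1) correspond to a 1-D operand that
# is broadcast along the dropped dimension (or reduced along it for outputs).
print(map1.shape, map7.shape)
```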
This document provides architectural guidance for "Quartz", a PyTorch HUD-like system for ROCm downstream CI/CD orchestration. The junior engineer's instinct to start with status.json is understandable but insufficient for the stated requirements; a database-first approach is correct.
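A minimal sketch of what "database-first" buys over a status.json snapshot, assuming a simple runs table (the schema, table, and column names here are illustrative, not a Quartz spec):

```python
import sqlite3

# Illustrative schema for a HUD-like CI dashboard: persist every run so the UI
# can query history, rather than overwriting a single status.json snapshot.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ci_runs (
    run_id     INTEGER PRIMARY KEY,
    commit_sha TEXT NOT NULL,
    target     TEXT NOT NULL,      -- e.g. gfx94X-dcgpu-asan
    status     TEXT NOT NULL,      -- queued | running | passed | failed
    started_at TEXT NOT NULL
);
CREATE INDEX idx_runs_commit ON ci_runs(commit_sha);
""")

# Example row; the commit sha and timestamp are placeholders.
conn.execute(
    "INSERT INTO ci_runs VALUES (?, ?, ?, ?, ?)",
    (21463906609, "deadbeef", "gfx94X-dcgpu-asan", "passed", "2025-01-01T00:00:00Z"),
)

# status.json can only answer "what is the state now"; a table also answers
# "what happened to this commit across targets over time".
rows = conn.execute(
    "SELECT target, status FROM ci_runs WHERE commit_sha = ?", ("deadbeef",)
).fetchall()
print(rows)
```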
Extract MoE primitives from /develop/ai-no-fluff/kb/ben/moe_f32_parameterized.mlir:

- `mul_mat_id` - Expert-selected matrix multiplication (gather + batch_matmul)
- `moe_ffn_block` - Full MoE FFN block composing routing, expert compute, weighted sum

Key challenge: `moe_ffn_block` depends on `mul_mat_id` and `swiglu`. Need systematic composition without manual inlining.
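A NumPy sketch of the gather + batch_matmul decomposition behind `mul_mat_id` (shapes, names, and the router output are illustrative, not the file's actual signature):

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_in, d_out, n_tokens = 4, 8, 16, 5

expert_weights = rng.standard_normal((n_experts, d_in, d_out)).astype(np.float32)
tokens = rng.standard_normal((n_tokens, d_in)).astype(np.float32)
expert_ids = np.array([2, 0, 3, 1, 2])  # router's chosen expert per token

# mul_mat_id = gather (select each token's expert matrix) + batch_matmul.
gathered = expert_weights[expert_ids]            # (n_tokens, d_in, d_out)
out = np.einsum("ti,tio->to", tokens, gathered)  # (n_tokens, d_out)

# Reference: loop over tokens with an ordinary matmul.
ref = np.stack([tokens[t] @ expert_weights[expert_ids[t]] for t in range(n_tokens)])
assert np.allclose(out, ref, atol=1e-5)
```

With top-k routing, `moe_ffn_block` would run this per selected expert and combine the results with the router's weights; that weighted sum is the part that also pulls in `swiglu`, hence the composition problem noted above.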
| Merge Commit | Individual Commits on Main |
|---|---|
| One atomic integration point | 500 commits sprawled on main |
| `git revert -m1 <merge>` undoes everything | Good luck reverting |
| `git bisect` can skip the whole merge | Bisect walks through 500 commits |
| Main history is readable | Main history is chaos |
```mlir
module @aqt_matmul {
  iree_input.global private @_params$0 = dense<[[0.000000e+00, 5.003000e+02, 1.000600e+03], [1500.8999, 2.001200e+03, 2.501500e+03], [3001.7998, 3502.09985, 4.002400e+03], [4502.69971, 5.003000e+03, 5.503300e+03], [6003.59961, 6503.8999, 7004.1997], [7.504500e+03, 8004.7998, 8.505100e+03]]> : tensor<6x3xf32>
  iree_input.global private @_params$1 = dense<5.000000e+00> : tensor<f32>
  func @compute_native(%arg0: tensor<5x6xf32>) -> tensor<5x3xf32> {
    %0 = iree_input.global.load @_params$0 : tensor<6x3xf32>
    %1 = iree_input.global.load @_params$1 : tensor<f32>
    %2 = call @main(%0, %1, %arg0) : (tensor<6x3xf32>, tensor<f32>, tensor<5x6xf32>) -> tensor<5x3xf32>
    return %2 : tensor<5x3xf32>
  }
  func private @main(%arg0: tensor<6x3xf32>, %arg1: tensor<f32>, %arg2: tensor<5x6xf32>) -> tensor<5x3xf32> {
```
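The body of `@main` is truncated here, so its exact computation is not shown. Shape-wise it consumes a 6x3 weight, a scalar parameter, and a 5x6 input and returns 5x3, which is consistent with a plain matmul; treating the scalar (likely an AQT quantization scale) as unused below is purely an assumption:

```python
import numpy as np

# @main's body is truncated above; a (5,6) @ (6,3) -> (5,3) matmul matches its
# declared types. The scalar @_params$1's role is an ASSUMPTION and is left
# unused here. params0 is a stand-in, not the dense literal from the module.
params0 = np.arange(18, dtype=np.float32).reshape(6, 3)  # stand-in for @_params$0
params1 = np.float32(5.0)                                # @_params$1
x = np.ones((5, 6), dtype=np.float32)

y = x @ params0  # matches @compute_native's tensor<5x3xf32> result type
print(y.shape)
```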