@davidberard98
Created September 6, 2025 05:45
# only skipped commits left to test
# possible first bad commit: [f0975f9d02f6d8b69146aea84b8ee7c6e81c4ed5] [AMD][Build] Fix build issue with AMD lld (#7608)
# possible first bad commit: [620c59165a1452ddd5dd685054b84485a35dc92e] [AMD][NFC] Group scheduling functions in StreamPipeliner (#7607)
# possible first bad commit: [16b25e1620e87f02f7d16ff5b9da2d425c6e99a6] [BACKEND] combineRedundantWaitOps should not combine across loops/branches (#7593)
# possible first bad commit: [fdd694d48a2b05c49bd9658fbedc3204d04654e1] [Triton][Gluon] Add `map_elementwise` (#7564)
# possible first bad commit: [03cdcdb38a4812f8c371ff7de433e6c7b605b1ef] [KERNELS] Fix `bench_mlp.py` for AMD (#7600)
# possible first bad commit: [ef72c317570de66807d83648bde1bff64e2898f8] [Tests] Improve regex for test_compile_only_dot (#7602)
# possible first bad commit: [a7a89c7c9262ed3d761ea771a907d53f4caba92a] [FRONTEND] Refactor unsplat to use new op (#7586)
# possible first bad commit: [6415039bf96145fdabca0ca0ac5f05b9d42cf45f] [AMD] Support 4x64 and 64x4 MFMA layout for dot (#7576)
# possible first bad commit: [8a5862dceae0d4a12b38158b4abd5fcbb7209fa2] [AMD] Enable lowerLocalLdSt for AMD path (#7355)
# possible first bad commit: [75eed88ab167eaf1d4df7897cae0ce559f2e6951] [KERNELS] fixup leftover experiment code (#7599)
# possible first bad commit: [d0642aab963a6e435b11eaa6dd184dc9208fb1df] Revert "[BACKEND] Hoist tmem alloc outside of if (#7568)" (#7597)
# possible first bad commit: [e6aa86c9e9e232aae80548310dbf7a6a112be400] [BACKEND] Fix fp16 to fp32 conversion (#7585)
# possible first bad commit: [ca0fe1b41dc4adfcfcefa23858e9a040d1a4a516] [KERNELS] Skip test_upcast_mxfp4_to_bf16 on AMD (#7595)
# possible first bad commit: [de846c0b0858d06e1fc7783b1ea13ecb6e2611fa] [KERNELS] added missing mxfp4 tests (#7591)
# possible first bad commit: [fde96e8f1703ab9a6410c5c6b0ff3a6be64b3a55] Remove redundant calls to inspect.getsource (#7588)
# possible first bad commit: [981b0bb0c66b28d76a71141b837ecff594207753] [KERNELS] improved twiddling/swizzling for H100 simulated mxfp4 (h/t @lezcano) (#7587)
# possible first bad commit: [63f34320e389d95128ab9cff5f356d2096dfba7f] [WS] Update RewritePartitionDepdencies to insert arefs (#7561)
# possible first bad commit: [6a6ed52e1781d38fad487da39443604273761bb9] [BACKEND] Generalize `getShapePerCTA` (#7580)
# possible first bad commit: [167ed2866c56c0b5e168d7f02dddec51c8ff4c5b] Cache NvidiaTool.from_path (#7569)
# possible first bad commit: [d5de496dcd0fbefa0b785efd708d43b577fbf3d5] [KERNELS] Remove incorrect check (#7583)
# possible first bad commit: [7708b9c12a9848a0be33e1c21c7c7bdd23b207d4] [mxfp/easy] handle an empty tensor in downcast_to_mxfp (#7579)
# possible first bad commit: [9660a0d479774e40f451c682b6bb9cf32330dfcc] [CI] Reduce MacOS CI build time (#7559)
# possible first bad commit: [13b3a75de6f418826254e3341cd2095a3347ee59] [KERNELS] Fix swizzling contiguity (#7582)
# possible first bad commit: [984b694dc2916ee4f8cd18d3a28d1d8da14e076d] [mxfp/easy] add MXFP_BLOCK_SIZE constant (#7567)
# possible first bad commit: [67f647af21b6ec85bbe561a6591ee5c28cffe509] [IR] Add type checks for `atomic_cas` (#7578)
# possible first bad commit: [948ba8f4fd171588b584f51109f04692b77d5c27] [DOC] Update install instructions in docs (#7572)
# possible first bad commit: [db7170e18a3529bdaf44dcdeecb56eb36d375c54] [Backend] Follow-up refactor of `getWarpLayoutConvertDecomposition` (#7571)
# possible first bad commit: [1e0a371139142b04c60b553a1324907c00ea3fbc] [AMD] Restrict merging async_wait in StreamPipeliner (#7577)
# possible first bad commit: [4048f318a4bb4c7da693e56cd8c9a4d5a39e4e39] [BACKEND] Add LocalLoadTrait to group local load-like ops (#7511)
# possible first bad commit: [3854ae83b65a4234c0c4b6b39054b5ea3e3a4b73] [Backend] Improve warp-local layout conversion algo using shuffles (#7558)
# possible first bad commit: [570f24d016702cbfe1179beae3eb03d24e9c6b40] [AMD] Improve register usage in Float8 conversions (#7527)
# possible first bad commit: [96e91d49f1fa0760594ea65699920dcfa04ed206] [BACKEND] Hoist tmem alloc outside of if (#7568)
# possible first bad commit: [7affb3b232166fcac3af9f8295df7bede6b6b35d] [BACKEND] Ignore bad bytes in ldconfig -p (#7566)
# possible first bad commit: [dba750e4d6786f077ec6f064153482ebaab22019] [AMD] Support async load in ping-pong pass (#7458)
# possible first bad commit: [ea82c9f485fa5283fee4abc0e62df69f2380ce15] [Gluon] Clarify o_init meaning in attention tutorial (#7563)
# possible first bad commit: [3bd3e320d901aa7de663795e7723095334b3c463] [AMD] Use lld library API for linking (#7548)