garrett361 / derivation.md
Last active January 3, 2026 11:58
Sonic MoE Backwards Math

Writing out the math used in Sonic MoE (https://arxiv.org/abs/2512.14080); see Sec. 3.2 and Appendix C.

Setup

Goal: compute the backward pass without caching large tensors whose size would blow up with increased sparsification.

Tensors:

  • X_{ted}: input tensors
  • W^1_{edn}: up projection