Next-Generation GPU Enablement (Blackwell, MI350)
High-Performance Custom GPU Kernels
Advanced Algorithms for Large Language Models
Compiler & Ecosystem Advancements (Triton, GEMM Tuning)
Our work has been recognized for over $100 million in annual infrastructure savings, top-tier publications, and open-source contributions.
- Efficient Speculative Decoding for Llama at Scale: Challenges and Solutions, arXiv, Aug 11, 2025