b8zhong / flashinfer_mm_fp4_benchmark.py
Created January 19, 2026 21:55
FlashInfer `mm_fp4` backend benchmark (cuDNN vs. CUTLASS implementations)
import argparse
import csv
import os
import torch
import triton
from flashinfer import mm_fp4
from flashinfer.testing import bench_gpu_time_with_cupti
from sgl_kernel import cutlass_scaled_fp4_mm, scaled_fp4_quant