Aidan Do (aidando73)

  • fireworks.ai
  • San Francisco Bay Area
aidando73 / swizzle_mma_repro.cu
Created February 18, 2026 03:27
Minimal repro: fp8 SS MMA with NoSwizzle vs SW32 vs SW64 vs SW128 on SM100 (B200). All four swizzle modes produce identical MMA throughput.
// Minimal repro: fp8 SS MMA with NoSwizzle vs SW32 vs SW64 vs SW128.
// All four produce identical MMA throughput (129 cycles/iter).
// Only depends on CUTLASS (no FlashMLA, no mla_decode_v21).
//
// Build & run (requires sm_100a GPU + CUTLASS source):
// nvcc -std=c++20 -O2 --generate-code=arch=compute_100a,code=[sm_100a] \
// -I third-party/cutlass/include \
// -o scripts/swizzle_mma_repro scripts/swizzle_mma_repro.cu \
// && ./scripts/swizzle_mma_repro
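For context on what the swizzle modes do: SW128 corresponds (if I have the parameters right) to CUTLASS's `cute::Swizzle<3,4,3>` layout, where each 128-byte row's eight 16-byte chunks are permuted by XOR with the row index. A minimal Python model of that XOR pattern (my illustrative sketch, not the CUTLASS implementation):

```python
def sw128(byte_offset):
    # Illustrative model of a 128-byte swizzle (cute::Swizzle<3,4,3>-style):
    # XOR the 16-byte chunk index (bits 4-6) with the row index (bits 7-9).
    return byte_offset ^ ((byte_offset >> 3) & 0x70)

# Each "column" of 16B chunks across 8 rows lands in 8 distinct chunk slots,
# which is what avoids shared-memory bank conflicts on column accesses.
for chunk in range(8):
    slots = {(sw128(row * 128 + chunk * 16) >> 4) & 7 for row in range(8)}
    assert len(slots) == 8
```

Since all four modes feed the same MMA, identical throughput across them suggests the repro's bottleneck is the MMA issue rate rather than shared-memory access patterns.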
Metric                  v6         v4          v6/v4 change
-----------------------------------------------------------
Duration                8,550 us   26,510 us   3.10x faster
Grid Size               144        48          3x
Block Size              512        512         same
Registers Per Thread    32         32          same
Local Memory Spilling   0          0           same
Theoretical Occupancy   100%       100%        same
Achieved Occupancy      25.01%     25.00%      same
Executed IPC Active     1.30       1.25        +4%
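A quick sanity check on these numbers (my arithmetic, using only the figures from the table): the 3.10x speedup is almost exactly the 3x grid-size ratio times the ~4% IPC gain, i.e. near-linear scaling in the number of blocks.

```python
# Profiler numbers from the table above.
duration_v4, duration_v6 = 26_510, 8_550   # microseconds
grid_v4, grid_v6 = 48, 144                 # blocks
ipc_v4, ipc_v6 = 1.25, 1.30

speedup = duration_v4 / duration_v6        # ~3.10x
grid_ratio = grid_v6 / grid_v4             # 3.0x

# Speedup ~= grid ratio x IPC improvement.
assert grid_ratio == 3.0
assert abs(speedup - grid_ratio * (ipc_v6 / ipc_v4)) < 0.05
```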
aidando73 / README.md
Last active December 26, 2024 02:17
SWE-Bench-Lite CodeAct 2.1 - Llama 405B/3.3 70B

Llama 3.3 70B Instruct

  • Score: 0.047
  • model: "openrouter/meta-llama/llama-3.3-70b-instruct"
aidando73 / !lambdalabs-setup.md
Last active February 22, 2025 01:03
Setup for lambdalabs.com boxes

After booting a new instance:

curl -H 'Cache-Control: no-cache' https://gist.githubusercontent.com/aidando73/2876aabaae7ded0dec68d04d34b70086/raw/new_instance.bash | bash

If you want to use conda:

source ~/miniconda3/bin/activate
aidando73 / README.md
Last active November 21, 2024 01:07
Updating train_gpt2.cu for inference (llm.c)
aidando73 / README.md
Last active November 20, 2024 02:41
llm.c vislog.py
aidando73 / gist:fc7a6069fef9b718576ff9adbc8fe6b7
Last active November 21, 2024 22:05
Training GPT-2 124M


I tried the GPT-2 124M parameter model on 3 different GPU setups, following karpathy/llm.c#481. Here's how long each run took and how much it cost:

                1x A10G   1x A100 40GB   8x A100 40GB
training time   48h       15h            2h
cost            $163      $27            $45
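The tradeoff in the table is worth spelling out (my arithmetic, using only the numbers above): the 8-GPU run finishes 7.5x sooner than a single A100 for roughly 1.67x the money.

```python
# Hours and total cost per setup, taken from the table above.
setups = {
    "1x A10G":      (48, 163),
    "1x A100 40GB": (15, 27),
    "8x A100 40GB": (2, 45),
}

hours_1x, cost_1x = setups["1x A100 40GB"]
hours_8x, cost_8x = setups["8x A100 40GB"]

# 8x A100 finishes 7.5x sooner than 1x A100 for ~1.67x the money.
assert hours_1x / hours_8x == 7.5
assert round(cost_8x / cost_1x, 2) == 1.67
```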

Most of the completions are fairly nonsensical, but here are some interesting ones:

aidando73 / strings.txt
Last active September 26, 2023 05:18
Naughty Strings
Trailing
~!@#$%^&*()_+{}|:"<>?[]\',./`
👾 🙇 💁 🙅 🙆 🙋 🙎 🙍
Ṱ̺̺̕o͞ ̷i̲̬͇̪͙n̝̗͕v̟̜̘̦͟o̶̙
<sc<script>ript>alert(13)</sc</script>ript>
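The nested-tag string on the last line is the classic counterexample to "sanitizing" HTML by deleting `<script>` tags: a single removal pass splices the surrounding fragments back into a live tag. A quick Python demonstration (my sketch; real sanitizers should escape output or use an HTML parser, not regex deletion):

```python
import re

s = '<sc<script>ript>alert(13)</sc</script>ript>'

# Naive "sanitizer": strip literal <script> / </script> tags in one pass.
stripped = re.sub(r'</?script>', '', s)

# The removal reassembles a working script tag.
assert stripped == '<script>alert(13)</script>'
```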