@aidando73
Created February 14, 2026 02:10

Metric                            v6              v4              v6/v4 change
-----------------------------------------------------------------------------------
Duration                          8,550 us        26,510 us       3.10x faster
Grid Size                         144             48              3x
Block Size                        512             512             same
Registers Per Thread              32              32              same
Local Memory Spilling             0               0               same
Theoretical Occupancy             100%            100%            same
Achieved Occupancy                25.01%          25.00%          same
Executed IPC Active               1.30            1.25            +4%
Issue Slots Busy                  31.53%          10.12%          +21.4pp
SM Busy                           31.53%          10.12%          +21.4pp
Compute (SM) Throughput           31.53%          10.12%          +21.4pp
Memory Throughput                 29.48%          9.49%           +20.0pp
Executed Instructions             1,835,851,200   1,830,950,208   same (~0.3% more)
L1/TEX Hit Rate                   1.36%           1.35%           same
L2 Hit Rate                       86.92%          86.38%          same
DRAM Throughput                   0.06%           0.02%           same
Global Load Sectors               48,219,936      48,216,672      same
Global Store Sectors              9,360           1,536           6.1x more
DRAM Bytes Read                   31.74 MB        31.74 MB        same
DRAM Bytes Write                  50.94 KB        65.28 KB        22% less
Active Warps Per Scheduler        4.00            4.00            same
Eligible Warps Per Scheduler      0.49            0.47            +4%
One or More Eligible              32.50%          31.23%          +1.3pp
Avg. Active Threads Per Warp      30.91           30.90           same
Branch Efficiency                 96.00%          96.00%          same
Block Limit Registers             4               4
Block Limit Shared Mem            4               4
Block Limit Warps                 4               4
Dynamic Shared Memory Per Block   36.93 KB        36.93 KB
Shared Memory Configuration Size  167.94 KB       167.94 KB
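
The block-limit rows can be cross-checked with a short occupancy calculation. This is a sketch under stated assumptions: the profile does not name the GPU, so the per-SM limits below (64 resident warps, 65,536 registers, 4 warp schedulers, 1 KB of shared memory reserved per block) are assumed values typical of recent NVIDIA data-center GPUs. The 64-warp figure is at least consistent with the table, since 4 blocks of 16 warps give 100% theoretical occupancy. Allocation granularity is ignored, and the helper names are illustrative.

```python
import math

# Kernel resource values from the table (identical for v6 and v4).
THREADS_PER_BLOCK = 512
REGS_PER_THREAD = 32
DYN_SMEM_PER_BLOCK_KB = 36.93
SMEM_CONFIG_KB = 167.94

# Assumed per-SM hardware limits; the profile does not name the device.
WARP_SIZE = 32
MAX_WARPS_PER_SM = 64             # implied by 4 blocks x 16 warps = 100%
REGS_PER_SM = 65536
SCHEDULERS_PER_SM = 4
RESERVED_SMEM_PER_BLOCK_KB = 1.0  # assumed per-block system reservation


def block_limits():
    """Return (warps_per_block, limit_by_warps, limit_by_regs, limit_by_smem).

    Simplified: ignores register and shared-memory allocation granularity.
    """
    warps_per_block = THREADS_PER_BLOCK // WARP_SIZE
    by_warps = MAX_WARPS_PER_SM // warps_per_block
    by_regs = REGS_PER_SM // (REGS_PER_THREAD * THREADS_PER_BLOCK)
    by_smem = math.floor(
        SMEM_CONFIG_KB / (DYN_SMEM_PER_BLOCK_KB + RESERVED_SMEM_PER_BLOCK_KB)
    )
    return warps_per_block, by_warps, by_regs, by_smem


def theoretical_occupancy():
    """Warps allowed by the tightest limiter, as a fraction of the SM maximum."""
    warps_per_block, *limits = block_limits()
    return min(limits) * warps_per_block / MAX_WARPS_PER_SM


def occupancy_from_active_warps(active_warps_per_scheduler):
    """Convert 'Active Warps Per Scheduler' into an occupancy fraction."""
    return active_warps_per_scheduler * SCHEDULERS_PER_SM / MAX_WARPS_PER_SM


if __name__ == "__main__":
    print("block limits (warps, regs, smem):", block_limits()[1:])
    print(f"theoretical occupancy: {theoretical_occupancy():.0%}")
    print(f"achieved, from 4.00 warps/scheduler: {occupancy_from_active_warps(4.00):.0%}")
```

Under these assumptions every limiter evaluates to 4 blocks (100% theoretical), while 4.00 active warps per scheduler works out to 16 warps per SM, i.e. about one 512-thread block resident per SM, which lines up with the ~25% achieved occupancy in both versions.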

Warp Stalls (cycles per issued instruction):
  barrier                         3.72            4.13            10% better
  long_scoreboard                 0.55            0.53            same
  wait                            1.84            1.89            3% better
  short_scoreboard                3.76            3.80            1% better
  not_selected                    0.51            0.50            same
  math_pipe_throttle              0.08            0.06            same
  mio_throttle                    0.30            0.33            9% better
  dispatch_stall                  0.02            0.04            50% better
  lg_throttle                     0.00            0.00            same
  drain                           0.00            0.00            same
Total (Warp Cycles Per Issued)    12.31           12.80           4% better
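
The "% better" entries in the warp-stall rows are the relative reduction (v4 - v6) / v4. The minimal Python sketch below reproduces them from the table values (the dictionary and function names are illustrative). It also shows that the listed reasons sum to 10.78 (v6) and 11.28 (v4) cycles, roughly 1.5 cycles short of the totals; the remainder presumably comes from stall reasons not listed here, since Nsight Compute also reports reasons such as `selected` and `branch_resolving`.

```python
# Warp-stall reasons in cycles per issued instruction, copied from the table.
STALLS_V6 = {
    "barrier": 3.72, "long_scoreboard": 0.55, "wait": 1.84,
    "short_scoreboard": 3.76, "not_selected": 0.51, "math_pipe_throttle": 0.08,
    "mio_throttle": 0.30, "dispatch_stall": 0.02, "lg_throttle": 0.00,
    "drain": 0.00,
}
STALLS_V4 = {
    "barrier": 4.13, "long_scoreboard": 0.53, "wait": 1.89,
    "short_scoreboard": 3.80, "not_selected": 0.50, "math_pipe_throttle": 0.06,
    "mio_throttle": 0.33, "dispatch_stall": 0.04, "lg_throttle": 0.00,
    "drain": 0.00,
}
TOTAL_V6, TOTAL_V4 = 12.31, 12.80


def pct_better(new, old):
    """Relative reduction of `new` versus `old` in percent (positive = fewer cycles)."""
    if old == 0:
        # Avoid division by zero; in this table that only happens when both are 0.
        return 0.0
    return (old - new) / old * 100.0


def unlisted_cycles(stalls, total):
    """Part of the total not accounted for by the listed stall reasons."""
    return total - sum(stalls.values())


if __name__ == "__main__":
    for reason in STALLS_V6:
        change = pct_better(STALLS_V6[reason], STALLS_V4[reason])
        print(f"{reason:20s} {change:6.1f}%")
    print(f"{'total':20s} {pct_better(TOTAL_V6, TOTAL_V4):6.1f}%")
    print(f"unlisted: v6 {unlisted_cycles(STALLS_V6, TOTAL_V6):.2f}, "
          f"v4 {unlisted_cycles(STALLS_V4, TOTAL_V4):.2f}")
```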

Pipe Utilization (% peak active):
  lsu                             28.37%          27.34%
  adu                             15.52%          15.22%
  fma                             13.53%          13.12%
  xu                              11.97%          11.53%
  alu                             7.72%           9.01%
  cbu                             1.54%           1.49%

Shared Memory Bank Conflicts:
  Smem Load Bank Conflicts        0               0
  Smem Store Bank Conflicts       0               0
  Smem Load Wavefronts            100,320,000     100,320,000     same
  Smem Store Wavefronts           56,768,640      56,768,640      same
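
The change column in the first table mixes three conventions: "x faster" is v4 / v6 for a duration, "pp" is a plain difference between two percentages, and "% less" is (v4 - v6) / v4. The sketch below recomputes a few of those entries from the table values; the helper names are illustrative. The ratios also show that duration, grid size, and issue-slot utilization all move by roughly 3x while the executed instruction count stays flat, which is consistent with v6 spreading the same work over three times as many blocks.

```python
# Headline metrics from the first table as (v6, v4) pairs.
METRICS = {
    "duration_us": (8550, 26510),
    "grid_size": (144, 48),
    "issue_slots_busy_pct": (31.53, 10.12),
    "memory_throughput_pct": (29.48, 9.49),
    "global_store_sectors": (9360, 1536),
    "dram_bytes_write_kb": (50.94, 65.28),
}


def times_faster(v6, v4):
    """For durations: how many times shorter v6 is than v4."""
    return v4 / v6


def ratio(v6, v4):
    """Plain v6 / v4 ratio (used for '3x' and '6.1x more')."""
    return v6 / v4


def pp_change(v6, v4):
    """Percentage-point change for metrics already given in percent."""
    return v6 - v4


def pct_less(v6, v4):
    """Relative reduction of v6 versus v4, in percent."""
    return (v4 - v6) / v4 * 100.0


if __name__ == "__main__":
    print(f"duration:         {times_faster(*METRICS['duration_us']):.2f}x faster")
    print(f"grid size:        {ratio(*METRICS['grid_size']):.0f}x")
    print(f"issue slots busy: {pp_change(*METRICS['issue_slots_busy_pct']):+.1f}pp "
          f"({ratio(*METRICS['issue_slots_busy_pct']):.2f}x)")
    print(f"memory thruput:   {pp_change(*METRICS['memory_throughput_pct']):+.1f}pp")
    print(f"store sectors:    {ratio(*METRICS['global_store_sectors']):.1f}x more")
    print(f"DRAM bytes write: {pct_less(*METRICS['dram_bytes_write_kb']):.0f}% less")
```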