Created February 14, 2026 02:10

| Metric | v6 | v4 | v6/v4 change |
| --- | --- | --- | --- |
| Duration | 8,550 us | 26,510 us | 3.10x faster |
| Grid Size | 144 | 48 | 3x |
| Block Size | 512 | 512 | same |
| Registers Per Thread | 32 | 32 | same |
| Local Memory Spilling | 0 | 0 | same |
| Theoretical Occupancy | 100% | 100% | same |
| Achieved Occupancy | 25.01% | 25.00% | same |
| Executed IPC Active | 1.30 | 1.25 | +4% |
| Issue Slots Busy | 31.53% | 10.12% | +21.4pp |
| SM Busy | 31.53% | 10.12% | +21.4pp |
| Compute (SM) Throughput | 31.53% | 10.12% | +21.4pp |
| Memory Throughput | 29.48% | 9.49% | +20.0pp |
| Executed Instructions | 1,835,851,200 | 1,830,950,208 | same (~0.3% more) |
| L1/TEX Hit Rate | 1.36% | 1.35% | same |
| L2 Hit Rate | 86.92% | 86.38% | same |
| DRAM Throughput | 0.06% | 0.02% | same |
| Global Load Sectors | 48,219,936 | 48,216,672 | same |
| Global Store Sectors | 9,360 | 1,536 | 6.1x more |
| DRAM Bytes Read | 31.74 MB | 31.74 MB | same |
| DRAM Bytes Write | 50.94 KB | 65.28 KB | 22% less |
| Active Warps Per Scheduler | 4.00 | 4.00 | same |
| Eligible Warps Per Scheduler | 0.49 | 0.47 | +4% |
| One or More Eligible | 32.50% | 31.23% | +1.3pp |
| Avg. Active Threads Per Warp | 30.91 | 30.90 | same |
| Branch Efficiency | 96.00% | 96.00% | same |
| Block Limit Registers | 4 | 4 | |
| Block Limit Shared Mem | 4 | 4 | |
| Block Limit Warps | 4 | 4 | |
| Dynamic Shared Memory Per Block | 36.93 KB | 36.93 KB | |
| Shared Memory Configuration Size | 167.94 KB | 167.94 KB | |
| Warp Stalls (cycles/issued inst): | | | |
| barrier | 3.72 | 4.13 | 10% better |
| long_scoreboard | 0.55 | 0.53 | same |
| wait | 1.84 | 1.89 | 3% better |
| short_scoreboard | 3.76 | 3.80 | 1% better |
| not_selected | 0.51 | 0.50 | same |
| math_pipe_throttle | 0.08 | 0.06 | same |
| mio_throttle | 0.30 | 0.33 | 9% better |
| dispatch_stall | 0.02 | 0.04 | 50% better |
| lg_throttle | 0.00 | 0.00 | same |
| drain | 0.00 | 0.00 | same |
| Total (Warp Cycles Per Issued) | 12.31 | 12.80 | 4% better |
| Pipe Utilization (% peak active): | | | |
| lsu | 28.37% | 27.34% | |
| adu | 15.52% | 15.22% | |
| fma | 13.53% | 13.12% | |
| xu | 11.97% | 11.53% | |
| alu | 7.72% | 9.01% | |
| cbu | 1.54% | 1.49% | |
| Shared Memory Bank Conflicts: | | | |
| Smem Load Bank Conflicts | 0 | 0 | |
| Smem Store Bank Conflicts | 0 | 0 | |
| Smem Load Wavefronts | 100,320,000 | 100,320,000 | same |
| Smem Store Wavefronts | 56,768,640 | 56,768,640 | same |
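
The change column mixes three kinds of comparison: ratios ("3.10x faster"), percentage-point differences ("+21.4pp"), and relative percent changes ("22% less"). The sketch below recomputes a few of them from the v6 and v4 values in the table. The helper names are illustrative and not part of any profiler API. The occupancy check additionally assumes 32 threads per warp and 4 warp schedulers per SM, neither of which is stated in the table.

```python
# Recompute selected "v6/v4 change" entries from the raw table values.
# Helper names are hypothetical; only the numbers come from the table.

def speedup(old_duration: float, new_duration: float) -> float:
    """How many times faster the new run is (durations: lower is better)."""
    return old_duration / new_duration


def pp_delta(new_pct: float, old_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return new_pct - old_pct


def rel_change_pct(new: float, old: float) -> float:
    """Relative change of new versus old, in percent (negative means less)."""
    return (new - old) / old * 100.0


def achieved_occupancy_pct(active_warps_per_scheduler: float,
                           block_size: int,
                           block_limit: int,
                           threads_per_warp: int = 32,   # assumption
                           schedulers_per_sm: int = 4    # assumption
                           ) -> float:
    """Active warps per scheduler as a share of the per-scheduler maximum."""
    warps_per_block = block_size // threads_per_warp
    max_warps_per_scheduler = warps_per_block * block_limit / schedulers_per_sm
    return active_warps_per_scheduler / max_warps_per_scheduler * 100.0


if __name__ == "__main__":
    print(f"Duration:             {speedup(26_510, 8_550):.2f}x faster")
    print(f"Issue Slots Busy:     {pp_delta(31.53, 10.12):+.1f}pp")
    print(f"Memory Throughput:    {pp_delta(29.48, 9.49):+.1f}pp")
    print(f"Global Store Sectors: {9_360 / 1_536:.1f}x")
    print(f"DRAM Bytes Write:     {rel_change_pct(50.94, 65.28):+.0f}%")
    print(f"Total stall cycles:   {rel_change_pct(12.31, 12.80):+.1f}%")
    print(f"Occupancy from warps: {achieved_occupancy_pct(4.00, 512, 4):.0f}%")
```

Under those assumptions, 4.00 active warps per scheduler out of a possible 16 works out to 25%, which matches the Achieved Occupancy row for both versions.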