```shell
git clone https://github.com/triton-lang/triton.git && cd triton
pip install -e .
CUDA_LAUNCH_BLOCKING=1 cuda-gdb --args python test.py
```
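If you want to start the same session from a Python harness, here is a minimal sketch that builds the equivalent command line and environment. `cuda_gdb_command` and `launch` are hypothetical helpers, not part of Triton. The sketch assumes `cuda-gdb` is on `PATH` and that `test.py` is the repro script. It uses `sys.executable` instead of a bare `python` so the interpreter that has Triton installed is reused.

```python
import os
import shlex
import subprocess
import sys


def cuda_gdb_command(script: str) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) equivalent to
    `CUDA_LAUNCH_BLOCKING=1 cuda-gdb --args python <script>` (hypothetical helper)."""
    env = dict(os.environ)
    # Make kernel launches synchronous, as in the shell command.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # Everything after --args is the debuggee's own command line.
    argv = ["cuda-gdb", "--args", sys.executable, script]
    return argv, env


def launch(script: str) -> int:
    """Start an interactive cuda-gdb session (hypothetical helper; needs cuda-gdb on PATH)."""
    argv, env = cuda_gdb_command(script)
    return subprocess.run(argv, env=env).returncode


if __name__ == "__main__":
    # Only print the command; calling launch() would open an interactive debugger.
    argv, env = cuda_gdb_command("test.py")
    print("CUDA_LAUNCH_BLOCKING=" + env["CUDA_LAUNCH_BLOCKING"], shlex.join(argv))
```

The returned pair can be passed directly to `subprocess.run` when a real GPU and `cuda-gdb` are available.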
In cuda-gdb, start the program:
```
(cuda-gdb) r
```
Once it stops making progress, interrupt it with CTRL+C and you will see:
```
0x0000000628f96150 in repro<<<(1,1,1),(128,1,1)>>> () at /root/code/triton/repro.py:8
8 mbarrier.wait(b, 0)
(cuda-gdb) up
Initial frame selected; you cannot go up.
(cuda-gdb) bt
#0 0x0000000628f96150 in repro<<<(1,1,1),(128,1,1)>>> () at /root/code/triton/repro.py:8
```
There is only a single frame: `up` cannot select a caller, and `bt` reports no inlining information for repro.py:8.
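To make the difference between the Triton and nvcc backtraces in this note explicit, here is a rough sketch that splits a cuda-gdb `bt` frame line into its parts, including the optional `in <callee> inlined from <file>:<line>` clause. The regular expression is an assumption fitted only to the two frame shapes shown in this note (optional PC, `<<<grid,block>>>` launch config, `at file:line`); real cuda-gdb output can take other forms. `Frame` and `parse_frame` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Fitted to the frame lines in this note only (an assumption, not a cuda-gdb spec).
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+"
    r"(?:0x[0-9a-fA-F]+\s+in\s+)?"                 # optional program counter
    r"(?P<func>\w+)"                               # kernel / function name
    r"(?:<<<\([^)]*\),\([^)]*\)>>>)?"              # optional <<<grid,block>>>
    r"\s*\(.*?\)\s+at\s+(?P<file>\S+):(?P<line>\d+)"
    r"(?:\s+in\s+(?P<callee>\S+)\s+inlined\s+from\s+(?P<site>\S+:\d+))?"
    r"\s*$"
)


@dataclass
class Frame:
    number: int
    function: str
    file: str
    line: int
    inlined_callee: Optional[str] = None  # mangled name of the inlined function, if reported
    inlined_from: Optional[str] = None    # "file:line" of the call site it was inlined into


def parse_frame(text: str) -> Optional[Frame]:
    """Parse one `bt` frame line; return None if it does not match."""
    m = FRAME_RE.match(text.strip())
    if m is None:
        return None
    return Frame(
        number=int(m["num"]),
        function=m["func"],
        file=m["file"],
        line=int(m["line"]),
        inlined_callee=m["callee"],   # None when the clause is absent
        inlined_from=m["site"],
    )


if __name__ == "__main__":
    frames = {
        "triton": "#0 0x0000000628f96150 in repro<<<(1,1,1),(128,1,1)>>> () at /root/code/triton/repro.py:8",
        "nvcc": "#0 repro<<<(1,1,1),(32,1,1)>>> () at /mnt/data/keren/./test.cu:9 in _Z4waitPm inlined from test.cu:24",
    }
    for label, line in frames.items():
        f = parse_frame(line)
        print(label, f.file, f.line, "inlined from:", f.inlined_from)
```

For the Triton frame `inlined_from` is `None`, while the nvcc frame yields `test.cu:24`, which is exactly the information missing from the Triton backtrace.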
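For comparison, build a CUDA counterpart (test.cu) with nvcc and run it under cuda-gdb:
```shell
nvcc -O3 -opt-info inline -lineinfo -arch sm_90a ./test.cu -o test.out
cuda-gdb ./test.out
```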
In cuda-gdb, start the program:
```
(cuda-gdb) r
```
Once it stops making progress, interrupt it with CTRL+C and you will see:
```
(cuda-gdb) bt
#0 repro<<<(1,1,1),(32,1,1)>>> () at /mnt/data/keren/./test.cu:9 in _Z4waitPm inlined from test.cu:24
```
This is much better than our line info: even with optimizations enabled (`-O3`), it reports that test.cu:9 belongs to `_Z4waitPm` (`wait(unsigned long*)`), which was inlined from test.cu:24.
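`_Z4waitPm` is an Itanium C++ ABI mangled name; running `c++filt _Z4waitPm` prints `wait(unsigned long*)`. For illustration, here is a toy decoder for only the simplest case: a free function `_Z<len><name>` followed by builtin parameter types with optional `P` (pointer), `K` (const), and `R` (reference) prefixes. `demangle_simple` is a hypothetical helper; it does not handle namespaces, templates, or substitutions, so use `c++filt` for anything real.

```python
# Itanium ABI one-letter codes for builtin types (subset only).
BUILTIN_TYPES = {
    "v": "void", "b": "bool", "c": "char", "a": "signed char", "h": "unsigned char",
    "s": "short", "t": "unsigned short", "i": "int", "j": "unsigned int",
    "l": "long", "m": "unsigned long", "x": "long long", "y": "unsigned long long",
    "f": "float", "d": "double",
}
# How each qualifier code is rendered after the type it applies to.
QUALIFIER_SUFFIX = {"P": "*", "K": " const", "R": "&"}


def demangle_simple(symbol: str) -> str:
    """Decode `_Z<len><name><params>` for non-nested free functions (hypothetical helper)."""
    if not symbol.startswith("_Z"):
        raise ValueError(f"not a mangled C++ name: {symbol!r}")
    pos = start = 2
    while pos < len(symbol) and symbol[pos].isdigit():
        pos += 1
    if pos == start:
        raise ValueError("expected <length><name> after _Z (nested names unsupported)")
    length = int(symbol[start:pos])
    name = symbol[pos:pos + length]
    if len(name) != length:
        raise ValueError("truncated function name")
    pos += length

    params = []
    while pos < len(symbol):
        quals = []
        while pos < len(symbol) and symbol[pos] in QUALIFIER_SUFFIX:
            quals.append(symbol[pos])
            pos += 1
        if pos >= len(symbol) or symbol[pos] not in BUILTIN_TYPES:
            raise ValueError(f"unsupported type code at offset {pos}")
        ty = BUILTIN_TYPES[symbol[pos]]
        pos += 1
        # Qualifiers apply inside-out: "PKc" is a pointer to const char.
        for q in reversed(quals):
            ty += QUALIFIER_SUFFIX[q]
        params.append(ty)

    if params == ["void"]:  # a lone "v" means an empty parameter list
        params = []
    return f"{name}({', '.join(params)})"


if __name__ == "__main__":
    print(demangle_simple("_Z4waitPm"))  # wait(unsigned long*)
```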