@de-wim
Created February 21, 2026 07:21
llama benches
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/bartowski_Qwen_Qwen3-Coder-Next-GGUF_Qwen_Qwen3-Coder-Next-Q8_0_Qwen_Qwen3-Coder-Next-Q8_0-00001-of-00003.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q8_0 | 78.98 GiB | 79.67 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 358.92 ± 2.25 |
| qwen3next 80B.A3B Q8_0 | 78.98 GiB | 79.67 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 29.98 ± 0.51 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/bartowski_Qwen_Qwen3-Coder-Next-GGUF_Qwen_Qwen3-Coder-Next-Q8_0_Qwen_Qwen3-Coder-Next-Q8_0-00001-of-00003.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| qwen3next 80B.A3B Q8_0 | 78.98 GiB | 79.67 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 198.54 ± 2.17 |
| qwen3next 80B.A3B Q8_0 | 78.98 GiB | 79.67 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 23.34 ± 1.33 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_Qwen3-Coder-Next-GGUF_UD-Q6_K_XL_Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 341.27 ± 4.40 |
| qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 31.20 ± 0.99 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_Qwen3-Coder-Next-GGUF_UD-Q6_K_XL_Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 303.02 ± 9.89 |
| qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 24.74 ± 2.78 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/gpt-oss-120b-Derestricted.i1-MXFP4_MOE.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 396.91 ± 3.32 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 46.78 ± 0.87 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/gpt-oss-120b-Derestricted.i1-MXFP4_MOE.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 139.77 ± 5.94 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 39.63 ± 2.65 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-IQ3_M_stepfun-ai_Step-3.5-Flash-IQ3_M-00001-of-00003.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| step35 196B.A11B IQ3_S mix - 3.66 bpw | 83.97 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 160.46 ± 2.23 |
| step35 196B.A11B IQ3_S mix - 3.66 bpw | 83.97 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 24.62 ± 0.69 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-IQ3_M_stepfun-ai_Step-3.5-Flash-IQ3_M-00001-of-00003.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| step35 196B.A11B IQ3_S mix - 3.66 bpw | 83.97 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 87.66 ± 1.85 |
| step35 196B.A11B IQ3_S mix - 3.66 bpw | 83.97 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 16.69 ± 0.95 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-Q3_K_M_stepfun-ai_Step-3.5-Flash-Q3_K_M-00001-of-00003.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| step35 196B.A11B Q3_K - Medium | 84.03 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 143.69 ± 1.41 |
| step35 196B.A11B Q3_K - Medium | 84.03 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 24.31 ± 0.81 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-Q3_K_M_stepfun-ai_Step-3.5-Flash-Q3_K_M-00001-of-00003.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| step35 196B.A11B Q3_K - Medium | 84.03 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 158.18 ± 1.89 |
| step35 196B.A11B Q3_K - Medium | 84.03 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 18.30 ± 1.13 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/stepfun-Step-3.5-Flash-IQ3_XXS.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| step35 196B.A11B IQ3_XXS - 3.0625 bpw | 70.55 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 170.51 ± 2.41 |
| step35 196B.A11B IQ3_XXS - 3.0625 bpw | 70.55 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 30.34 ± 1.29 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/stepfun-Step-3.5-Flash-IQ3_XXS.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| step35 196B.A11B IQ3_XXS - 3.0625 bpw | 70.55 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 83.67 ± 0.70 |
| step35 196B.A11B IQ3_XXS - 3.0625 bpw | 70.55 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 18.33 ± 0.05 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_MiniMax-M2.5-GGUF_UD-IQ3_XXS_MiniMax-M2.5-UD-IQ3_XXS-00001-of-00003.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw | 86.90 GiB | 228.69 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 119.44 ± 4.23 |
| minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw | 86.90 GiB | 228.69 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 25.87 ± 0.52 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_MiniMax-M2.5-GGUF_UD-IQ3_XXS_MiniMax-M2.5-UD-IQ3_XXS-00001-of-00003.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw | 86.90 GiB | 228.69 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 61.32 ± 1.55 |
| minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw | 86.90 GiB | 228.69 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 16.58 ± 1.14 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_NL.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ4_NL - 4.5 bpw | 91.40 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 143.23 ± 2.96 |
| minimax-m2 230B.A10B IQ4_NL - 4.5 bpw | 91.40 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 22.90 ± 0.64 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_NL.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ4_NL - 4.5 bpw | 91.40 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 71.77 ± 2.81 |
| minimax-m2 230B.A10B IQ4_NL - 4.5 bpw | 91.40 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 18.09 ± 1.03 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_XS.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ4_XS - 4.25 bpw | 86.59 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 127.49 ± 2.55 |
| minimax-m2 230B.A10B IQ4_XS - 4.25 bpw | 86.59 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 22.51 ± 0.55 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_XS.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ4_XS - 4.25 bpw | 86.59 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 73.90 ± 3.42 |
| minimax-m2 230B.A10B IQ4_XS - 4.25 bpw | 86.59 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 19.59 ± 0.01 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-Q4_K_S.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| minimax-m2 230B.A10B Q4_K - Small | 91.38 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 129.40 ± 0.10 |
| minimax-m2 230B.A10B Q4_K - Small | 91.38 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 23.28 ± 0.85 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-Q4_K_S.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| minimax-m2 230B.A10B Q4_K - Small | 91.38 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 153.32 ± 2.21 |
| minimax-m2 230B.A10B Q4_K - Small | 91.38 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 16.93 ± 0.91 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q4_K_XL.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B Q4_K - Medium | 16.36 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 165.64 ± 1.25 |
| deepseek2 30B.A3B Q4_K - Medium | 16.36 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 26.46 ± 1.53 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q4_K_XL.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| deepseek2 30B.A3B Q4_K - Medium | 16.36 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 267.25 ± 1.56 |
| deepseek2 30B.A3B Q4_K - Medium | 16.36 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 35.95 ± 3.53 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q6_K_XL.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B Q6_K | 24.25 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 161.06 ± 1.46 |
| deepseek2 30B.A3B Q6_K | 24.25 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 24.12 ± 1.23 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q6_K_XL.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| deepseek2 30B.A3B Q6_K | 24.25 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 233.98 ± 2.41 |
| deepseek2 30B.A3B Q6_K | 24.25 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 26.49 ± 2.30 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q8_K_XL.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B Q8_0 | 32.70 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 163.58 ± 1.83 |
| deepseek2 30B.A3B Q8_0 | 32.70 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 21.14 ± 1.39 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q8_K_XL.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| deepseek2 30B.A3B Q8_0 | 32.70 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 192.79 ± 3.02 |
| deepseek2 30B.A3B Q8_0 | 32.70 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 23.66 ± 1.80 |
build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.6V-GGUF_UD-Q4_K_XL_GLM-4.6V-UD-Q4_K_XL-00001-of-00002.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| glm4moe ?B Q4_K - Medium | 60.95 GiB | 106.85 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 118.01 ± 3.52 |
| glm4moe ?B Q4_K - Medium | 60.95 GiB | 106.85 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 11.11 ± 1.01 |
build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.6V-GGUF_UD-Q4_K_XL_GLM-4.6V-UD-Q4_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| glm4moe ?B Q4_K - Medium | 60.95 GiB | 106.85 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 102.66 ± 2.83 |
| glm4moe ?B Q4_K - Medium | 60.95 GiB | 106.85 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 14.88 ± 0.63 |