de-wim/gist:48c34911837128a8f92db43fb36f7432

## gistfile1.txt
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/bartowski_Qwen_Qwen3-Coder-Next-GGUF_Qwen_Qwen3-Coder-Next-Q8_0_Qwen_Qwen3-Coder-Next-Q8_0-00001-of-00003.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q8_0         |  78.98 GiB |    79.67 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        358.92 ± 2.25 |
| qwen3next 80B.A3B Q8_0         |  78.98 GiB |    79.67 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         29.98 ± 0.51 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/bartowski_Qwen_Qwen3-Coder-Next-GGUF_Qwen_Qwen3-Coder-Next-Q8_0_Qwen_Qwen3-Coder-Next-Q8_0-00001-of-00003.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| qwen3next 80B.A3B Q8_0         |  78.98 GiB |    79.67 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |        198.54 ± 2.17 |
| qwen3next 80B.A3B Q8_0         |  78.98 GiB |    79.67 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         23.34 ± 1.33 |

build: 94b0200a0 (8118)
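A minimal bash sketch for re-running one Vulkan/ROCm pair with the settings used throughout this log. The two binary paths and all flags come from the commands above. The `DRY_RUN` switch and the `run_or_print` and `bench_pair` names are hypothetical helpers added for illustration. Only the ROCm runs pass `--mmap 0 -dio 1`, so the two backends were not launched with identical options.

```bash
#!/usr/bin/env bash
# Sketch: run one model through the Vulkan and ROCm llama-bench builds
# with the settings used in this log. run_or_print, bench_pair and
# DRY_RUN are hypothetical helpers, not part of llama.cpp.
set -euo pipefail

VULKAN_BENCH=/opt/llama.cpp/vulkan/bin/llama-bench
ROCM_BENCH=/opt/llama.cpp/rocm/bin/llama-bench

# Flags shared by every run: batch/ubatch 2048, q8_0 K/V cache, all layers
# offloaded (-ngl 999), flash attention on, measured at context depth 8000.
COMMON_ARGS=(-ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000)

# Extra flags that only the ROCm runs in this log pass.
ROCM_EXTRA_ARGS=(--mmap 0 -dio 1)

# Print the command instead of executing it when DRY_RUN=1.
run_or_print() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Benchmark a single GGUF file on both backends, Vulkan first.
bench_pair() {
  local model=$1
  run_or_print "$VULKAN_BENCH" "${COMMON_ARGS[@]}" -m "$model"
  run_or_print "$ROCM_BENCH" "${COMMON_ARGS[@]}" "${ROCM_EXTRA_ARGS[@]}" -m "$model"
}
```

`DRY_RUN=1 bench_pair /path/to/model.gguf` prints the two command lines. Without `DRY_RUN`, the function runs them back to back.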
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_Qwen3-Coder-Next-GGUF_UD-Q6_K_XL_Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        341.27 ± 4.40 |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         31.20 ± 0.99 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_Qwen3-Coder-Next-GGUF_UD-Q6_K_XL_Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |        303.02 ± 9.89 |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         24.74 ± 2.78 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/gpt-oss-120b-Derestricted.i1-MXFP4_MOE.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        396.91 ± 3.32 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         46.78 ± 0.87 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/gpt-oss-120b-Derestricted.i1-MXFP4_MOE.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |        139.77 ± 5.94 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         39.63 ± 2.65 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-IQ3_M_stepfun-ai_Step-3.5-Flash-IQ3_M-00001-of-00003.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| step35 196B.A11B IQ3_S mix - 3.66 bpw |  83.97 GiB |   196.96 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        160.46 ± 2.23 |
| step35 196B.A11B IQ3_S mix - 3.66 bpw |  83.97 GiB |   196.96 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         24.62 ± 0.69 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-IQ3_M_stepfun-ai_Step-3.5-Flash-IQ3_M-00001-of-00003.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| step35 196B.A11B IQ3_S mix - 3.66 bpw |  83.97 GiB |   196.96 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |         87.66 ± 1.85 |
| step35 196B.A11B IQ3_S mix - 3.66 bpw |  83.97 GiB |   196.96 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         16.69 ± 0.95 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-Q3_K_M_stepfun-ai_Step-3.5-Flash-Q3_K_M-00001-of-00003.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| step35 196B.A11B Q3_K - Medium |  84.03 GiB |   196.96 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        143.69 ± 1.41 |
| step35 196B.A11B Q3_K - Medium |  84.03 GiB |   196.96 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         24.31 ± 0.81 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-Q3_K_M_stepfun-ai_Step-3.5-Flash-Q3_K_M-00001-of-00003.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| step35 196B.A11B Q3_K - Medium |  84.03 GiB |   196.96 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |        158.18 ± 1.89 |
| step35 196B.A11B Q3_K - Medium |  84.03 GiB |   196.96 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         18.30 ± 1.13 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/stepfun-Step-3.5-Flash-IQ3_XXS.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| step35 196B.A11B IQ3_XXS - 3.0625 bpw |  70.55 GiB |   196.96 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        170.51 ± 2.41 |
| step35 196B.A11B IQ3_XXS - 3.0625 bpw |  70.55 GiB |   196.96 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         30.34 ± 1.29 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/stepfun-Step-3.5-Flash-IQ3_XXS.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| step35 196B.A11B IQ3_XXS - 3.0625 bpw |  70.55 GiB |   196.96 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |         83.67 ± 0.70 |
| step35 196B.A11B IQ3_XXS - 3.0625 bpw |  70.55 GiB |   196.96 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         18.33 ± 0.05 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_MiniMax-M2.5-GGUF_UD-IQ3_XXS_MiniMax-M2.5-UD-IQ3_XXS-00001-of-00003.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw |  86.90 GiB |   228.69 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        119.44 ± 4.23 |
| minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw |  86.90 GiB |   228.69 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         25.87 ± 0.52 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_MiniMax-M2.5-GGUF_UD-IQ3_XXS_MiniMax-M2.5-UD-IQ3_XXS-00001-of-00003.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw |  86.90 GiB |   228.69 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |         61.32 ± 1.55 |
| minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw |  86.90 GiB |   228.69 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         16.58 ± 1.14 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_NL.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ4_NL - 4.5 bpw |  91.40 GiB |   172.51 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        143.23 ± 2.96 |
| minimax-m2 230B.A10B IQ4_NL - 4.5 bpw |  91.40 GiB |   172.51 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         22.90 ± 0.64 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_NL.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ4_NL - 4.5 bpw |  91.40 GiB |   172.51 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |         71.77 ± 2.81 |
| minimax-m2 230B.A10B IQ4_NL - 4.5 bpw |  91.40 GiB |   172.51 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         18.09 ± 1.03 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_XS.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ4_XS - 4.25 bpw |  86.59 GiB |   172.51 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        127.49 ± 2.55 |
| minimax-m2 230B.A10B IQ4_XS - 4.25 bpw |  86.59 GiB |   172.51 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         22.51 ± 0.55 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_XS.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| minimax-m2 230B.A10B IQ4_XS - 4.25 bpw |  86.59 GiB |   172.51 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |         73.90 ± 3.42 |
| minimax-m2 230B.A10B IQ4_XS - 4.25 bpw |  86.59 GiB |   172.51 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         19.59 ± 0.01 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-Q4_K_S.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| minimax-m2 230B.A10B Q4_K - Small |  91.38 GiB |   172.51 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        129.40 ± 0.10 |
| minimax-m2 230B.A10B Q4_K - Small |  91.38 GiB |   172.51 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         23.28 ± 0.85 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-Q4_K_S.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| minimax-m2 230B.A10B Q4_K - Small |  91.38 GiB |   172.51 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |        153.32 ± 2.21 |
| minimax-m2 230B.A10B Q4_K - Small |  91.38 GiB |   172.51 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         16.93 ± 0.91 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q4_K_XL.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B Q4_K - Medium |  16.36 GiB |    29.94 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        165.64 ± 1.25 |
| deepseek2 30B.A3B Q4_K - Medium |  16.36 GiB |    29.94 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         26.46 ± 1.53 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q4_K_XL.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| deepseek2 30B.A3B Q4_K - Medium |  16.36 GiB |    29.94 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |        267.25 ± 1.56 |
| deepseek2 30B.A3B Q4_K - Medium |  16.36 GiB |    29.94 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         35.95 ± 3.53 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q6_K_XL.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B Q6_K         |  24.25 GiB |    29.94 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        161.06 ± 1.46 |
| deepseek2 30B.A3B Q6_K         |  24.25 GiB |    29.94 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         24.12 ± 1.23 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q6_K_XL.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| deepseek2 30B.A3B Q6_K         |  24.25 GiB |    29.94 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |        233.98 ± 2.41 |
| deepseek2 30B.A3B Q6_K         |  24.25 GiB |    29.94 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         26.49 ± 2.30 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q8_K_XL.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B Q8_0         |  32.70 GiB |    29.94 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        163.58 ± 1.83 |
| deepseek2 30B.A3B Q8_0         |  32.70 GiB |    29.94 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         21.14 ± 1.39 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q8_K_XL.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| deepseek2 30B.A3B Q8_0         |  32.70 GiB |    29.94 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |        192.79 ± 3.02 |
| deepseek2 30B.A3B Q8_0         |  32.70 GiB |    29.94 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         23.66 ± 1.80 |

build: 94b0200a0 (8118)
/opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.6V-GGUF_UD-Q4_K_XL_GLM-4.6V-UD-Q4_K_XL-00001-of-00002.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| glm4moe ?B Q4_K - Medium       |  60.95 GiB |   106.85 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   pp512 @ d8000 |        118.01 ± 3.52 |
| glm4moe ?B Q4_K - Medium       |  60.95 GiB |   106.85 B | Vulkan,BLAS |      16 |     2048 |   q8_0 |   q8_0 |  1 |   tg128 @ d8000 |         11.11 ± 1.01 |

build: 94b0200a0 (8118)
/opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.6V-GGUF_UD-Q4_K_XL_GLM-4.6V-UD-Q4_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_ubatch | type_k | type_v | fa | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: |
| glm4moe ?B Q4_K - Medium       |  60.95 GiB |   106.85 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   pp512 @ d8000 |        102.66 ± 2.83 |
| glm4moe ?B Q4_K - Medium       |  60.95 GiB |   106.85 B | ROCm,BLAS  |      16 |     2048 |   q8_0 |   q8_0 |  1 |   1 |   tg128 @ d8000 |         14.88 ± 0.63 |
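The pasted tables can be reduced to one line per model and test for a side-by-side backend comparison. This is a minimal sketch that assumes input in exactly the llama-bench markdown layout shown here. It expects the backend column to read `Vulkan,BLAS` or `ROCm,BLAS`, with test and t/s as the last two columns. `summarize_bench` is a hypothetical helper name. The script keeps only the mean from each `mean ± stddev` cell, so it ignores the run-to-run spread.

```bash
#!/usr/bin/env bash
# Sketch: turn llama-bench markdown tables into
#   model <TAB> test <TAB> Vulkan t/s <TAB> ROCm t/s <TAB> Vulkan/ROCm
# summarize_bench is a hypothetical helper; input layout as in this log.
set -euo pipefail

summarize_bench() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Table rows start with "|"; skip the header row and the "---" separator.
    /^\|/ {
      model = trim($2)
      if (model == "model" || model ~ /^-+$/) next
      backend = trim($5)
      # Test and t/s are always the last two columns (ROCm rows add "dio").
      tname = trim($(NF-2))
      split(trim($(NF-1)), ts, " ")   # "358.92 ± 2.25" -> ts[1] = "358.92"
      key = model SUBSEP tname
      if (!(key in seen)) { seen[key] = 1; order[++n] = key }
      if (backend ~ /^Vulkan/) vk[key] = ts[1]
      else if (backend ~ /^ROCm/) rocm[key] = ts[1]
    }
    END {
      # Emit only model/test pairs measured on both backends, in input order.
      for (i = 1; i <= n; i++) {
        key = order[i]
        if (!(key in vk) || !(key in rocm)) continue
        split(key, parts, SUBSEP)
        printf "%s\t%s\t%s\t%s\t%.2f\n", parts[1], parts[2], vk[key], rocm[key], vk[key] / rocm[key]
      }
    }
  ' "$@"
}
```

Running `summarize_bench gistfile1.txt` would print one tab-separated row per model and test. For the first model, the pp512 row compares 358.92 (Vulkan) with 198.54 (ROCm), a ratio of about 1.81.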