Created February 21, 2026 07:21
llama benches
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/bartowski_Qwen_Qwen3-Coder-Next-GGUF_Qwen_Qwen3-Coder-Next-Q8_0_Qwen_Qwen3-Coder-Next-Q8_0-00001-of-00003.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | qwen3next 80B.A3B Q8_0 | 78.98 GiB | 79.67 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 358.92 ± 2.25 | | |
| | qwen3next 80B.A3B Q8_0 | 78.98 GiB | 79.67 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 29.98 ± 0.51 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/bartowski_Qwen_Qwen3-Coder-Next-GGUF_Qwen_Qwen3-Coder-Next-Q8_0_Qwen_Qwen3-Coder-Next-Q8_0-00001-of-00003.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | qwen3next 80B.A3B Q8_0 | 78.98 GiB | 79.67 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 198.54 ± 2.17 | | |
| | qwen3next 80B.A3B Q8_0 | 78.98 GiB | 79.67 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 23.34 ± 1.33 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_Qwen3-Coder-Next-GGUF_UD-Q6_K_XL_Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 341.27 ± 4.40 | | |
| | qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 31.20 ± 0.99 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_Qwen3-Coder-Next-GGUF_UD-Q6_K_XL_Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 303.02 ± 9.89 | | |
| | qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 24.74 ± 2.78 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/gpt-oss-120b-Derestricted.i1-MXFP4_MOE.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 396.91 ± 3.32 | | |
| | gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 46.78 ± 0.87 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/gpt-oss-120b-Derestricted.i1-MXFP4_MOE.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 139.77 ± 5.94 | | |
| | gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 39.63 ± 2.65 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-IQ3_M_stepfun-ai_Step-3.5-Flash-IQ3_M-00001-of-00003.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | step35 196B.A11B IQ3_S mix - 3.66 bpw | 83.97 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 160.46 ± 2.23 | | |
| | step35 196B.A11B IQ3_S mix - 3.66 bpw | 83.97 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 24.62 ± 0.69 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-IQ3_M_stepfun-ai_Step-3.5-Flash-IQ3_M-00001-of-00003.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | step35 196B.A11B IQ3_S mix - 3.66 bpw | 83.97 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 87.66 ± 1.85 | | |
| | step35 196B.A11B IQ3_S mix - 3.66 bpw | 83.97 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 16.69 ± 0.95 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-Q3_K_M_stepfun-ai_Step-3.5-Flash-Q3_K_M-00001-of-00003.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | step35 196B.A11B Q3_K - Medium | 84.03 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 143.69 ± 1.41 | | |
| | step35 196B.A11B Q3_K - Medium | 84.03 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 24.31 ± 0.81 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/bartowski_stepfun-ai_Step-3.5-Flash-GGUF_stepfun-ai_Step-3.5-Flash-Q3_K_M_stepfun-ai_Step-3.5-Flash-Q3_K_M-00001-of-00003.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | step35 196B.A11B Q3_K - Medium | 84.03 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 158.18 ± 1.89 | | |
| | step35 196B.A11B Q3_K - Medium | 84.03 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 18.30 ± 1.13 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/stepfun-Step-3.5-Flash-IQ3_XXS.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | step35 196B.A11B IQ3_XXS - 3.0625 bpw | 70.55 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 170.51 ± 2.41 | | |
| | step35 196B.A11B IQ3_XXS - 3.0625 bpw | 70.55 GiB | 196.96 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 30.34 ± 1.29 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/stepfun-Step-3.5-Flash-IQ3_XXS.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | step35 196B.A11B IQ3_XXS - 3.0625 bpw | 70.55 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 83.67 ± 0.70 | | |
| | step35 196B.A11B IQ3_XXS - 3.0625 bpw | 70.55 GiB | 196.96 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 18.33 ± 0.05 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_MiniMax-M2.5-GGUF_UD-IQ3_XXS_MiniMax-M2.5-UD-IQ3_XXS-00001-of-00003.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw | 86.90 GiB | 228.69 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 119.44 ± 4.23 | | |
| | minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw | 86.90 GiB | 228.69 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 25.87 ± 0.52 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_MiniMax-M2.5-GGUF_UD-IQ3_XXS_MiniMax-M2.5-UD-IQ3_XXS-00001-of-00003.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw | 86.90 GiB | 228.69 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 61.32 ± 1.55 | | |
| | minimax-m2 230B.A10B IQ3_XXS - 3.0625 bpw | 86.90 GiB | 228.69 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 16.58 ± 1.14 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_NL.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | minimax-m2 230B.A10B IQ4_NL - 4.5 bpw | 91.40 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 143.23 ± 2.96 | | |
| | minimax-m2 230B.A10B IQ4_NL - 4.5 bpw | 91.40 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 22.90 ± 0.64 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_NL.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | minimax-m2 230B.A10B IQ4_NL - 4.5 bpw | 91.40 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 71.77 ± 2.81 | | |
| | minimax-m2 230B.A10B IQ4_NL - 4.5 bpw | 91.40 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 18.09 ± 1.03 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_XS.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | minimax-m2 230B.A10B IQ4_XS - 4.25 bpw | 86.59 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 127.49 ± 2.55 | | |
| | minimax-m2 230B.A10B IQ4_XS - 4.25 bpw | 86.59 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 22.51 ± 0.55 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-IQ4_XS.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | minimax-m2 230B.A10B IQ4_XS - 4.25 bpw | 86.59 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 73.90 ± 3.42 | | |
| | minimax-m2 230B.A10B IQ4_XS - 4.25 bpw | 86.59 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 19.59 ± 0.01 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-Q4_K_S.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | minimax-m2 230B.A10B Q4_K - Small | 91.38 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 129.40 ± 0.10 | | |
| | minimax-m2 230B.A10B Q4_K - Small | 91.38 GiB | 172.51 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 23.28 ± 0.85 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/Documents/models/Cerebras-MiniMax-M2.5-REAP-172B-Q4_K_S.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | minimax-m2 230B.A10B Q4_K - Small | 91.38 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 153.32 ± 2.21 | | |
| | minimax-m2 230B.A10B Q4_K - Small | 91.38 GiB | 172.51 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 16.93 ± 0.91 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q4_K_XL.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | deepseek2 30B.A3B Q4_K - Medium | 16.36 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 165.64 ± 1.25 | | |
| | deepseek2 30B.A3B Q4_K - Medium | 16.36 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 26.46 ± 1.53 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q4_K_XL.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | deepseek2 30B.A3B Q4_K - Medium | 16.36 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 267.25 ± 1.56 | | |
| | deepseek2 30B.A3B Q4_K - Medium | 16.36 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 35.95 ± 3.53 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q6_K_XL.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | deepseek2 30B.A3B Q6_K | 24.25 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 161.06 ± 1.46 | | |
| | deepseek2 30B.A3B Q6_K | 24.25 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 24.12 ± 1.23 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q6_K_XL.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | deepseek2 30B.A3B Q6_K | 24.25 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 233.98 ± 2.41 | | |
| | deepseek2 30B.A3B Q6_K | 24.25 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 26.49 ± 2.30 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q8_K_XL.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | deepseek2 30B.A3B Q8_0 | 32.70 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 163.58 ± 1.83 | | |
| | deepseek2 30B.A3B Q8_0 | 32.70 GiB | 29.94 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 21.14 ± 1.39 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.7-Flash-GGUF_GLM-4.7-Flash-UD-Q8_K_XL.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | deepseek2 30B.A3B Q8_0 | 32.70 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 192.79 ± 3.02 | | |
| | deepseek2 30B.A3B Q8_0 | 32.70 GiB | 29.94 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 23.66 ± 1.80 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/vulkan/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.6V-GGUF_UD-Q4_K_XL_GLM-4.6V-UD-Q4_K_XL-00001-of-00002.gguf | |
| ggml_vulkan: Found 1 Vulkan devices: | |
| ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: | | |
| | glm4moe ?B Q4_K - Medium | 60.95 GiB | 106.85 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 118.01 ± 3.52 | | |
| | glm4moe ?B Q4_K - Medium | 60.95 GiB | 106.85 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | tg128 @ d8000 | 11.11 ± 1.01 | | |
| build: 94b0200a0 (8118) | |
| /opt/llama.cpp/rocm/bin/llama-bench -ub 2048 -b 2048 -ctk q8_0 -ctv q8_0 -ngl 999 -fa 1 -d 8000 --mmap 0 -dio 1 -m /home/wim/.cache/llama.cpp/unsloth_GLM-4.6V-GGUF_UD-Q4_K_XL_GLM-4.6V-UD-Q4_K_XL-00001-of-00002.gguf | |
| ggml_cuda_init: found 1 ROCm devices: | |
| Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | |
| | model | size | params | backend | threads | n_ubatch | type_k | type_v | fa | dio | test | t/s | | |
| | ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -----: | -----: | -: | --: | --------------: | -------------------: | | |
| | glm4moe ?B Q4_K - Medium | 60.95 GiB | 106.85 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 102.66 ± 2.83 | | |
| | glm4moe ?B Q4_K - Medium | 60.95 GiB | 106.85 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | tg128 @ d8000 | 14.88 ± 0.63 | |
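
To compare the two backends from the tables above, a small script can pull the `t/s` column out of each result row and compute the Vulkan-to-ROCm ratio per model and test. This is a minimal sketch and not part of the original runs: the names `parse_rows` and `vulkan_over_rocm` are made up for illustration, and it assumes the rows keep the `llama-bench` markdown layout shown here (model in the first cell, a backend cell starting with `Vulkan` or `ROCm`, and a final `mean ± stddev` cell preceded by the test name).

```python
import re
from collections import defaultdict

# Matches a throughput cell such as "358.92 ± 2.25".
TPS = re.compile(r"^(\d+(?:\.\d+)?)\s*±\s*(\d+(?:\.\d+)?)$")


def parse_rows(text):
    """Yield (model, backend, test, mean_tps) for each result row in the output."""
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        cells = [c for c in cells if c]  # drop empty edge cells
        if len(cells) < 4:
            continue
        match = TPS.match(cells[-1])
        if not match:
            continue  # header, separator, or log line
        backend = next(
            (c.split(",")[0] for c in cells if c.startswith(("Vulkan", "ROCm"))),
            None,
        )
        if backend is None:
            continue
        test = cells[-2].split("@")[0].strip()  # "pp512 @ d8000" -> "pp512"
        yield cells[0], backend, test, float(match.group(1))


def vulkan_over_rocm(text):
    """Map (model, test) to Vulkan mean t/s divided by ROCm mean t/s."""
    results = defaultdict(dict)
    for model, backend, test, tps in parse_rows(text):
        results[(model, test)][backend] = tps
    return {
        key: round(by_backend["Vulkan"] / by_backend["ROCm"], 2)
        for key, by_backend in results.items()
        if "Vulkan" in by_backend and "ROCm" in by_backend
    }


if __name__ == "__main__":
    # Two rows copied from the Qwen3-Coder-Next Q8_0 runs above.
    sample = """
| qwen3next 80B.A3B Q8_0 | 78.98 GiB | 79.67 B | Vulkan,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | pp512 @ d8000 | 358.92 ± 2.25 |
| qwen3next 80B.A3B Q8_0 | 78.98 GiB | 79.67 B | ROCm,BLAS | 16 | 2048 | q8_0 | q8_0 | 1 | 1 | pp512 @ d8000 | 198.54 ± 2.17 |
"""
    for (model, test), ratio in vulkan_over_rocm(sample).items():
        print(f"{model} {test}: Vulkan/ROCm = {ratio}")
```

A ratio above 1 means Vulkan was faster for that model and test. The ratio ignores the reported standard deviation, so differences close to 1 should be read with the `±` values in mind. Note also that the ROCm runs were launched with `--mmap 0 -dio 1` while the Vulkan runs were not, so the two command lines are not identical.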