arcanemachine/55c1d72e96df0e44b4d9b9fe1b528ecb (last active February 16, 2026 07:35)
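Some 4x MI50 32GB Benchmarks (Mostly Qwen3-Coder-30B-A3B Q4_0)
Just leaving these here for future reference...
NOTE: Devices 0 and 2 are on PCIe 3.0 x16 slots; devices 1 and 3 are on PCIe x1 slots. When CUDA_VISIBLE_DEVICES is set, the selected GPUs are renumbered from 0 in the output (e.g. CUDA_VISIBLE_DEVICES=3 is reported as "Device 0").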
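The log below runs each GPU configuration as a separate command. The same sweep could be scripted roughly as follows. This is a minimal bash sketch, not the author's setup: it assumes ./llama-bench.sh is the local wrapper seen in the log and accepts -m as shown, and the MODEL default, SUBSETS list, run_subset function and DRY_RUN switch are hypothetical additions. By default it only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical sweep over the GPU subsets benchmarked in this log.
# Assumption: ./llama-bench.sh is the local wrapper used in the log and takes
# -m like llama-bench itself; MODEL, SUBSETS and DRY_RUN are illustrative.
set -euo pipefail

MODEL="${MODEL:-../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf}"
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the commands; 0 = actually run them

# Each single card, both x16 cards, one x16 + one x1 card, then all four.
# "all" means CUDA_VISIBLE_DEVICES is left unset, as in the log.
SUBSETS=(0 1 2 3 0,2 0,1 all)

# Print (dry run) or execute one benchmark with only the given GPUs visible.
run_subset() {
  local devices="$1"
  local prefix=""
  if [[ "$devices" != "all" ]]; then
    prefix="CUDA_VISIBLE_DEVICES=$devices "
  fi
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "${prefix}./llama-bench.sh -m $MODEL"
  elif [[ "$devices" == "all" ]]; then
    # Subshell so an inherited CUDA_VISIBLE_DEVICES does not hide any GPU.
    (unset CUDA_VISIBLE_DEVICES; ./llama-bench.sh -m "$MODEL")
  else
    CUDA_VISIBLE_DEVICES="$devices" ./llama-bench.sh -m "$MODEL"
  fi
}

for devices in "${SUBSETS[@]}"; do
  run_subset "$devices"
done
```

Run it from the same scripts directory; set DRY_RUN=0 to launch the benchmarks, or MODEL=... to point it at another .gguf file.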
---
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/qwen3-0.6b-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | pp512 | 4695.07 ± 37.34 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | tg128 | 142.06 ± 0.30 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/qwen3-0.6b-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | pp512 | 6361.34 ± 10.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | tg128 | 267.25 ± 0.19 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/Qwen3-
Qwen3-8B-Q4_1.gguf Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf Qwen3-Coder-Next-Q4_0.gguf Qwen3-VL-30B-A3B-Instruct-Q4_0.gguf
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1126.17 ± 4.21 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 87.47 ± 0.03 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1069.72 ± 4.03 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 59.34 ± 0.30 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,1 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1112.19 ± 3.99 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 65.92 ± 0.23 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1126.02 ± 3.39 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 88.16 ± 0.05 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=1 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1126.03 ± 3.37 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 75.61 ± 0.02 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=2 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1125.93 ± 3.83 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 80.32 ± 0.13 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=3 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1108.07 ± 4.21 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 75.52 ± 0.01 |
build: 8c9ef65f5 (8095)
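To line the single-GPU numbers up against the NOTE about PCIe slots, here is a small bash/awk sketch that prints each card's tg128 as a share of device 0. The values are copied from the four single-GPU runs above (using the second device 0 run), the link labels come from the NOTE, and results and relative_to_dev0 are hypothetical names.

```bash
#!/usr/bin/env bash
# Single-GPU tg128 results copied from the runs above.
# Columns: device, PCIe link (from the NOTE), tg128 mean in t/s.
set -euo pipefail

results='0 x16 88.16
1 x1 75.61
2 x16 80.32
3 x1 75.52'

# Print each device's tg128 as a percentage of the first row (device 0).
relative_to_dev0() {
  awk 'NR == 1 { base = $3 }
       { printf "device %s (%s): %.2f t/s, %.1f%% of device 0\n", $1, $2, $3, 100 * $3 / base }'
}

echo "$results" | relative_to_dev0
```

In these runs the two x1 cards come in at roughly 86% of device 0, but device 2, which sits in an x16 slot, also trails device 0 (about 91%), so slot width alone may not account for the spread. pp512 stays within about 2% across all four cards.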
user@aipc:~/code/ai/repo/scripts$
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1123.98 ± 4.64 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 69.11 ± 0.16 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,1 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1113.32 ± 4.89 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 66.69 ± 0.06 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1070.14 ± 4.06 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 58.87 ± 0.15 |
build: 8c9ef65f5 (8095)
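The Coder 30B runs above mix one, two and four GPUs, with some configurations repeated. The bash/awk sketch below averages the repeats and compares each configuration's tg128 with the single-GPU (device 0) result. Values are copied from the log in the order they ran; "0,1,2,3" stands for the runs with CUDA_VISIBLE_DEVICES unset, and runs and summarize_runs are hypothetical names.

```bash
#!/usr/bin/env bash
# Qwen3-Coder-30B-A3B Q4_0 runs copied from the log, in the order they ran.
# Columns: visible devices, pp512 mean, tg128 mean (t/s).
# "0,1,2,3" = CUDA_VISIBLE_DEVICES unset (all four cards).
set -euo pipefail

runs='0 1126.17 87.47
0,1,2,3 1069.72 59.34
0,1 1112.19 65.92
0 1126.02 88.16
0,2 1123.98 69.11
0,1 1113.32 66.69
0,1,2,3 1070.14 58.87'

# Average repeated runs per configuration (kept in first-seen order) and
# express each configuration's tg128 relative to the single-GPU "0" runs.
summarize_runs() {
  awk '
    !($1 in n) { order[++k] = $1 }
    { n[$1]++; pp[$1] += $2; tg[$1] += $3 }
    END {
      base = tg["0"] / n["0"]
      for (i = 1; i <= k; i++) {
        c = order[i]
        printf "%-8s pp512 %8.2f  tg128 %6.2f  (%.1f%% of single-GPU tg128)\n",
               c, pp[c] / n[c], tg[c] / n[c], 100 * (tg[c] / n[c]) / base
      }
    }'
}

echo "$runs" | summarize_runs
```

With these numbers, tg128 falls to about 79% of the single-card figure on the two x16 cards (0,2), about 75% with one x16 and one x1 card (0,1), and about 67% across all four, while pp512 stays within about 5%.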
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/Qwen3-Coder-Next-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | pp512 | 200.85 ± 0.45 |
| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | tg128 | 31.48 ± 0.06 |