Last active
February 17, 2026 04:41
mi50-benchmarks-qwen3-4b
| model | gpus | pp512 (t/s) | tg128 (t/s) |
|---|---|---|---|
| qwen3 4B Q4_0 | 0-3 | 1365.67 | 71.05 |
| qwen3 4B Q4_0 | 0 | 1508.90 | 113.37 |
| qwen3 4B Q4_0 | 1 | 1476.14 | 98.64 |
| qwen3 4B Q4_0 | 0,2 | 1491.52 | 90.91 |
| qwen3 4B Q4_0 | 0,1 | 1457.96 | 83.26 |
| qwen3 4B IQ4_XS | 0-3 | 1124.38 | 69.33 |
| qwen3 4B IQ4_XS | 0 | 1239.12 | 104.88 |
| qwen3 4B IQ4_XS | 1 | 1201.76 | 92.41 |
| qwen3 4B IQ4_XS | 0,2 | 1219.07 | 83.84 |
| qwen3 4B IQ4_XS | 0,1 | 1192.91 | 80.27 |
| qwen3 4B Q4_K | 0-3 | 1173.24 | 60.68 |
| qwen3 4B Q4_K | 0 | 1284.66 | 87.25 |
| qwen3 4B Q4_K | 1 | 1254.96 | 78.97 |
| qwen3 4B Q4_K | 0,2 | 1270.45 | 70.11 |
| qwen3 4B Q4_K | 0,1 | 1245.03 | 66.69 |
Some 4x MI50 32GB Benchmarks (Qwen3 4B Q4_0, IQ4_XS, Q4_K_XL)
Just leaving these here for future reference...
NOTE: Devices 0 and 2 are on PCIe 3.0 x16, Devices 1 and 3 are on PCIe x1
---
Here is the CSV data (more human-friendly output is available if you scroll down):
model,gpus,pp512,tg128
qwen3 4B Q4_0,"0-3",1365.67,71.05
qwen3 4B Q4_0,0,1508.90,113.37
qwen3 4B Q4_0,1,1476.14,98.64
qwen3 4B Q4_0,"0,2",1491.52,90.91
qwen3 4B Q4_0,"0,1",1457.96,83.26
qwen3 4B IQ4_XS,"0-3",1124.38,69.33
qwen3 4B IQ4_XS,0,1239.12,104.88
qwen3 4B IQ4_XS,1,1201.76,92.41
qwen3 4B IQ4_XS,"0,2",1219.07,83.84
qwen3 4B IQ4_XS,"0,1",1192.91,80.27
qwen3 4B Q4_K,"0-3",1173.24,60.68
qwen3 4B Q4_K,0,1284.66,87.25
qwen3 4B Q4_K,1,1254.96,78.97
qwen3 4B Q4_K,"0,2",1270.45,70.11
qwen3 4B Q4_K,"0,1",1245.03,66.69
---
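To make the CSV above easier to compare across GPU selections, here is a rough sketch (not part of the original benchmark run) that expresses each row as a percentage of the same model's GPU-0-only result. The function name `relative_to_gpu0` and the choice of GPU 0 as the baseline are assumptions made for illustration.

```python
import csv
import io

# CSV copied from the summary above.
CSV_DATA = '''model,gpus,pp512,tg128
qwen3 4B Q4_0,"0-3",1365.67,71.05
qwen3 4B Q4_0,0,1508.90,113.37
qwen3 4B Q4_0,1,1476.14,98.64
qwen3 4B Q4_0,"0,2",1491.52,90.91
qwen3 4B Q4_0,"0,1",1457.96,83.26
qwen3 4B IQ4_XS,"0-3",1124.38,69.33
qwen3 4B IQ4_XS,0,1239.12,104.88
qwen3 4B IQ4_XS,1,1201.76,92.41
qwen3 4B IQ4_XS,"0,2",1219.07,83.84
qwen3 4B IQ4_XS,"0,1",1192.91,80.27
qwen3 4B Q4_K,"0-3",1173.24,60.68
qwen3 4B Q4_K,0,1284.66,87.25
qwen3 4B Q4_K,1,1254.96,78.97
qwen3 4B Q4_K,"0,2",1270.45,70.11
qwen3 4B Q4_K,"0,1",1245.03,66.69
'''


def relative_to_gpu0(csv_text):
    """Map (model, gpus) -> (pp512 %, tg128 %) relative to that model's GPU-0-only run.

    Hypothetical helper: using GPU 0 as the baseline is an assumption.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    baseline = {r["model"]: r for r in rows if r["gpus"] == "0"}
    result = {}
    for r in rows:
        base = baseline[r["model"]]
        pp = 100.0 * float(r["pp512"]) / float(base["pp512"])
        tg = 100.0 * float(r["tg128"]) / float(base["tg128"])
        result[(r["model"], r["gpus"])] = (round(pp, 1), round(tg, 1))
    return result


if __name__ == "__main__":
    for (model, gpus), (pp, tg) in relative_to_gpu0(CSV_DATA).items():
        print(f"{model:<16} gpus={gpus:<4} pp512={pp:5.1f}%  tg128={tg:5.1f}%")
```

Read this way, prompt processing (pp512) stays within roughly 10% of the single-GPU figure in every configuration, while token generation (tg128) drops much more as GPUs are added.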
## Q4_0
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1365.67 ± 0.99 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 71.05 ± 0.49 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1508.90 ± 0.78 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 113.37 ± 0.16 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=1 ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1476.14 ± 0.29 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 98.64 ± 0.01 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1491.52 ± 0.84 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 90.91 ± 0.02 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,1 ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1457.96 ± 1.20 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 83.26 ± 0.06 |
build: 8c9ef65f5 (8095)

## IQ4_XS
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/qwen3-4b-iq4_xs.gguf
ggml_cuda_init: found 4 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | pp512 | 1124.38 ± 0.79 |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | tg128 | 69.33 ± 1.10 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/qwen3-4b-iq4_xs.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | pp512 | 1239.12 ± 0.58 |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | tg128 | 104.88 ± 0.07 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=1 ./llama-bench.sh -m ../../models/qwen3-4b-iq4_xs.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | pp512 | 1201.76 ± 0.36 |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | tg128 | 92.41 ± 0.02 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench.sh -m ../../models/qwen3-4b-iq4_xs.gguf
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | pp512 | 1219.07 ± 0.49 |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | tg128 | 83.84 ± 0.37 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,1 ./llama-bench.sh -m ../../models/qwen3-4b-iq4_xs.gguf
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | pp512 | 1192.91 ± 0.69 |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | tg128 | 80.27 ± 0.18 |
build: 8c9ef65f5 (8095)

## Q4_K_XL
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/qwen3-4b-q4_k_xl.gguf
ggml_cuda_init: found 4 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | pp512 | 1173.24 ± 0.83 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | tg128 | 60.68 ± 0.12 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/qwen3-4b-q4_k_xl.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | pp512 | 1284.66 ± 0.47 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | tg128 | 87.25 ± 0.08 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=1 ./llama-bench.sh -m ../../models/qwen3-4b-q4_k_xl.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | pp512 | 1254.96 ± 0.55 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | tg128 | 78.97 ± 0.03 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench.sh -m ../../models/qwen3-4b-q4_k_xl.gguf
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | pp512 | 1270.45 ± 0.25 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | tg128 | 70.11 ± 0.42 |
build: 8c9ef65f5 (8095)

user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,1 ./llama-bench.sh -m ../../models/qwen3-4b-q4_k_xl.gguf
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | pp512 | 1245.03 ± 0.95 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | tg128 | 66.69 ± 0.05 |
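For anyone wanting to rebuild the CSV summary from raw logs like the ones above, the following is a minimal sketch, not the method actually used to produce the summary. It assumes each run starts with a shell line invoking `./llama-bench.sh` (optionally prefixed with `CUDA_VISIBLE_DEVICES=...`) and that result rows follow the column layout shown in these logs. The names `parse_log`, `CMD_RE`, and `ROW_RE` are hypothetical.

```python
import re

# Shell line that starts a run; the CUDA_VISIBLE_DEVICES prefix is optional.
CMD_RE = re.compile(r"(?:CUDA_VISIBLE_DEVICES=(?P<gpus>[\d,]+)\s+)?\./llama-bench\.sh")

# Result row, e.g.:
# | qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1365.67 ± 0.99 |
# Assumption: size is reported in GiB and params in B, as in the logs above.
ROW_RE = re.compile(
    r"\|\s*(?P<model>[^|]+?)\s*\|\s*[\d.]+ GiB\s*\|\s*[\d.]+ B\s*\|\s*\w+\s*\|\s*\d+\s*\|"
    r"\s*(?P<test>pp\d+|tg\d+)\s*\|\s*(?P<tps>[\d.]+)\s*±\s*(?P<err>[\d.]+)\s*\|"
)


def parse_log(log_text):
    """Return a list of (gpus, model, test, tokens_per_s, stddev) tuples.

    gpus is the CUDA_VISIBLE_DEVICES value of the most recent command line,
    or "all" when the variable was not set.
    """
    gpus = "all"
    results = []
    for line in log_text.splitlines():
        cmd = CMD_RE.search(line)
        if cmd:
            gpus = cmd["gpus"] or "all"
            continue
        row = ROW_RE.search(line)
        if row:
            results.append(
                (gpus, row["model"], row["test"], float(row["tps"]), float(row["err"]))
            )
    return results


SAMPLE = """\
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
| model | size | params | backend | ngl | test | t/s |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1491.52 ± 0.84 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 90.91 ± 0.02 |
"""

if __name__ == "__main__":
    for entry in parse_log(SAMPLE):
        print(entry)
```

The header and separator rows are skipped because they do not match the numeric size column, so only the two measurement rows per run are collected.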