@arcanemachine
Last active February 17, 2026 04:41
mi50-benchmarks-qwen3-4b
| model | gpus | pp512 (t/s) | tg128 (t/s) |
| ------------------ | ---- | ----------: | ----------: |
| qwen3 4B Q4_0 | 0-3 | 1365.67 | 71.05 |
| qwen3 4B Q4_0 | 0 | 1508.90 | 113.37 |
| qwen3 4B Q4_0 | 1 | 1476.14 | 98.64 |
| qwen3 4B Q4_0 | 0,2 | 1491.52 | 90.91 |
| qwen3 4B Q4_0 | 0,1 | 1457.96 | 83.26 |
| qwen3 4B IQ4_XS | 0-3 | 1124.38 | 69.33 |
| qwen3 4B IQ4_XS | 0 | 1239.12 | 104.88 |
| qwen3 4B IQ4_XS | 1 | 1201.76 | 92.41 |
| qwen3 4B IQ4_XS | 0,2 | 1219.07 | 83.84 |
| qwen3 4B IQ4_XS | 0,1 | 1192.91 | 80.27 |
| qwen3 4B Q4_K | 0-3 | 1173.24 | 60.68 |
| qwen3 4B Q4_K | 0 | 1284.66 | 87.25 |
| qwen3 4B Q4_K | 1 | 1254.96 | 78.97 |
| qwen3 4B Q4_K | 0,2 | 1270.45 | 70.11 |
| qwen3 4B Q4_K | 0,1 | 1245.03 | 66.69 |
Some 4x MI50 32GB Benchmarks (Qwen3 4B Q4_0, IQ4_XS, Q4_K_XL)
Just leaving these here for future reference...
NOTE: Devices 0 and 2 are on PCIe 3.0 x16 slots; devices 1 and 3 are on PCIe x1 slots.
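
To put the slot difference in perspective, here is a small sketch that converts the single-GPU tg128 numbers from this gist into milliseconds per token and reports how much extra time device 1 (x1 slot) takes compared with device 0 (x16 slot). The dictionary layout and the `ms_per_token` helper are just illustrative names, and the gap is not necessarily caused by the PCIe link alone (the cards could also differ in clocks or cooling).

```python
# tg128 throughput (tokens/s) copied from the single-GPU runs in this gist.
# Device 0 is in an x16 slot, device 1 is in an x1 slot.
TG128 = {
    "Q4_0": {"gpu0": 113.37, "gpu1": 98.64},
    "IQ4_XS": {"gpu0": 104.88, "gpu1": 92.41},
    "Q4_K": {"gpu0": 87.25, "gpu1": 78.97},
}


def ms_per_token(tokens_per_second):
    """Convert a tokens/s throughput into milliseconds per generated token."""
    return 1000.0 / tokens_per_second


# Extra per-token latency on device 1 relative to device 0, per quantization.
extra_ms = {
    quant: ms_per_token(runs["gpu1"]) - ms_per_token(runs["gpu0"])
    for quant, runs in TG128.items()
}

for quant, ms in extra_ms.items():
    print(f"{quant}: +{ms:.2f} ms per generated token on device 1")
```

With these numbers the difference works out to roughly 1.2 to 1.3 ms per token for all three quantizations, which looks more like a fixed per-token cost than something that scales with the model format.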
---
Here is the CSV data (the full llama-bench output for each run follows further down):
model,gpus,pp512,tg128
qwen3 4B Q4_0,"0-3",1365.67,71.05
qwen3 4B Q4_0,0,1508.90,113.37
qwen3 4B Q4_0,1,1476.14,98.64
qwen3 4B Q4_0,"0,2",1491.52,90.91
qwen3 4B Q4_0,"0,1",1457.96,83.26
qwen3 4B IQ4_XS,"0-3",1124.38,69.33
qwen3 4B IQ4_XS,0,1239.12,104.88
qwen3 4B IQ4_XS,1,1201.76,92.41
qwen3 4B IQ4_XS,"0,2",1219.07,83.84
qwen3 4B IQ4_XS,"0,1",1192.91,80.27
qwen3 4B Q4_K,"0-3",1173.24,60.68
qwen3 4B Q4_K,0,1284.66,87.25
qwen3 4B Q4_K,1,1254.96,78.97
qwen3 4B Q4_K,"0,2",1270.45,70.11
qwen3 4B Q4_K,"0,1",1245.03,66.69
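
If you want to work with the CSV instead of eyeballing it, the following is a minimal sketch that parses it with Python's standard `csv` module and expresses each run's tg128 as a fraction of the single-GPU (device 0) run for the same model. The function names `load_rows` and `relative_to_baseline` are made up for this example; the data is the CSV above, pasted verbatim.

```python
import csv
import io

CSV_DATA = """model,gpus,pp512,tg128
qwen3 4B Q4_0,"0-3",1365.67,71.05
qwen3 4B Q4_0,0,1508.90,113.37
qwen3 4B Q4_0,1,1476.14,98.64
qwen3 4B Q4_0,"0,2",1491.52,90.91
qwen3 4B Q4_0,"0,1",1457.96,83.26
qwen3 4B IQ4_XS,"0-3",1124.38,69.33
qwen3 4B IQ4_XS,0,1239.12,104.88
qwen3 4B IQ4_XS,1,1201.76,92.41
qwen3 4B IQ4_XS,"0,2",1219.07,83.84
qwen3 4B IQ4_XS,"0,1",1192.91,80.27
qwen3 4B Q4_K,"0-3",1173.24,60.68
qwen3 4B Q4_K,0,1284.66,87.25
qwen3 4B Q4_K,1,1254.96,78.97
qwen3 4B Q4_K,"0,2",1270.45,70.11
qwen3 4B Q4_K,"0,1",1245.03,66.69
"""


def load_rows(text):
    """Parse the CSV text into dicts with numeric pp512/tg128 values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "model": row["model"],
            "gpus": row["gpus"],
            "pp512": float(row["pp512"]),
            "tg128": float(row["tg128"]),
        }
        for row in reader
    ]


def relative_to_baseline(rows, metric, baseline_gpus="0"):
    """Map (model, gpus) to metric divided by the same model's baseline run."""
    baseline = {r["model"]: r[metric] for r in rows if r["gpus"] == baseline_gpus}
    return {(r["model"], r["gpus"]): r[metric] / baseline[r["model"]] for r in rows}


rows = load_rows(CSV_DATA)
tg_ratio = relative_to_baseline(rows, "tg128")

for (model, gpus), ratio in tg_ratio.items():
    print(f"{model:<16} gpus={gpus:<4} tg128 = {ratio:.1%} of device 0 alone")
```

Passing `"pp512"` as the metric gives the same comparison for prompt processing. For a model this small, every multi-GPU configuration in the data comes out below the single-GPU baseline.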
---
## Q4_0
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1365.67 ± 0.99 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 71.05 ± 0.49 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1508.90 ± 0.78 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 113.37 ± 0.16 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=1 ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1476.14 ± 0.29 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 98.64 ± 0.01 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1491.52 ± 0.84 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 90.91 ± 0.02 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,1 ./llama-bench.sh -m ../../models/qwen3-4b-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 1457.96 ± 1.20 |
| qwen3 4B Q4_0 | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 83.26 ± 0.06 |
build: 8c9ef65f5 (8095)
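
The five runs above differ only in the `CUDA_VISIBLE_DEVICES` value, so the sweep could be scripted. This is a rough sketch, assuming the same `./llama-bench.sh` wrapper and `-m` argument used in the transcripts; `DEVICE_SETS`, `build_run`, and `run_sweep` are hypothetical names for this example, and nothing is executed until `run_sweep` is called.

```python
import os
import subprocess

# Device selections used in this gist; None means "leave the variable unset"
# so that all four GPUs are visible.
DEVICE_SETS = [None, "0", "1", "0,2", "0,1"]


def build_run(model_path, devices, script="./llama-bench.sh"):
    """Return (argv, env) for one benchmark run without executing it."""
    env = dict(os.environ)
    if devices is None:
        env.pop("CUDA_VISIBLE_DEVICES", None)
    else:
        env["CUDA_VISIBLE_DEVICES"] = devices
    return [script, "-m", model_path], env


def run_sweep(model_path):
    """Run the wrapper script once per device selection, in order."""
    for devices in DEVICE_SETS:
        argv, env = build_run(model_path, devices)
        print(f"# CUDA_VISIBLE_DEVICES={devices or '(all)'}")
        subprocess.run(argv, env=env, check=True)


# Example (not run automatically):
# run_sweep("../../models/qwen3-4b-Q4_0.gguf")
```

Keeping `build_run` separate from `run_sweep` makes it easy to check the environment for each run before actually launching anything.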
## IQ4_XS
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/qwen3-4b-iq4_xs.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | pp512 | 1124.38 ± 0.79 |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | tg128 | 69.33 ± 1.10 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/qwen3-4b-iq4_xs.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | pp512 | 1239.12 ± 0.58 |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | tg128 | 104.88 ± 0.07 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=1 ./llama-bench.sh -m ../../models/qwen3-4b-iq4_xs.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | pp512 | 1201.76 ± 0.36 |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | tg128 | 92.41 ± 0.02 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench.sh -m ../../models/qwen3-4b-iq4_xs.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | pp512 | 1219.07 ± 0.49 |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | tg128 | 83.84 ± 0.37 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,1 ./llama-bench.sh -m ../../models/qwen3-4b-iq4_xs.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | pp512 | 1192.91 ± 0.69 |
| qwen3 4B IQ4_XS - 4.25 bpw | 2.11 GiB | 4.02 B | ROCm | 99 | tg128 | 80.27 ± 0.18 |
build: 8c9ef65f5 (8095)
## Q4_K_XL
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/qwen3-4b-q4_k_xl.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | pp512 | 1173.24 ± 0.83 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | tg128 | 60.68 ± 0.12 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/qwen3-4b-q4_k_xl.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | pp512 | 1284.66 ± 0.47 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | tg128 | 87.25 ± 0.08 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=1 ./llama-bench.sh -m ../../models/qwen3-4b-q4_k_xl.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | pp512 | 1254.96 ± 0.55 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | tg128 | 78.97 ± 0.03 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench.sh -m ../../models/qwen3-4b-q4_k_xl.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | pp512 | 1270.45 ± 0.25 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | tg128 | 70.11 ± 0.42 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,1 ./llama-bench.sh -m ../../models/qwen3-4b-q4_k_xl.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | pp512 | 1245.03 ± 0.95 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | ROCm | 99 | tg128 | 66.69 ± 0.05 |