@arcanemachine
Last active February 16, 2026 07:35
Some 4x MI50 32GB Benchmarks (Mostly Qwen3 Coder 30B A3B Q4_0)
Just leaving these here for future reference...
NOTE: Devices 0 and 2 are on PCIe 3.0 x16; devices 1 and 3 are on PCIe x1.
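For a rough sense of the gap between the two slot types in the note above, here is a back-of-the-envelope calculation of theoretical per-direction link bandwidth. It is only a sketch: it assumes the x1 slots are also PCIe 3.0, which the note does not state, and it ignores protocol overhead. The 8 GT/s lane rate and 128b/130b encoding are the standard PCIe 3.0 figures; the function name is made up for illustration.

```python
# Theoretical per-direction PCIe 3.0 bandwidth, before protocol overhead.
# Assumption: the x1 slots are also Gen 3; the note above does not say.
GT_PER_S = 8e9          # PCIe 3.0 raw rate per lane, in transfers per second
ENCODING = 128 / 130    # 128b/130b line-encoding efficiency


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Return theoretical bandwidth in GB/s (10^9 bytes/s) for a Gen 3 link."""
    return GT_PER_S * ENCODING / 8 * lanes / 1e9


for lanes in (16, 1):
    print(f"x{lanes}: {pcie3_bandwidth_gb_s(lanes):.2f} GB/s")
```

Under those assumptions an x16 link has 16 times the bandwidth of an x1 link, roughly 15.75 GB/s versus just under 1 GB/s.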
---
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/qwen3-0.6b-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | pp512 | 4695.07 ± 37.34 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | tg128 | 142.06 ± 0.30 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/qwen3-0.6b-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | pp512 | 6361.34 ± 10.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | tg128 | 267.25 ± 0.19 |
build: 8c9ef65f5 (8095)
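To put the two qwen3 0.6B runs above side by side, a small parser can pull the mean t/s out of each table and report the 4-GPU result as a fraction of the 1-GPU result. This is a minimal sketch: `parse_tps` and the regex are my own helpers, not part of llama-bench, and the table rows are pasted from the output above.

```python
import re

# Matches the "test" and "t/s" cells of a llama-bench result row,
# e.g. "| pp512 | 4695.07 ± 37.34 |".
ROW = re.compile(r"\|\s*(pp\d+|tg\d+)\s*\|\s*([\d.]+)\s*±\s*([\d.]+)\s*\|")


def parse_tps(table: str) -> dict[str, float]:
    """Map each test name (e.g. 'tg128') to its mean tokens per second."""
    return {m.group(1): float(m.group(2)) for m in ROW.finditer(table)}


four_gpus = """
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | pp512 | 4695.07 ± 37.34 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | tg128 | 142.06 ± 0.30 |
"""
one_gpu = """
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | pp512 | 6361.34 ± 10.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | ROCm | 99 | tg128 | 267.25 ± 0.19 |
"""

multi, single = parse_tps(four_gpus), parse_tps(one_gpu)
for test in single:
    print(f"{test}: 4 GPUs run at {multi[test] / single[test]:.0%} of 1 GPU")
```

For these two runs that works out to about 74% for pp512 and 53% for tg128, so the tiny model is clearly faster on a single card.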
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1126.17 ± 4.21 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 87.47 ± 0.03 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1069.72 ± 4.03 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 59.34 ± 0.30 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,1 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1112.19 ± 3.99 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 65.92 ± 0.23 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1126.02 ± 3.39 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 88.16 ± 0.05 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=1 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1126.03 ± 3.37 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 75.61 ± 0.02 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=2 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1125.93 ± 3.83 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 80.32 ± 0.13 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=3 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1108.07 ± 4.21 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 75.52 ± 0.01 |
build: 8c9ef65f5 (8095)
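The four single-device runs above give nearly identical pp512 but noticeably different tg128. The sketch below just lines up the tg128 means against the fastest card, with the slot widths taken from the NOTE at the top. It does not claim the slot is the cause; `drop_vs_fastest` is a hypothetical helper name.

```python
# tg128 means (tokens/s) copied from the four single-device runs above.
tg128 = {0: 88.16, 1: 75.61, 2: 80.32, 3: 75.52}
# Slot widths from the NOTE at the top of this gist.
slot = {0: "x16", 1: "x1", 2: "x16", 3: "x1"}


def drop_vs_fastest(results: dict[int, float]) -> dict[int, float]:
    """Return, per device, the percentage by which it trails the fastest one."""
    fastest = max(results.values())
    return {dev: (1 - tps / fastest) * 100 for dev, tps in results.items()}


for dev, drop in drop_vs_fastest(tg128).items():
    print(f"device {dev} ({slot[dev]}): {tg128[dev]:.2f} t/s, "
          f"{drop:.1f}% below the fastest card")
```

Devices 1 and 3 land about 14% below device 0, and device 2 about 9% below it.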
user@aipc:~/code/ai/repo/scripts$
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1123.98 ± 4.64 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 69.11 ± 0.16 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,1 ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1113.32 ± 4.89 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 66.69 ± 0.06 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | pp512 | 1070.14 ± 4.06 |
| qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | ROCm | 99 | tg128 | 58.87 ± 0.15 |
build: 8c9ef65f5 (8095)
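To summarize how the 30B model behaves as more cards are added, the sketch below normalizes each device set against the single-card run on device 0. The numbers are copied from the most recent run of each configuration above; `relative_to` is an illustrative helper, not anything from llama.cpp.

```python
# (pp512, tg128) means in tokens/s, copied from the most recent run of each
# device set above for Qwen3-Coder-30B-A3B-Instruct-Q4_0.
runs = {
    "0": (1126.02, 88.16),
    "0,2": (1123.98, 69.11),
    "0,1": (1113.32, 66.69),
    "0,1,2,3": (1070.14, 58.87),
}


def relative_to(baseline: str, results: dict) -> dict:
    """Express every (pp512, tg128) pair as a fraction of the baseline run."""
    base_pp, base_tg = results[baseline]
    return {k: (pp / base_pp, tg / base_tg) for k, (pp, tg) in results.items()}


for devices, (pp, tg) in relative_to("0", runs).items():
    print(f"CUDA_VISIBLE_DEVICES={devices:<8} pp512 {pp:6.1%}  tg128 {tg:6.1%}")
```

Prompt processing stays within about 5% of the single-card figure in every configuration, while token generation falls to roughly 78% (devices 0,2), 76% (devices 0,1), and 67% (all four).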
user@aipc:~/code/ai/repo/scripts$ ./llama-bench.sh -m ../../models/Qwen3-Coder-Next-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | pp512 | 200.85 ± 0.45 |
| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | tg128 | 31.48 ± 0.06 |
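Unlike the 30B model, this one was only benchmarked across all four cards. A quick lower bound on the number of cards needed for the weights alone, assuming each "32GB" MI50 exposes about 32 GiB and ignoring KV cache and compute buffers (both are assumptions, not measurements from this gist):

```python
import math

MODEL_GIB = 42.19   # size column of the qwen3next 80B.A3B Q4_0 row above
CARD_GIB = 32       # assumption: each "32GB" MI50 exposes about 32 GiB


def min_cards(model_gib: float, card_gib: float) -> int:
    """Lower bound on the cards needed to hold the weights alone."""
    return math.ceil(model_gib / card_gib)


print(min_cards(MODEL_GIB, CARD_GIB))
```

So a single-card run comparable to the 30B ones above is not an option for this model under these assumptions.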