Last active: February 18, 2026 19:37
Gist: arcanemachine/391fb5e07b44411666515057c304340c
Some 4x MI50 32GB Benchmarks (Qwen3-Coder-Next Q4_0, Q4_K_M)
TLDR: Using PCIe x1 cuts PP by more than half for multi-card setups (for this model). pp512 falls to roughly 37-39% of the 2-card result, while TG drops only about 17%.
Notes:
- Devices 0 and 2 are on PCIe 3.0 x16; Devices 1 and 3 are on PCIe x1
- Flash attention disabled (I saw no difference between enabled and disabled)
- ROCm version: 6.3.3
- llama.cpp compiled from pwilkin's autoparser branch
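The 2-card runs select the two x16 cards with `CUDA_VISIBLE_DEVICES=0,2`, but `ggml_cuda_init` then lists them as Device 0 and Device 1. This happens because the runtime renumbers the visible GPUs in the order they are listed. The following is a minimal Python sketch of that mapping. It assumes numeric indices (not UUIDs) and assumes the ROCm/HIP runtime treats `CUDA_VISIBLE_DEVICES` the same way as `HIP_VISIBLE_DEVICES`. The helper name `visible_to_logical` is hypothetical.

```python
def visible_to_logical(visible: str) -> dict[int, int]:
    """Hypothetical helper: map the physical GPU indices listed in a
    *_VISIBLE_DEVICES value to the logical indices the runtime reports.

    Assumes numeric, comma-separated indices (no UUIDs); the runtime is
    assumed to number the visible devices 0, 1, ... in the listed order.
    """
    physical = [int(tok) for tok in visible.split(",") if tok.strip()]
    return {phys: logical for logical, phys in enumerate(physical)}


# The 2-card runs: physical cards 0 and 2 (both on PCIe 3.0 x16)
print(visible_to_logical("0,2"))  # {0: 0, 2: 1}
```

So "Device 1" in the 2-card logs is physical card 2 (on x16), not the x1 card that appears as Device 1 in the 4-card runs.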
---
## 2 cards, both on PCIe 3.0 x16
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench -m ../../models/Qwen3-Coder-Next-Q4_0.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | pp512 | 547.26 ± 1.86 |
| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | tg128 | 38.06 ± 0.10 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ CUDA_VISIBLE_DEVICES=0,2 ./llama-bench -m ../../models/Qwen3-Coder-Next-Q4_K_M.gguf
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3next 80B.A3B Q4_K - Medium | 45.17 GiB | 79.67 B | ROCm | 99 | pp512 | 491.68 ± 1.89 |
| qwen3next 80B.A3B Q4_K - Medium | 45.17 GiB | 79.67 B | ROCm | 99 | tg128 | 36.24 ± 0.16 |
## 4 cards, 2 on PCIe 3.0 x16, 2 on PCIe x1
user@aipc:~/code/ai/repo/scripts$ ./llama-bench -m ../../models/Qwen3-Coder-Next-Q4_0.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | pp512 | 200.63 ± 0.34 |
| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | tg128 | 31.56 ± 0.10 |
build: 8c9ef65f5 (8095)
user@aipc:~/code/ai/repo/scripts$ ./llama-bench -m ../../models/Qwen3-Coder-Next-Q4_K_M.gguf
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3next 80B.A3B Q4_K - Medium | 45.17 GiB | 79.67 B | ROCm | 99 | pp512 | 192.41 ± 0.27 |
| qwen3next 80B.A3B Q4_K - Medium | 45.17 GiB | 79.67 B | ROCm | 99 | tg128 | 29.96 ± 0.32 |
build: 8c9ef65f5 (8095)
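The relative PP and TG changes in the TLDR can be recomputed from the pp512 and tg128 rows above. The following Python sketch parses llama-bench result rows, copied from the tables above with padding trimmed, and divides each 4-card mean by the matching 2-card mean. The parsing assumes the seven-column layout shown here, with `t/s` given as `mean ± std`. The names `BenchRow`, `parse_row` and `ratios` are illustrative, not part of llama.cpp.

```python
from dataclasses import dataclass


@dataclass
class BenchRow:
    model: str
    test: str
    mean: float  # tokens/s
    std: float   # the "±" value reported by llama-bench


def parse_row(line: str) -> BenchRow:
    """Parse one llama-bench markdown row, assuming the layout
    | model | size | params | backend | ngl | test | t/s |"""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    mean, std = (float(x) for x in cells[-1].split("±"))
    return BenchRow(model=cells[0], test=cells[-2], mean=mean, std=std)


# Rows copied from the results above (2 cards on x16 vs. 4 cards, 2 on x1)
TWO_CARDS = [
    "| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | pp512 | 547.26 ± 1.86 |",
    "| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | tg128 | 38.06 ± 0.10 |",
    "| qwen3next 80B.A3B Q4_K - Medium | 45.17 GiB | 79.67 B | ROCm | 99 | pp512 | 491.68 ± 1.89 |",
    "| qwen3next 80B.A3B Q4_K - Medium | 45.17 GiB | 79.67 B | ROCm | 99 | tg128 | 36.24 ± 0.16 |",
]
FOUR_CARDS = [
    "| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | pp512 | 200.63 ± 0.34 |",
    "| qwen3next 80B.A3B Q4_0 | 42.19 GiB | 79.67 B | ROCm | 99 | tg128 | 31.56 ± 0.10 |",
    "| qwen3next 80B.A3B Q4_K - Medium | 45.17 GiB | 79.67 B | ROCm | 99 | pp512 | 192.41 ± 0.27 |",
    "| qwen3next 80B.A3B Q4_K - Medium | 45.17 GiB | 79.67 B | ROCm | 99 | tg128 | 29.96 ± 0.32 |",
]


def ratios(baseline: list[str], other: list[str]) -> dict[tuple[str, str], float]:
    """Return other.mean / baseline.mean for each (model, test) pair."""
    base = {(r.model, r.test): r.mean for r in map(parse_row, baseline)}
    return {(r.model, r.test): r.mean / base[(r.model, r.test)] for r in map(parse_row, other)}


for (model, test), ratio in ratios(TWO_CARDS, FOUR_CARDS).items():
    print(f"{model:<32} {test:<6} 4-card / 2-card: {ratio:.1%}")
```

With these numbers, pp512 lands at about 36.7% (Q4_0) and 39.1% (Q4_K_M) of the 2-card result. tg128 lands at about 82.9% and 82.7%.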