# LM Studio Model Benchmarks

**Date:** 2026-01-25
**Platform:** Framework Laptop 13 (AMD Ryzen AI Edition)

## Hardware Specs

| Component    | Details                                                        |
|--------------|----------------------------------------------------------------|
| CPU          | AMD Ryzen AI 9 HX 370 (12 cores, 24 threads)                   |
| GPU          | AMD Radeon 890M (integrated, shared memory)                    |
| RAM          | 48 GB DDR5                                                     |
| Kernel       | 6.19.0-0.0.next.20260123.345.vanilla.fc43.x86_64 (linux-next)  |
| OS           | Fedora 43                                                      |
| GPU Override | `HSA_OVERRIDE_GFX_VERSION=11.0.0`                              |

## Models Tested

| Model            | Architecture                 | Total Params | Active Params | Size     | Tool Use | Context |
|------------------|------------------------------|--------------|---------------|----------|----------|---------|
| Qwen3 Coder 30B  | MoE (qwen3moe)               | 30B          | 3B            | 18.63 GB | Yes      | 262K    |
| Granite 4 H-Tiny | Hybrid Mamba/Transformer MoE | 7B           | 1B            | 4.23 GB  | Yes      | 1M      |
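
Both models were exercised through LM Studio's OpenAI-compatible server. Assuming the default endpoint (`http://localhost:1234/v1`), the loaded models can be enumerated as below; the `api_key` value is a placeholder, since the local server doesn't validate it.

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is unused but
# required by the client library.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

for model in client.models.list().data:
    print(model.id)  # server-assigned IDs of the loaded models
```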

## Coding Task Benchmarks

### Test Descriptions

1. **FizzBuzz** - Write a Python fizzbuzz function
2. **Fix factorial** - Debug an incorrect base case (`return 0` → `return 1`)
3. **Explain lambda** - Explain a Y-combinator-style factorial lambda
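
The prompts aren't reproduced verbatim in this gist; the sketch below shows plausible reference solutions for the three tasks (the exact lambda used in test 3 is an assumption).

```python
def fizzbuzz(n: int) -> str:
    """Test 1 target: classic FizzBuzz for a single number."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def factorial(n: int) -> int:
    """Test 2 target: the buggy version returned 0 at the base case."""
    if n == 0:
        return 1  # the fix: 0! = 1, not 0
    return n * factorial(n - 1)


# Test 3 subject: a self-applying ("Y-combinator style") factorial lambda.
fact = (lambda f: lambda n: 1 if n == 0 else n * f(f)(n - 1))(
    lambda f: lambda n: 1 if n == 0 else n * f(f)(n - 1)
)

assert fizzbuzz(15) == "FizzBuzz"
assert factorial(5) == fact(5) == 120
```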

### Results

| Test            | Qwen3 Seq  | Granite Seq | Qwen3 Parallel | Granite Parallel |
|-----------------|------------|-------------|----------------|------------------|
| FizzBuzz        | 3.68s      | 2.97s       | 6.01s          | 6.33s            |
| Fix factorial   | 5.11s      | 3.38s       | 7.97s          | 7.47s            |
| Explain lambda  | 4.47s      | 2.85s       | 6.30s          | 6.09s            |
| **Total**       | **13.26s** | **9.20s**   | **20.28s**     | **19.89s**       |
| Slowdown vs Seq | -          | -           | 1.53x          | 2.16x            |
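
"Seq" times one request at a time; "Parallel" fires both models' requests concurrently so they contend for the shared GPU. A minimal harness along these lines reproduces the comparison (model IDs are illustrative; the endpoint is assumed to be LM Studio's default):

```python
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
MODELS = ["qwen3-coder-30b", "granite-4-h-tiny"]  # illustrative IDs
PROMPT = "Write a Python fizzbuzz function."


def timed_request(model: str) -> float:
    """Return wall-clock seconds for one chat completion."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": PROMPT}]
    )
    return time.perf_counter() - start


# Sequential: each model answers alone on the GPU.
seq = {m: timed_request(m) for m in MODELS}

# Parallel: both requests in flight at once, sharing GPU memory bandwidth.
with ThreadPoolExecutor(max_workers=2) as pool:
    par = dict(zip(MODELS, pool.map(timed_request, MODELS)))

for m in MODELS:
    print(f"{m}: seq {seq[m]:.2f}s, parallel {par[m]:.2f}s")
```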

## Tool Calling Benchmarks

| Test                 | Qwen3 30B | Granite Tiny | Speedup (Granite vs Qwen3) |
|----------------------|-----------|--------------|----------------------------|
| Simple (hello)       | 1.30s     | 0.72s        | 1.8x                       |
| Multi-tool selection | 5.64s     | 2.11s        | 2.7x                       |
| Code editing         | ~3s       | ~2s          | 1.5x                       |
| Bash execution       | 6.40s     | 1.76s        | 3.6x                       |
| Multi-turn context   | 2.36s     | 1.57s        | 1.5x                       |
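
For context, a minimal sketch of the kind of request behind the "Simple (hello)" row; the tool schema here is illustrative, not the benchmark's actual fixture:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Illustrative tool definition in OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "say_hello",
        "description": "Greet a user by name.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}]

resp = client.chat.completions.create(
    model="granite-4-h-tiny",  # illustrative ID
    messages=[{"role": "user", "content": "Say hello to Ada."}],
    tools=tools,
)

for call in resp.choices[0].message.tool_calls:
    print(call.function.name, call.function.arguments)
```

Qwen3's "parallel tool calls" (see Behavioral Differences below) surface as multiple entries in `message.tool_calls` for a single turn; Granite typically returns one per turn.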

## Key Findings

### Performance

- Granite is 1.4-3.6x faster than Qwen3 for most tasks
- Both models handle tool calling correctly
- Parallel execution adds ~55-115% latency (GPU memory contention; see the slowdown row in Results)

### Behavioral Differences

| Aspect         | Qwen3 Coder 30B               | Granite 4 H-Tiny   |
|----------------|-------------------------------|--------------------|
| Approach       | Thorough, parallel tool calls | Direct, sequential |
| Explanations   | Verbose                       | Concise            |
| Tool selection | Calls multiple tools at once  | One tool at a time |
| Caution level  | Checks prerequisites first    | Executes directly  |

### Resource Usage

- Both models loaded simultaneously: 22.86 GB
- No OOM errors during parallel inference
- Shared GPU memory works with both models active

## Recommendations

| Use Case         | Recommended Model | Reason                              |
|------------------|-------------------|-------------------------------------|
| Build/Edit Agent | Qwen3 Coder 30B   | Better code quality, parallel tools |
| Plan Agent       | Granite 4 H-Tiny  | Fast, 1M context, tool-capable      |
| Code Review      | Qwen3 Coder 30B   | More thorough analysis              |

## Notes

- Linux-next kernel provides the latest AMD GPU support
- `HSA_OVERRIDE_GFX_VERSION=11.0.0` is required for ROCm compatibility with the Radeon 890M (see the sketch below)
- LM Studio v0.3.x with the llama.cpp backend
- All tests used Q4_K_M quantization
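
The gist doesn't show how the override was applied; one way is to scope it to a single process rather than export it shell-wide (the launch path below is illustrative):

```python
import os
import subprocess

# HSA_OVERRIDE_GFX_VERSION=11.0.0 makes the ROCm runtime treat the RDNA 3.5
# iGPU as gfx1100, an officially supported target. Setting it per-process
# avoids affecting other ROCm workloads.
env = {**os.environ, "HSA_OVERRIDE_GFX_VERSION": "11.0.0"}
subprocess.Popen(["./LM-Studio.AppImage"], env=env)  # illustrative launch path
```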