# LM Studio Model Benchmarks

**Date:** 2026-01-25
**Platform:** Framework Laptop 13 (AMD Ryzen AI Edition)

## Hardware Specs

| Component    | Details                                                        |
|--------------|----------------------------------------------------------------|
| CPU          | AMD Ryzen AI 9 HX 370 (12 cores, 24 threads)                   |
| GPU          | AMD Radeon 890M (integrated, shared memory)                    |
| RAM          | 48 GB DDR5                                                     |
| Kernel       | 6.19.0-0.0.next.20260123.345.vanilla.fc43.x86_64 (linux-next)  |
| OS           | Fedora 43                                                      |
| GPU Override | `HSA_OVERRIDE_GFX_VERSION=11.0.0`                              |

## Models Tested

| Model            | Architecture                 | Total Params | Active Params | Size     | Tool Use | Context |
|------------------|------------------------------|--------------|---------------|----------|----------|---------|
| Qwen3 Coder 30B  | MoE (qwen3moe)               | 30B          | 3B            | 18.63 GB | Yes      | 262K    |
| Granite 4 H-Tiny | Hybrid Mamba/Transformer MoE | 7B           | 1B            | 4.23 GB  | Yes      | 1M      |
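
Both models were exercised through LM Studio's OpenAI-compatible server. Assuming the default endpoint (`http://localhost:1234/v1`), the loaded models can be enumerated as below; the `api_key` value is a placeholder, since the local server doesn't validate it.

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is unused but
# required by the client library.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

for model in client.models.list().data:
    print(model.id)  # server-assigned IDs of the loaded models
```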

## Coding Task Benchmarks

### Test Descriptions

1. **FizzBuzz** - Write a Python fizzbuzz function
2. **Fix factorial** - Debug an incorrect base case (`return 0` → `return 1`)
3. **Explain lambda** - Explain a Y-combinator-style factorial lambda
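
The prompts aren't reproduced verbatim in this gist; the sketch below shows plausible reference solutions for the three tasks (the exact lambda used in test 3 is an assumption).

```python
def fizzbuzz(n: int) -> str:
    """Test 1 target: classic FizzBuzz for a single number."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def factorial(n: int) -> int:
    """Test 2 target: the buggy version returned 0 at the base case."""
    if n == 0:
        return 1  # the fix: 0! = 1, not 0
    return n * factorial(n - 1)


# Test 3 subject: a self-applying ("Y-combinator style") factorial lambda.
fact = (lambda f: lambda n: 1 if n == 0 else n * f(f)(n - 1))(
    lambda f: lambda n: 1 if n == 0 else n * f(f)(n - 1)
)

assert fizzbuzz(15) == "FizzBuzz"
assert factorial(5) == fact(5) == 120
```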

### Results

| Test            | Qwen3 Seq  | Granite Seq | Qwen3 Parallel | Granite Parallel |
|-----------------|------------|-------------|----------------|------------------|
| FizzBuzz        | 3.68s      | 2.97s       | 6.01s          | 6.33s            |
| Fix factorial   | 5.11s      | 3.38s       | 7.97s          | 7.47s            |
| Explain lambda  | 4.47s      | 2.85s       | 6.30s          | 6.09s            |
| **Total**       | **13.26s** | **9.20s**   | **20.28s**     | **19.89s**       |
| Slowdown vs Seq | -          | -           | 1.53x          | 2.16x            |
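
"Seq" times one request at a time; "Parallel" fires both models' requests concurrently so they contend for the shared GPU. A minimal harness along these lines reproduces the comparison (model IDs are illustrative; the endpoint is assumed to be LM Studio's default):

```python
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
MODELS = ["qwen3-coder-30b", "granite-4-h-tiny"]  # illustrative IDs
PROMPT = "Write a Python fizzbuzz function."


def timed_request(model: str) -> float:
    """Return wall-clock seconds for one chat completion."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": PROMPT}]
    )
    return time.perf_counter() - start


# Sequential: each model answers alone on the GPU.
seq = {m: timed_request(m) for m in MODELS}

# Parallel: both requests in flight at once, sharing GPU memory bandwidth.
with ThreadPoolExecutor(max_workers=2) as pool:
    par = dict(zip(MODELS, pool.map(timed_request, MODELS)))

for m in MODELS:
    print(f"{m}: seq {seq[m]:.2f}s, parallel {par[m]:.2f}s")
```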

## Tool Calling Benchmarks

| Test                 | Qwen3 30B | Granite Tiny | Speedup (Granite vs Qwen3) |
|----------------------|-----------|--------------|----------------------------|
| Simple (hello)       | 1.30s     | 0.72s        | 1.8x                       |
| Multi-tool selection | 5.64s     | 2.11s        | 2.7x                       |
| Code editing         | ~3s       | ~2s          | 1.5x                       |
| Bash execution       | 6.40s     | 1.76s        | 3.6x                       |
| Multi-turn context   | 2.36s     | 1.57s        | 1.5x                       |
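
For context, a minimal sketch of the kind of request behind the "Simple (hello)" row; the tool schema here is illustrative, not the benchmark's actual fixture:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Illustrative tool definition in OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "say_hello",
        "description": "Greet a user by name.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}]

resp = client.chat.completions.create(
    model="granite-4-h-tiny",  # illustrative ID
    messages=[{"role": "user", "content": "Say hello to Ada."}],
    tools=tools,
)

for call in resp.choices[0].message.tool_calls:
    print(call.function.name, call.function.arguments)
```

Qwen3's "parallel tool calls" (see Behavioral Differences below) surface as multiple entries in `message.tool_calls` for a single turn; Granite typically returns one per turn.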

## Key Findings

### Performance

- Granite is 1.4-3.6x faster than Qwen3 for most tasks
- Both models handle tool calling correctly
- Parallel execution adds ~55-115% latency (GPU memory contention; see the slowdown row in Results)

### Behavioral Differences

| Aspect         | Qwen3 Coder 30B               | Granite 4 H-Tiny   |
|----------------|-------------------------------|--------------------|
| Approach       | Thorough, parallel tool calls | Direct, sequential |
| Explanations   | Verbose                       | Concise            |
| Tool selection | Calls multiple tools at once  | One tool at a time |
| Caution level  | Checks prerequisites first    | Executes directly  |

### Resource Usage

- Both models loaded simultaneously: 22.86 GB
- No OOM errors during parallel inference
- Shared GPU memory works with both models active

## Recommendations

| Use Case         | Recommended Model | Reason                              |
|------------------|-------------------|-------------------------------------|
| Build/Edit Agent | Qwen3 Coder 30B   | Better code quality, parallel tools |
| Plan Agent       | Granite 4 H-Tiny  | Fast, 1M context, tool-capable      |
| Code Review      | Qwen3 Coder 30B   | More thorough analysis              |

## Notes

- Linux-next kernel provides the latest AMD GPU support
- `HSA_OVERRIDE_GFX_VERSION=11.0.0` is required for ROCm compatibility with the Radeon 890M (see the sketch below)
- LM Studio v0.3.x with the llama.cpp backend
- All tests used Q4_K_M quantization
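
The gist doesn't show how the override was applied; one way is to scope it to a single process rather than export it shell-wide (the launch path below is illustrative):

```python
import os
import subprocess

# HSA_OVERRIDE_GFX_VERSION=11.0.0 makes the ROCm runtime treat the RDNA 3.5
# iGPU as gfx1100, an officially supported target. Setting it per-process
# avoids affecting other ROCm workloads.
env = {**os.environ, "HSA_OVERRIDE_GFX_VERSION": "11.0.0"}
subprocess.Popen(["./LM-Studio.AppImage"], env=env)  # illustrative launch path
```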