Open-weight LLM skill sets

continue.dev

Code Generation:

  • qwen2.5-coder:7b - Excellent for code completion
  • codellama:13b - Strong general coding support
  • deepseek-coder:6.7b - Fast and efficient

Chat & Reasoning:

  • llama3.1:8b - Latest Llama with tool support
  • mistral:7b - Fast and versatile
  • deepseek-r1:32b - Advanced reasoning capabilities

Autocomplete:

  • qwen2.5-coder:1.5b - Lightweight and fast
  • starcoder2:3b - Optimized for code completion
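
The tags above are Ollama model tags, and Ollama is the typical local backend for continue.dev. Below is a minimal sketch of querying one of these models directly through Ollama's default REST endpoint; it assumes Ollama is running on localhost:11434 and the model has already been pulled, and the prompt is only illustrative:

```python
import json
import urllib.request

# Assumes `ollama pull qwen2.5-coder:7b` has been run and the Ollama
# server is listening on its default port.
payload = {
    "model": "qwen2.5-coder:7b",
    "prompt": "Write a Python function that reverses a linked list.",
    "stream": False,  # return a single JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```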

Qwen3

  • strong at math, coding, and reasoning, even on mid-range hardware like a MacBook Air M4
  • Specific model:
    • Qwen2.5-7B-Instruct: cited by the LM Studio team as performing well in a wide variety of tool-use cases

Gemma 3

  • multilingual support
  • factual knowledge
  • vision capabilities

LLaMa 3.2

  • basic tool-calling (see the sketch below)
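
A hedged sketch of basic tool-calling through Ollama's /api/chat endpoint; the get_weather tool is hypothetical and used only for illustration, and llama3.2 must already be pulled locally:

```python
import json
import urllib.request

payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "What's the weather in Lisbon?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    message = json.loads(resp.read())["message"]
    # If the model chose to call the tool, the name and arguments appear
    # under message["tool_calls"]; otherwise it answers in plain text.
    print(message.get("tool_calls") or message["content"])
```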

Misc

Understanding Quantization

Quantization reduces the numerical precision of a model's weights to save memory, usually at only a small cost in output quality (see the toy example after this list).

  • FP16 (no quantization): highest quality, ~2 bytes per weight
  • INT8 quantization: good quality at roughly half the memory (~1 byte per weight)
  • INT4 quantization: usable quality at roughly a quarter of the memory (~0.5 bytes per weight)
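
A toy illustration of what symmetric INT8 quantization does to a weight tensor; this shows only the core round-trip idea, not how inference engines actually implement it:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=1000).astype(np.float16)  # FP16 baseline, 2 bytes/weight

# Symmetric INT8: map [-max|w|, +max|w|] onto the integer range [-127, 127].
scale = float(np.abs(weights).max()) / 127.0
q = np.clip(np.round(weights.astype(np.float32) / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale  # what the runtime reconstructs at inference time

print("memory:", weights.nbytes, "->", q.nbytes, "bytes")  # 2000 -> 1000
print("max round-trip error:", np.abs(weights.astype(np.float32) - dequant).max())  # ~scale/2
```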

How to Choose

  • If you have a high-end GPU with plenty of VRAM, try FP16 first.
  • For the best balance of quality and performance, use INT8.
  • If you're on a system with limited resources, INT4 can make large models usable.
  • You can always experiment with different quantization levels to find what works best; the sizing sketch below shows the rough memory math.
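
As a rough rule of thumb, weight memory is parameter count times bytes per weight; KV cache and activations add overhead on top. A quick back-of-the-envelope helper, with the bytes-per-weight figures taken from the quantization list above:

```python
def weight_memory_gb(params_billion: float, bytes_per_weight: float) -> float:
    """Weight-only memory estimate; runtime overhead (KV cache, activations)
    comes on top of this."""
    return params_billion * 1e9 * bytes_per_weight / 1024**3

for name, bpw in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"7B model @ {name}: ~{weight_memory_gb(7, bpw):.1f} GB")
# 7B @ FP16: ~13.0 GB; @ INT8: ~6.5 GB; @ INT4: ~3.3 GB
```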
