Code Generation:
- qwen2.5-coder:7b - Excellent for code completion
- codellama:13b - Strong general coding support
- deepseek-coder:6.7b - Fast and efficient
Chat & Reasoning:
- llama3.1:8b - Latest Llama with tool support
- mistral:7b - Fast and versatile
- deepseek-r1:32b - Advanced reasoning capabilities
Autocomplete:
- qwen2.5-coder:1.5b - Lightweight and fast
- starcoder2:3b - Optimized for code completion
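To try any of the models above locally, here is a minimal sketch using the official ollama Python package. It assumes an Ollama server is already running on its default port and that the model tag has been pulled:

```python
# pip install ollama
# A minimal sketch, assuming an Ollama server is running on its default port
# and the model has already been pulled (e.g. `ollama pull llama3.1:8b`).
import ollama

response = ollama.chat(
    model="llama3.1:8b",  # any tag from the lists above works here
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
)
print(response["message"]["content"])
```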
- math, coding, and reasoning (on mid-range hardware like a MacBook Air M4)
- Specific model:
  - Qwen2.5-7B-Instruct: cited by the LM Studio team as performing well in a wide variety of tool-use cases
- multilingual support
- factual knowledge
- vision capabilities
- basic tool-calling (see the sketch after this list)
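A minimal tool-calling sketch against a local OpenAI-compatible endpoint. It assumes LM Studio is serving Qwen2.5-7B-Instruct at its default address (http://localhost:1234/v1); the get_weather tool, its schema, and the exact model identifier are hypothetical, made up for illustration:

```python
# pip install openai
# A minimal sketch, assuming LM Studio is serving Qwen2.5-7B-Instruct on its
# default OpenAI-compatible endpoint. The get_weather tool is hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # model id as exposed by the local server
    messages=[{"role": "user", "content": "What's the weather in Lisbon right now?"}],
    tools=tools,
)

# If the model decides to call the tool, the call (function name plus JSON
# arguments) shows up here instead of a plain-text answer.
print(resp.choices[0].message.tool_calls)
```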
Quantization reduces the precision of the model's weights to save memory, with only a modest impact on output quality.
- FP16 (no quantization): best quality, at about 2 bytes per parameter
- INT8 quantization: good quality at roughly half the memory
- INT4 quantization: usable quality at roughly a quarter of the memory
- If you have a high-end GPU with plenty of VRAM, try FP16 first.
- For the best balance of quality and performance, use INT8.
- If you're on a system with limited resources, INT4 can make large models usable.
- You can always experiment with different quantization levels to find what works best for your hardware; the sketch below shows the rough memory math.
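To make the trade-off concrete, here is a back-of-the-envelope estimate of weight memory for a 7B-parameter model at each precision (weights only; the KV cache and activations add overhead on top):

```python
# Back-of-the-envelope weight-memory estimate for a 7B-parameter model.
# Weights only: the KV cache and activations add runtime overhead on top.
PARAMS = 7e9
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for precision, bytes_per_param in BYTES_PER_PARAM.items():
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{precision}: ~{gib:.1f} GB")
# FP16: ~13.0 GB, INT8: ~6.5 GB, INT4: ~3.3 GB
```

This is why an INT4 quant of a 7B model runs comfortably on an 8 GB machine, while the FP16 version does not fit at all.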
Sources: