Open-weight LLM skill sets

continue.dev

Code Generation:

  • qwen2.5-coder:7b - Excellent for code completion
  • codellama:13b - Strong general coding support
  • deepseek-coder:6.7b - Fast and efficient

Chat & Reasoning:

  • llama3.1:8b - Latest Llama with tool support
  • mistral:7b - Fast and versatile
  • deepseek-r1:32b - Advanced reasoning capabilities

Autocomplete:

  • qwen2.5-coder:1.5b - Lightweight and fast
  • starcoder2:3b - Optimized for code completion
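
The tags above are Ollama model tags, and Ollama is the typical local backend for continue.dev. Below is a minimal sketch of querying one of these models directly through Ollama's default REST endpoint; it assumes Ollama is running on localhost:11434 and the model has already been pulled, and the prompt is only illustrative:

```python
import json
import urllib.request

# Assumes `ollama pull qwen2.5-coder:7b` has been run and the Ollama
# server is listening on its default port.
payload = {
    "model": "qwen2.5-coder:7b",
    "prompt": "Write a Python function that reverses a linked list.",
    "stream": False,  # return a single JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```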

Qwen3

  • strong at math, coding, and reasoning, even on mid-range hardware like a MacBook Air M4
  • Specific model:
    • Qwen2.5-7B-Instruct: cited by the LM Studio team as performing well in a wide variety of tool-use cases

Gemma 3

  • multilingual support
  • factual knowledge
  • vision capabilities

LLaMa 3.2

  • basic tool-calling (see the sketch below)
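
A hedged sketch of basic tool-calling through Ollama's /api/chat endpoint; the get_weather tool is hypothetical and used only for illustration, and llama3.2 must already be pulled locally:

```python
import json
import urllib.request

payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "What's the weather in Lisbon?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    message = json.loads(resp.read())["message"]
    # If the model chose to call the tool, the name and arguments appear
    # under message["tool_calls"]; otherwise it answers in plain text.
    print(message.get("tool_calls") or message["content"])
```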

Misc

Understanding Quantization

Quantization reduces the numerical precision of a model's weights to save memory, usually at only a small cost in output quality (see the toy example after this list).

  • FP16 (no quantization): highest quality, ~2 bytes per weight
  • INT8 quantization: good quality at roughly half the memory (~1 byte per weight)
  • INT4 quantization: usable quality at roughly a quarter of the memory (~0.5 bytes per weight)
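
A toy illustration of what symmetric INT8 quantization does to a weight tensor; this shows only the core round-trip idea, not how inference engines actually implement it:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=1000).astype(np.float16)  # FP16 baseline, 2 bytes/weight

# Symmetric INT8: map [-max|w|, +max|w|] onto the integer range [-127, 127].
scale = float(np.abs(weights).max()) / 127.0
q = np.clip(np.round(weights.astype(np.float32) / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale  # what the runtime reconstructs at inference time

print("memory:", weights.nbytes, "->", q.nbytes, "bytes")  # 2000 -> 1000
print("max round-trip error:", np.abs(weights.astype(np.float32) - dequant).max())  # ~scale/2
```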

How to Choose

  • If you have a high-end GPU with plenty of VRAM, try FP16 first.
  • For the best balance of quality and performance, use INT8.
  • If you're on a system with limited resources, INT4 can make large models usable.
  • You can always experiment with different quantization levels to find what works best; the sizing sketch below shows the rough memory math.
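
As a rough rule of thumb, weight memory is parameter count times bytes per weight; KV cache and activations add overhead on top. A quick back-of-the-envelope helper, with the bytes-per-weight figures taken from the quantization list above:

```python
def weight_memory_gb(params_billion: float, bytes_per_weight: float) -> float:
    """Weight-only memory estimate; runtime overhead (KV cache, activations)
    comes on top of this."""
    return params_billion * 1e9 * bytes_per_weight / 1024**3

for name, bpw in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"7B model @ {name}: ~{weight_memory_gb(7, bpw):.1f} GB")
# 7B @ FP16: ~13.0 GB; @ INT8: ~6.5 GB; @ INT4: ~3.3 GB
```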
