# Claude Code with Local Backends (Ollama & LM Studio)
Run **Claude Code** (Anthropic's agentic coding tool) with local / open-source models instead of paying for Claude Pro / API credits.
Claude Code speaks the **Anthropic API** format, so tools that emulate it (such as Ollama ≥ 0.14 and some LM Studio setups behind a proxy) let you use powerful local models while keeping almost all features: file read/write, command execution, multi-agent workflows, MCP, plugins, etc.
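For reference, this is roughly what an Anthropic-style **Messages** request looks like. The payload shape (`model`, `max_tokens`, `messages`) is what Claude Code sends; the local URL and model name below are placeholders for whatever you configure in the steps that follow, and the exact local route may vary by Ollama version.

```bash
# Sketch of an Anthropic-format request pointed at a local backend.
# URL and model name are placeholders — substitute what you set up below.
curl http://127.0.0.1:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder:32b-instruct-q6_K",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Write a hello-world in Python."}]
  }'
```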
## 1. Ollama (Recommended – Native Support)
Since **Ollama v0.14.0+** (late 2025), there is **official support** for the Anthropic-compatible endpoint.
### Requirements
- Ollama ≥ 0.14.0
- Enough VRAM/RAM for the model you want to use (Qwen 2.5-Coder 32B, DeepSeek-Coder-V2, Llama-3.1/3.2, etc.)
- Claude Code desktop app or CLI
### Quick Setup Steps
1. Update / install Ollama
```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Check version (must be ≥ 0.14)
ollama --version
```
2. Pull a strong coding model (examples)
```bash
ollama pull qwen2.5-coder:32b-instruct-q6_K # ~20–24 GB VRAM
ollama pull deepseek-coder-v2:16b-instruct # lighter & very good
ollama pull llama3.1:70b # classic strong performer
```
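To confirm what was pulled and how much memory each model needs, `ollama list` and `ollama show` are quick checks (the model name below is just the example from this step):

```bash
ollama list                                    # installed models and their on-disk sizes
ollama show qwen2.5-coder:32b-instruct-q6_K    # parameters, context length, quantization
```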
3. Start Ollama with the OpenAI/Anthropic-compatible server
```bash
ollama serve
# or, if requests from other apps are rejected, allow all origins:
# OLLAMA_ORIGINS=* ollama serve
```
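Before touching Claude Code, it's worth confirming the server is actually reachable. `/api/tags` is Ollama's standard model-listing route; the `/v1` check assumes the compatible API is enabled as described above:

```bash
# Should return JSON listing the models you pulled
curl http://127.0.0.1:11434/api/tags

# Optional: check the compatible /v1 API that Claude Code will talk to
curl http://127.0.0.1:11434/v1/models
```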
4. In **Claude Code** settings:
- Change provider → **Custom / Other**
- Base URL: `http://127.0.0.1:11434/v1`
- API Key: `ollama` (literally just type "ollama" – it's a dummy value)
- Model: `qwen2.5-coder:32b-instruct-q6_K` (or whichever you pulled)
5. Done! You should now have full Claude Code behavior running 100% locally.
**Pro tip**: Use tags like `:q5_K_M`, `:q6_K` or `:q8_0` to trade quality vs speed/memory.
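If you'd rather configure the CLI than the settings UI, Claude Code can also be pointed at a custom backend through environment variables. A minimal sketch, assuming the commonly used `ANTHROPIC_BASE_URL` / `ANTHROPIC_AUTH_TOKEN` / `ANTHROPIC_MODEL` variables (names may differ between Claude Code versions, so verify against your install):

```bash
# Env-var setup — variable names assumed, check your Claude Code version's docs
export ANTHROPIC_BASE_URL="http://127.0.0.1:11434/v1"
export ANTHROPIC_AUTH_TOKEN="ollama"                       # dummy value, same as step 4
export ANTHROPIC_MODEL="qwen2.5-coder:32b-instruct-q6_K"
claude   # start Claude Code in your project directory
```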
## 2. LM Studio (via OpenAI-compatible server)
LM Studio does **not** natively speak the Anthropic API format, but you can still use it by putting a translation layer in front of its **OpenAI**-compatible server so Claude Code's Anthropic-style requests get converted on the way through.
### Popular working methods (2026)
**Option A – Use LiteLLM as proxy** (cleanest)
1. Install LiteLLM
```bash
pip install litellm
```
2. Start LM Studio server (default port 1234)
3. Run LiteLLM with Anthropic → LM Studio translation
```bash
litellm --model openai/<any-model-name> \
  --api_base http://localhost:1234/v1 \
  --port 8000 \
  --api_key "lm-studio"
```
4. In Claude Code → Custom provider:
- Base URL: `http://localhost:8000`
- API Key: `lm-studio` (or whatever you set)
- Model: the name you gave in LM Studio
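Instead of passing everything on the command line, LiteLLM also accepts a config file, which is easier to keep around and version. A minimal sketch (the alias `lmstudio-coder` and the model-name placeholder are mine, not fixed names):

```bash
# Write a minimal LiteLLM proxy config — alias and model name are placeholders
cat > litellm_config.yaml <<'EOF'
model_list:
  - model_name: lmstudio-coder              # alias Claude Code will request
    litellm_params:
      model: openai/<model-name-in-lm-studio>
      api_base: http://localhost:1234/v1
      api_key: lm-studio
EOF

litellm --config litellm_config.yaml --port 8000
```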
**Option B – Third-party bridges** (Lynkr, ollama-prompt MCP servers, etc.)
Many community MCP / proxy servers exist specifically for Claude Code + LM Studio.
Search for: "Claude Code LM Studio" on GitHub or Reddit for the latest ones.
## Comparison Table
| Feature / Aspect | Ollama (native) | LM Studio + proxy |
|---------------------------|---------------------------|---------------------------|
| Anthropic API support | Native (best) | Via LiteLLM / bridge |
| Setup difficulty | ★☆☆☆☆ (very easy) | ★★☆☆☆ |
| Model switching speed | Very fast | Fast |
| Token limits | Only your hardware | Only your hardware |
| Cost | 100% free | 100% free |
| Privacy | Complete (local) | Complete (local) |
| Best models right now | Qwen 2.5 Coder, DeepSeek | Same + easier GGUF search |
## Recommended Starter Models (late 2025 / early 2026)
- **Best overall quality** — `qwen2.5-coder:32b-instruct` (q6_K or q5_K_M)
- **Fast & very capable** — `deepseek-coder-v2:16b-instruct`
- **Good balance** — `codestral:22b`, `llama3.1:70b-instruct`
- **Small but shockingly good** — `qwen2.5-coder:14b` or `phi-4-mini`
## Troubleshooting Tips
- Model hallucinates file paths → try a stronger model or raise the context window (e.g. `num_ctx 32768`; see the Modelfile sketch below)
- Commands fail → make sure Claude Code has filesystem & shell permissions
- Very slow → lower quantization or use smaller model
- "Invalid API key" → double-check you're using exactly `ollama` (Ollama case)
Enjoy local, unlimited, private coding sessions! 🚀
Last updated: Jan 2026