# Claude Code with Local Backends (Ollama & LM Studio)
Run **Claude Code** (Anthropic's agentic coding tool) with local / open-source models instead of paying for Claude Pro / API credits.

Claude Code speaks the **Anthropic API** format, so tools that emulate it (Ollama ≥ 0.14 natively, LM Studio via a proxy) let you use powerful local models while keeping almost all features: file read/write, command execution, multi-agent workflows, MCP, plugins, etc.
## 1. Ollama (Recommended – Native Support)

Since **Ollama v0.14.0** (late 2025), there is **official support** for the Anthropic-compatible endpoint.

### Requirements

- Ollama ≥ 0.14.0
- Enough VRAM/RAM for the model you want to use (Qwen 2.5-Coder 32B, DeepSeek-Coder-V2, Llama 3.1/3.2, etc.)
- Claude Code desktop app or CLI
### Quick Setup Steps

1. Update / install Ollama

   ```bash
   # macOS / Linux
   curl -fsSL https://ollama.com/install.sh | sh

   # Check version (must be ≥ 0.14)
   ollama --version
   ```

2. Pull a strong coding model (examples)

   ```bash
   ollama pull qwen2.5-coder:32b-instruct-q6_K   # ~20–24 GB VRAM
   ollama pull deepseek-coder-v2:16b-instruct    # lighter & very good
   ollama pull llama3.1:70b                      # classic strong performer
   ```
3. Start the Ollama server, which exposes the Anthropic/OpenAI-compatible API (a quick verification sketch follows these steps)

   ```bash
   ollama serve

   # or, if a desktop client is rejected with an origin/CORS error, relax the allowed origins:
   # OLLAMA_ORIGINS=* ollama serve
   ```
4. In **Claude Code** settings:

   - Change provider → **Custom / Other**
   - Base URL: `http://127.0.0.1:11434/v1`
   - API Key: `ollama` (literally just type "ollama" – it's a dummy value)
   - Model: `qwen2.5-coder:32b-instruct-q6_K` (or whichever you pulled)

   If you use the CLI rather than the desktop settings panel, an environment-variable variant is sketched after the pro tip below.

5. Done! You should now have full Claude Code behavior running 100% locally.
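If the connection fails, first confirm that the Ollama server is reachable and actually sees the models you pulled. A minimal check, assuming the default port 11434 (these are standard Ollama endpoints, not Claude Code-specific):

```bash
# List locally installed models via Ollama's native API
curl http://127.0.0.1:11434/api/tags

# The same models through the /v1 route used as the Base URL above
curl http://127.0.0.1:11434/v1/models
```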
**Pro tip**: Use tags like `:q5_K_M`, `:q6_K` or `:q8_0` to trade quality vs speed/memory.
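If you run the `claude` CLI rather than the desktop app, the same provider details can usually be supplied through environment variables instead of the settings panel. The variable names below (`ANTHROPIC_BASE_URL`, `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL`) are an assumption based on Claude Code's custom-gateway documentation – verify them against the version you have installed:

```bash
# Hedged sketch: point the Claude Code CLI at the local Ollama endpoint.
# Variable names are assumptions from the gateway docs; check your Claude Code version.
export ANTHROPIC_BASE_URL="http://127.0.0.1:11434/v1"     # same Base URL as in step 4
export ANTHROPIC_API_KEY="ollama"                          # dummy value, as above
export ANTHROPIC_MODEL="qwen2.5-coder:32b-instruct-q6_K"   # or whichever model you pulled
claude
```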
## 2. LM Studio (via OpenAI-compatible server)

LM Studio does **not** natively speak the Anthropic API format, but its local server speaks the OpenAI format, so an OpenAI → Anthropic translation layer (a proxy) makes it work with Claude Code.
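To see what the LM Studio side exposes, you can query its OpenAI-style server directly once it is running (default port 1234 assumed; the model name placeholder is whatever you loaded in LM Studio):

```bash
# Assumes LM Studio's local server is running on the default port 1234.
curl http://localhost:1234/v1/models            # models LM Studio can serve

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<your-model-name>", "messages": [{"role": "user", "content": "hello"}]}'
```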
### Popular working methods (2026)

**Option A – Use LiteLLM as proxy** (cleanest)

1. Install LiteLLM

   ```bash
   pip install litellm
   ```

2. Start the LM Studio server (default port 1234)
3. Run LiteLLM with the Anthropic → LM Studio translation (an end-to-end check is sketched after these steps)

   ```bash
   litellm --model openai/<any-model-name> \
     --api_base http://localhost:1234/v1 \
     --port 8000 \
     --api_key "lm-studio"
   ```
4. In Claude Code → Custom provider:

   - Base URL: `http://localhost:8000`
   - API Key: `lm-studio` (or whatever you set)
   - Model: the name you gave in LM Studio
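Before switching Claude Code over, it is worth sanity-checking the chain (proxy → LM Studio) with a single request to LiteLLM's OpenAI-compatible endpoint. A minimal sketch, assuming the port and key from step 3 and that the model name matches what you passed to `--model`:

```bash
# Send one test request through the LiteLLM proxy to LM Studio.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer lm-studio" \
  -d '{"model": "openai/<any-model-name>", "messages": [{"role": "user", "content": "say hi"}]}'
```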
**Option B – Third-party bridges** (Lynkr, ollama-prompt MCP servers, etc.)

Many community MCP / proxy servers exist specifically for Claude Code + LM Studio.
Search for "Claude Code LM Studio" on GitHub or Reddit for the latest ones.
## Comparison Table

| Feature / Aspect       | Ollama (native)          | LM Studio + proxy          |
|------------------------|--------------------------|----------------------------|
| Anthropic API support  | Native (best)            | Via LiteLLM / bridge       |
| Setup difficulty       | ★☆☆☆☆ (very easy)        | ★★☆☆☆                      |
| Model switching speed  | Very fast                | Fast                       |
| Token limits           | Only your hardware       | Only your hardware         |
| Cost                   | 100% free                | 100% free                  |
| Privacy                | Complete (local)         | Complete (local)           |
| Best models right now  | Qwen 2.5 Coder, DeepSeek | Same + easier GGUF search  |
## Recommended Starter Models (late 2025 / early 2026)

- **Best overall quality** — `qwen2.5-coder:32b-instruct` (q6_K or q5_K_M)
- **Fast & very capable** — `deepseek-coder-v2:16b-instruct`
- **Good balance** — `codestral:22b`, `llama3.1:70b-instruct`
- **Small but shockingly good** — `qwen2.5-coder:14b` or `phi-4-mini`
## Troubleshooting Tips

- Model hallucinates file paths → try a stronger reasoning model or raise the context window (`num_ctx`, e.g. 32768 – see the Modelfile sketch below)
- Commands fail → make sure Claude Code has filesystem & shell permissions
- Very slow → use a more aggressive quantization or a smaller model
- "Invalid API key" → double-check you're using exactly `ollama` (for the Ollama setup)
Enjoy local, unlimited, private coding sessions! 🚀

Last updated: Jan 2026