# Claude Code with Local Backends (Ollama & LM Studio)
Run **Claude Code** (Anthropic's agentic coding tool) with local / open-source models instead of paying for Claude Pro / API credits.
Claude Code speaks the **Anthropic API** format, so tools that emulate it (such as Ollama ≥ 0.14 and some LM Studio setups behind a proxy) let you use powerful local models while keeping almost all features: file read/write, command execution, multi-agent workflows, MCP, plugins, etc.
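For reference, this is roughly what an Anthropic-style **Messages** request looks like. The payload shape (`model`, `max_tokens`, `messages`) is what Claude Code sends; the local URL and model name below are placeholders for whatever you configure in the steps that follow, and the exact local route may vary by Ollama version.

```bash
# Sketch of an Anthropic-format request pointed at a local backend.
# URL and model name are placeholders — substitute what you set up below.
curl http://127.0.0.1:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder:32b-instruct-q6_K",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Write a hello-world in Python."}]
  }'
```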
## 1. Ollama (Recommended – Native Support)
Since **Ollama v0.14.0+** (late 2025), there is **official support** for the Anthropic-compatible endpoint.
### Requirements
- Ollama ≥ 0.14.0
- Enough VRAM/RAM for the model you want to use (Qwen 2.5-Coder 32B, DeepSeek-Coder-V2, Llama-3.1/3.2, etc.)
- Claude Code desktop app or CLI
### Quick Setup Steps
1. Update / install Ollama
```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Check version (must be ≥ 0.14)
ollama --version
```
2. Pull a strong coding model (examples)
```bash
ollama pull qwen2.5-coder:32b-instruct-q6_K # ~20–24 GB VRAM
ollama pull deepseek-coder-v2:16b-instruct # lighter & very good
ollama pull llama3.1:70b # classic strong performer
```
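To confirm what was pulled and how much memory each model needs, `ollama list` and `ollama show` are quick checks (the model name below is just the example from this step):

```bash
ollama list                                    # installed models and their on-disk sizes
ollama show qwen2.5-coder:32b-instruct-q6_K    # parameters, context length, quantization
```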
3. Start Ollama with the OpenAI/Anthropic-compatible server
```bash
ollama serve
# or, if requests from other apps are rejected, allow all origins:
# OLLAMA_ORIGINS=* ollama serve
```
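Before touching Claude Code, it's worth confirming the server is actually reachable. `/api/tags` is Ollama's standard model-listing route; the `/v1` check assumes the compatible API is enabled as described above:

```bash
# Should return JSON listing the models you pulled
curl http://127.0.0.1:11434/api/tags

# Optional: check the compatible /v1 API that Claude Code will talk to
curl http://127.0.0.1:11434/v1/models
```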
4. In **Claude Code** settings:
- Change provider → **Custom / Other**
- Base URL: `http://127.0.0.1:11434/v1`
- API Key: `ollama` (literally just type "ollama" – it's a dummy value)
- Model: `qwen2.5-coder:32b-instruct-q6_K` (or whichever you pulled)
5. Done! You should now have full Claude Code behavior running 100% locally.
**Pro tip**: Use tags like `:q5_K_M`, `:q6_K` or `:q8_0` to trade quality vs speed/memory.
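If you'd rather configure the CLI than the settings UI, Claude Code can also be pointed at a custom backend through environment variables. A minimal sketch, assuming the commonly used `ANTHROPIC_BASE_URL` / `ANTHROPIC_AUTH_TOKEN` / `ANTHROPIC_MODEL` variables (names may differ between Claude Code versions, so verify against your install):

```bash
# Env-var setup — variable names assumed, check your Claude Code version's docs
export ANTHROPIC_BASE_URL="http://127.0.0.1:11434/v1"
export ANTHROPIC_AUTH_TOKEN="ollama"                       # dummy value, same as step 4
export ANTHROPIC_MODEL="qwen2.5-coder:32b-instruct-q6_K"
claude   # start Claude Code in your project directory
```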
## 2. LM Studio (via OpenAI-compatible server)
LM Studio does **not** natively speak the Anthropic API format, but you can still use it by putting a translation layer in front of its **OpenAI**-compatible server so Claude Code's Anthropic-style requests get converted on the way through.
### Popular working methods (2026)
**Option A – Use LiteLLM as proxy** (cleanest)
1. Install LiteLLM
```bash
pip install litellm
```
2. Start LM Studio server (default port 1234)
3. Run LiteLLM with Anthropic → LM Studio translation
```bash
litellm --model openai/<any-model-name> \
  --api_base http://localhost:1234/v1 \
  --port 8000 \
  --api_key "lm-studio"
```
4. In Claude Code → Custom provider:
- Base URL: `http://localhost:8000`
- API Key: `lm-studio` (or whatever you set)
- Model: the name you gave in LM Studio
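Instead of passing everything on the command line, LiteLLM also accepts a config file, which is easier to keep around and version. A minimal sketch (the alias `lmstudio-coder` and the model-name placeholder are mine, not fixed names):

```bash
# Write a minimal LiteLLM proxy config — alias and model name are placeholders
cat > litellm_config.yaml <<'EOF'
model_list:
  - model_name: lmstudio-coder              # alias Claude Code will request
    litellm_params:
      model: openai/<model-name-in-lm-studio>
      api_base: http://localhost:1234/v1
      api_key: lm-studio
EOF

litellm --config litellm_config.yaml --port 8000
```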
**Option B – Third-party bridges** (Lynkr, ollama-prompt MCP servers, etc.)
Many community MCP / proxy servers exist specifically for Claude Code + LM Studio.
Search for: "Claude Code LM Studio" on GitHub or Reddit for the latest ones.
## Comparison Table
| Feature / Aspect | Ollama (native) | LM Studio + proxy |
|---------------------------|---------------------------|---------------------------|
| Anthropic API support | Native (best) | Via LiteLLM / bridge |
| Setup difficulty | ★☆☆☆☆ (very easy) | ★★☆☆☆ |
| Model switching speed | Very fast | Fast |
| Token limits | Only your hardware | Only your hardware |
| Cost | 100% free | 100% free |
| Privacy | Complete (local) | Complete (local) |
| Best models right now | Qwen 2.5 Coder, DeepSeek | Same + easier GGUF search |
## Recommended Starter Models (late 2025 / early 2026)
- **Best overall quality** — `qwen2.5-coder:32b-instruct` (q6_K or q5_K_M)
- **Fast & very capable** — `deepseek-coder-v2:16b-instruct`
- **Good balance** — `codestral:22b`, `llama3.1:70b-instruct`
- **Small but shockingly good** — `qwen2.5-coder:14b` or `phi-4-mini`
## Troubleshooting Tips
- Model hallucinates file paths → try a stronger model or raise the context window (e.g. `num_ctx 32768`; see the Modelfile sketch below)
- Commands fail → make sure Claude Code has filesystem & shell permissions
- Very slow → lower quantization or use smaller model
- "Invalid API key" → double-check you're using exactly `ollama` (Ollama case)
Enjoy local, unlimited, private coding sessions! 🚀
Last updated: Jan 2026