This document explains how to configure VS Code Copilot Chat to use a local LLM running on your NVIDIA GPU, optimized for Ruby on Rails development.
Install the NVIDIA driver stack and kernel headers:
sudo pacman -S nvidia nvidia-utils nvidia-settings linux-headers
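If you run a non-default kernel (e.g. linux-lts), install the matching headers package instead of linux-headers. A quick sanity check that the packages are in place:
pacman -Q nvidia nvidia-utils linux-headers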
Create DRM config:
sudo vim /etc/modprobe.d/nvidia.conf
Add:
options nvidia_drm modeset=1
Rebuild the initramfs and reboot:
sudo limine-mkinitcpio -P
sudo reboot
Verify:
nvidia-smi
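With the driver loaded, you can also confirm the DRM kernel mode setting from the modprobe file above took effect (it should print Y):
cat /sys/module/nvidia_drm/parameters/modeset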
Install Ollama with CUDA support and enable the service:
sudo pacman -S ollama-cuda
sudo systemctl enable ollama
sudo systemctl start ollama
Check version:
ollama --version
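Ollama serves its API on http://localhost:11434 by default; a quick check that the service is actually answering (assumes curl is installed):
curl http://localhost:11434/api/version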
On hybrid systems, explicitly bind Ollama to the discrete GPU.
Edit the service:
sudo systemctl edit ollama
Add:
[Service]
Environment=CUDA_VISIBLE_DEVICES=0
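CUDA_VISIBLE_DEVICES=0 assumes the discrete NVIDIA card is CUDA device 0, which is the usual case when it is the only NVIDIA GPU; you can list the devices and their indices to confirm:
nvidia-smi -L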
Reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Pull a model:
ollama pull qwen3:14b
# or a lighter alternative
ollama pull qwen3:8b
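Once the pull finishes, the model should show up in the local library:
ollama list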
To confirm the model runs on the GPU, watch VRAM usage in one terminal while loading the model in another.
Terminal A:
nvidia-smi -l 1
Terminal B:
ollama run qwen3:14b
You should see an ollama process appear in nvidia-smi and hold several GB of VRAM.
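ollama ps also reports how the loaded model is split between GPU and CPU; ideally it shows 100% GPU, assuming the model fits in available VRAM:
ollama ps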
Install VS Code:
sudo pacman -S code
Note: the code package is the open-source Code - OSS build, which uses the Open VSX registry by default; the GitHub Copilot extensions are published on the Microsoft marketplace, so you may need the official Visual Studio Code build (for example visual-studio-code-bin from the AUR) instead.
In VS Code Extensions, install:
- GitHub Copilot
- GitHub Copilot Chat
Sign in to GitHub.
In VS Code:
- Open Copilot Chat
- Model Dropdown → Manage Models
- Provider: Ollama
- Model: qwen3:14b
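Before leaning on it in the editor, you can confirm the model answers over the same local Ollama API the Copilot integration points at (the prompt here is just an example):
curl http://localhost:11434/api/chat -d '{"model":"qwen3:14b","messages":[{"role":"user","content":"Write a Rails model validation for an email address"}],"stream":false}'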
The resulting split:
| Component | Runs on |
|---|---|
| OS / UI | Intel |
| Ollama / LLM | NVIDIA |
| Copilot Chat | Local |
Verify the split (glxinfo is provided by the mesa-utils package):
glxinfo | grep "OpenGL renderer"
nvidia-smi
Expected:
- Intel renderer for desktop
- ollama process visible in nvidia-smi during inference
Result:
Copilot Chat backed by a fully local, GPU-accelerated model, ready for Ruby on Rails development.