We will cover running prebuilt models and connecting them to your editor for code assistance. We will not cover training new models.
- Depends on the model, but in general: bring the beef.
- A good CPU and plenty of RAM can do alright, but a GPU will do much better.
- Ryzen 795xx w/ 64GB DDR5@6000MT/s => 1.5 tokens/sec
- RTX 3090 w/ 24GB GDDR6X => 2.8 tokens/sec
- A system that can run Docker
Might be easy. Might be hard. Nvidia driver support is spotty: I had to boot the installer (and the first boot) in graphics safe mode until I could get the drivers updated, and I even had to fall back to an older kernel version to get working GPU and network drivers.
While it is possible to install Ollama through the OS package manager, we are going to use Docker. This just seems like the most universal method, and it means we don't have to worry about dependency versions. We could use a container-based package manager like Flatpak or Snap, but I want to avoid any sandbox issues. This should work in any Linux distro, macOS, or Windows, though Windows adds an extra layer with WSL and there may be issues getting the GPU to work.
https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
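For reference, the repository method from those docs boils down to the following. This is condensed here and may drift out of date, so check the linked page for the current steps:

```bash
# Add Docker's official GPG key and apt repository, then install (Ubuntu)
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```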
Docker may set some conservative defaults. Let's bump them up a bit. My PC has 32 GB of RAM, so we can use more than the meager ~7.5 GB Docker allowed by default. In fact, without increasing this limit, I was getting errors trying to run any models. I opened Docker's settings and upped the RAM to ~16 GB.
The next issue is that, by default, Docker will not use the GPU. We'll have to install some software to change that. I followed the instructions under "Nvidia GPU" on the Docker Hub page here: https://hub.docker.com/r/ollama/ollama. But rather than start the container directly, we'll create a compose file to manage Ollama and the WebUI together.
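Here's a minimal sketch of that compose file. The service names, ports, and volume names are my own choices, and the GPU reservation assumes the NVIDIA Container Toolkit from the instructions above is installed:

```yaml
# docker-compose.yaml -- Ollama plus Open WebUI, with GPU passthrough
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"          # Ollama's API port
    volumes:
      - ollama:/root/.ollama   # persist downloaded models across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"            # WebUI at http://localhost:3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
volumes:
  ollama:
  open-webui:
```

Bring it all up with `docker compose up -d` from the directory holding the file.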
I was getting a `permission denied while trying to connect to the Docker daemon socket...` error. I resolved it by following these steps: https://www.hostinger.com/tutorials/how-to-fix-docker-permission-denied-error
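The gist of the fix, assuming it's the usual group-membership issue that tutorial describes:

```bash
# Add your user to the docker group so you can talk to the daemon without sudo
sudo usermod -aG docker $USER
newgrp docker   # or log out and back in for the group change to take effect
```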
Useful references:
- Ollama intro walkthrough: https://peter-nhan.github.io/posts/Ollama-intro/
- Configuring the Ollama server: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server
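Per that FAQ entry, if you run Ollama as a systemd service rather than in Docker and need it reachable from other machines, you set `OLLAMA_HOST`. With the compose file above, the port mapping already handles this, so this is only for the bare-metal install:

```bash
# systemd install only: make the server listen on all interfaces
sudo systemctl edit ollama.service
# add under [Service]:
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl daemon-reload
sudo systemctl restart ollama
```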
Open the WebUI (http://localhost:3000 with the compose mapping above), create an account, and log in.
Let's try a popular one to start with: https://ollama.com/library/deepseek-r1
Others worth trying (see the sketch below for running these through Docker):
- `ollama run incept5/llama3.1-claude`
- `ollama run qwen2.5-coder:7b`
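Since Ollama lives in a container here, the commands run through `docker exec`, using the `ollama` container name set in the compose sketch above:

```bash
# Pull a model and start an interactive chat inside the running container
docker exec -it ollama ollama run deepseek-r1:7b

# Or just download a model without starting a chat
docker exec -it ollama ollama pull qwen2.5-coder:7b
```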
This is where it gets crazy: number of parameters, 16/32-bit precision, context sizes. This Reddit thread helps untangle the options: https://www.reddit.com/r/SillyTavernAI/comments/1j9jkck/im_an_llm_idiot_confused_by_all_the_options_and/
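For a concrete feel, the tag on an Ollama model encodes these choices. The tags below were real library tags at the time of writing, but check each model's page since they change:

```bash
# Same idea, different trade-offs
docker exec -it ollama ollama pull deepseek-r1:7b              # 7B params, default 4-bit quant
docker exec -it ollama ollama pull deepseek-r1:14b             # more params: smarter, but slower
docker exec -it ollama ollama pull llama3.1:8b-instruct-fp16   # full 16-bit weights, roughly 4x the VRAM of a 4-bit quant
```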
Get the extension - https://marketplace.visualstudio.com/items?itemName=Continue.continue
Change the `apiBase` property on the object in the "models" array to wherever you serve Ollama: https://docs.continue.dev/reference
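A minimal sketch of the relevant part of Continue's config, based on the reference linked above. The `name` is just a display label, and `apiBase` points at the port our compose file publishes; adjust it if Ollama runs on another machine:

```yaml
# ~/.continue/config.yaml (see the Continue reference for the full schema)
models:
  - name: Qwen2.5 Coder 7B
    provider: ollama
    model: qwen2.5-coder:7b
    apiBase: http://localhost:11434
```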

