This guide details how to set up a llama.cpp HTTP server with GPU acceleration on a fresh install of Windows 11 (25H2). With far more capable frontier AI models available exclusively online, and the hardware needed to run them locally priced out of reach for most of us, there are few practical reasons to run local LLMs. But I have found that tinkering with the runtime configuration of local models is the best way to learn how these models work. A local model can also serve as a tool operated by a smarter AI agent, cutting the token usage of more expensive models. And finally, it puts my GeForce RTX 5090 GPU to work when it is not running Rocket League.
Press Windows Key + R, type cmd, and press Enter to open a black window running Windows Command Prompt, a command-line interface (CLI) first written for OS/2 in 1987 that has shipped with every NT-based version of Windows since 1993. It will probably never die. Copy and paste the following command into the CLI and press Enter to install the tools we'll need. If this is the first time you're