
Compiling vLLM main from source on DGX Spark

I do all of this SSH'd into the DGX Spark from another machine, so everything below is run as terminal commands.

Install Python dev dependencies and uv

sudo apt install python3-dev
curl -LsSf https://astral.sh/uv/install.sh | sh

Exit your shell and log back in so that uv is on your PATH.
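You can confirm uv is available before continuing:

uv --version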

Set environment variables

export TORCH_CUDA_ARCH_LIST=12.1a
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
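These exports only apply to the current shell session. If you expect to rebuild later, you may want to persist them, for example by appending them to ~/.bashrc (assuming bash; adjust for your shell):

cat >> ~/.bashrc <<'EOF'
export TORCH_CUDA_ARCH_LIST=12.1a
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
EOF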

Clone vLLM

mkdir -p ~/src
cd ~/src
git clone https://github.com/vllm-project/vllm.git
cd vllm
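Since main moves quickly, it can help to note which commit you are building from, so you can return to it if a later build breaks:

git log -1 --oneline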

Create a new Python venv

uv venv --python 3.12 --seed
source .venv/bin/activate

Install torch for CUDA 13

uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
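Before going further, it's worth checking that this torch build actually sees the GPU; on a healthy setup the second value printed should be True:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"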

Install the newest prereleases of xgrammar, triton, and flashinfer

uv pip install xgrammar triton flashinfer-python flashinfer-cubin --prerelease=allow
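A quick import check confirms the prereleases installed cleanly (triton exposes a __version__ attribute):

python -c "import triton, flashinfer, xgrammar; print('imports ok, triton', triton.__version__)"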

Set up vLLM to use the existing torch

Tell vLLM to use your existing torch and install its build dependencies. The use_existing_torch.py script strips the pinned torch requirements from vLLM's requirements files so the build does not pull in a different torch wheel.

python use_existing_torch.py
uv pip install -r requirements/build.txt

Compile vLLM in editable mode

This will take 30 minutes or more.

uv pip install -v --no-build-isolation -e .
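If the build exhausts memory or bogs down the machine, you can cap the number of parallel compile jobs and re-run the same install command. MAX_JOBS is honored by vLLM's source build; the value of 8 below is just an example to tune for your machine:

export MAX_JOBS=8
uv pip install -v --no-build-isolation -e .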

This should give you a working vLLM that can serve most models. Note that NVFP4 MoE models are still a work in progress, but other FP4 models like gpt-oss-20b should work fine.
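Once the install finishes, a quick import check confirms the editable build is on the path:

python -c "import vllm; print(vllm.__version__)"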

Test serving a model

vllm serve openai/gpt-oss-20b \
  --async-scheduling \
  --gpu-memory-utilization 0.4 \
  --tool-call-parser openai \
  --enable-auto-tool-choice
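Once the server is up, you can exercise the OpenAI-compatible API from another terminal. This assumes vLLM's default port of 8000; adjust the URL if you pass --port:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'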