HOWTO: Automate NVIDIA CDI Refresh for Podman on WSL2

Why this is needed

Updating NVIDIA drivers on Windows often breaks GPU access from Podman containers in WSL2. The Container Device Interface (CDI) file (/etc/cdi/nvidia.yaml) is a static snapshot of the driver state at the time it was generated, so after a driver update it can reference libraries and paths that no longer match the installed driver. The usual symptom is a Podman container that cannot find the GPU, accompanied by cryptic error messages or none at all.
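To check whether an existing spec has gone stale, one option is to look for mount paths in it that no longer exist on disk. The function below is a minimal sketch: list_stale_cdi_paths is a hypothetical helper, and it assumes nvidia-ctk writes each mount as a single `hostPath: <path>` line. That is a grep-level approximation, not real YAML parsing, and it checks mount paths only (not device nodes or hooks).

```bash
#!/bin/bash
# Hypothetical helper: print every hostPath in a CDI spec that no longer exists.
# Assumption: one "hostPath: <path>" (optionally quoted) per line, as nvidia-ctk
# typically emits. This is a heuristic, not a YAML parser.
list_stale_cdi_paths() {
    local spec="${1:-/etc/cdi/nvidia.yaml}"
    [ -r "$spec" ] || { echo "cannot read $spec" >&2; return 2; }
    # Extract the value after "hostPath:", drop the key and any opening quote,
    # de-duplicate, then report paths that are missing on disk.
    grep -oE 'hostPath:[[:space:]]*"?[^"[:space:]]+' "$spec" \
        | sed -E 's/^hostPath:[[:space:]]*"?//' \
        | sort -u \
        | while IFS= read -r path; do
            [ -e "$path" ] || echo "$path"
          done
}
```

Any output from `list_stale_cdi_paths /etc/cdi/nvidia.yaml` suggests the spec predates the current driver; empty output is a good sign but not a full guarantee that the spec is valid.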

The Benefit

This solution provides "zero-touch" insurance. It checks your driver version at boot and regenerates the CDI file only when a change is detected. Because it runs as a systemd service in the background, it adds negligible overhead to WSL startup (approx. 80ms) and no latency to individual shells.

Prerequisites

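  • WSL Version: 0.67.6 or higher, the first release with systemd support (check via wsl --version in PowerShell).
  • Systemd Enabled: Your /etc/wsl.conf must contain the following (run wsl --shutdown from PowerShell after editing it so the change takes effect):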
    [boot]
    systemd=true
  • NVIDIA Container Toolkit: Installed within your WSL distribution.
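The prerequisites can be checked from inside the distribution. This is a sketch with two hypothetical helpers: wslconf_has_systemd does a simple line-based parse of the [boot] section (assuming the plain key=value layout shown above), and check_prereqs additionally confirms that systemd is actually PID 1 and that nvidia-ctk is on the PATH. The WSL version itself still has to be checked with wsl --version.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the given wsl.conf sets systemd=true under [boot].
# Assumption: simple INI layout; no support for quoting or inline comments.
wslconf_has_systemd() {
    local conf="${1:-/etc/wsl.conf}"
    [ -r "$conf" ] || return 1
    awk '
        # Any section header switches in_boot on (for [boot]) or off (others).
        /^[ \t]*\[/ { in_boot = ($0 ~ /^[ \t]*\[boot\][ \t]*$/); next }
        in_boot && /^[ \t]*systemd[ \t]*=[ \t]*true[ \t]*$/ { found = 1 }
        END { exit !found }
    ' "$conf"
}

# Hypothetical helper: report each missing prerequisite; non-zero if any fail.
check_prereqs() {
    local ok=0
    wslconf_has_systemd /etc/wsl.conf \
        || { echo "MISSING: [boot] systemd=true in /etc/wsl.conf"; ok=1; }
    # The setting only takes effect after WSL restarts, so check PID 1 too.
    [ "$(ps -p 1 -o comm=)" = "systemd" ] \
        || { echo "MISSING: systemd is not PID 1 (run 'wsl --shutdown' and reopen)"; ok=1; }
    command -v nvidia-ctk >/dev/null \
        || { echo "MISSING: nvidia-ctk (NVIDIA Container Toolkit)"; ok=1; }
    return "$ok"
}
```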

Step 1: The Logic Script

This script compares the current Windows driver version (via the DXG bridge) against a cached version. It only triggers a regeneration when a mismatch is found.

File: /usr/local/bin/refresh-nvidia-cdi.sh

#!/bin/bash
# Regenerate the NVIDIA CDI spec only when the Windows driver version changes.

# Configuration
CDI_OUT="/etc/cdi/nvidia.yaml"
VERSION_CACHE="/var/tmp/nvidia-driver-version.cache"
SMI_PATH="/usr/lib/wsl/lib/nvidia-smi"
LOG="/var/tmp/nvidia-cdi-debug.log"

# 1. Ensure the output directory exists
mkdir -p "$(dirname "$CDI_OUT")"

# 2. Query the driver version via the WSL-mapped nvidia-smi
if [ -x "$SMI_PATH" ]; then
    CURRENT_VERSION=$("$SMI_PATH" --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)
else
    # Exit quietly if the GPU bridge isn't initialized yet
    exit 0
fi

# Never cache an empty version: that would force a pointless regeneration next boot
if [ -z "$CURRENT_VERSION" ]; then
    echo "$(date): nvidia-smi returned no driver version; skipping." > "$LOG"
    exit 1
fi

# 3. Only regenerate if the driver version has changed or the cache is missing
LAST_VERSION=$(cat "$VERSION_CACHE" 2>/dev/null)

if [ "$CURRENT_VERSION" != "$LAST_VERSION" ]; then
    # Overwrite the log so it always describes the most recent regeneration
    echo "$(date): Driver change detected (${LAST_VERSION:-none} -> $CURRENT_VERSION). Updating CDI..." > "$LOG"
    if /usr/bin/nvidia-ctk cdi generate --mode auto --output="$CDI_OUT" >> "$LOG" 2>&1; then
        # Update the cache only after a successful generation
        echo "$CURRENT_VERSION" > "$VERSION_CACHE"
        echo "SUCCESS: CDI generated at $CDI_OUT" >> "$LOG"
    else
        echo "ERROR: nvidia-ctk failed to generate CDI." >> "$LOG"
        exit 1
    fi
fi

Make it executable:

sudo chmod +x /usr/local/bin/refresh-nvidia-cdi.sh
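The decision at the heart of the script can also be pulled out into a function, which makes it easy to try by hand without regenerating anything. cdi_needs_refresh is a hypothetical name, not part of the script; it mirrors the script's comparison, including treating an empty version as "do nothing".

```bash
#!/bin/bash
# Hypothetical helper mirroring the script's check.
# Usage: cdi_needs_refresh CURRENT_VERSION CACHE_FILE
# Returns 0 = refresh needed, 1 = up to date, 2 = no usable version (do nothing).
cdi_needs_refresh() {
    local current="$1" cache="$2" last=""
    [ -n "$current" ] || return 2
    # A missing or unreadable cache leaves $last empty, which forces a refresh.
    [ -r "$cache" ] && last=$(<"$cache")
    [ "$current" != "$last" ]
}
```

For a dry run against the live system: `cdi_needs_refresh "$(/usr/lib/wsl/lib/nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" /var/tmp/nvidia-driver-version.cache; echo $?`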

Step 2: The Systemd Service

This service runs the script once per boot, after multi-user.target is reached. ConditionPathExists=/dev/dxg makes systemd skip the service (rather than mark it failed) if the Windows GPU bridge device (/dev/dxg) is not present at that point.

File: /etc/systemd/system/nvidia-cdi-refresh.service

[Unit]
Description=Refresh NVIDIA CDI when GPU is ready
ConditionPathExists=/dev/dxg
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/refresh-nvidia-cdi.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Enable the service:

sudo systemctl daemon-reload
sudo systemctl enable nvidia-cdi-refresh.service
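If you would rather script Step 2 than create the file by hand, a small helper can write the same unit. write_cdi_unit is hypothetical; its optional directory argument exists only so the output can be inspected in a scratch directory before writing to /etc/systemd/system.

```bash
#!/bin/bash
# Hypothetical helper: write the Step 2 unit file into a directory
# (default: /etc/systemd/system). The quoted heredoc prevents any expansion.
write_cdi_unit() {
    local dir="${1:-/etc/systemd/system}"
    mkdir -p "$dir" || return 1
    cat > "$dir/nvidia-cdi-refresh.service" <<'EOF'
[Unit]
Description=Refresh NVIDIA CDI when GPU is ready
ConditionPathExists=/dev/dxg
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/refresh-nvidia-cdi.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
}
```

To use it for real, source the function in a root shell (e.g. after sudo -s) and call write_cdi_unit with no argument; systemd-analyze verify /etc/systemd/system/nvidia-cdi-refresh.service can lint the result before you run the daemon-reload and enable commands above.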

Appendix: How to Test

You don't need to wait for a driver update to verify the automation.

  1. Simulate a Change: Delete the version cache: sudo rm /var/tmp/nvidia-driver-version.cache
  2. Trigger the Service: sudo systemctl restart nvidia-cdi-refresh.service
  3. Verify Results:
    • Check if the cache was recreated: cat /var/tmp/nvidia-driver-version.cache
    • Check the CDI file timestamp: ls -l /etc/cdi/nvidia.yaml
    • Review the debug log: cat /var/tmp/nvidia-cdi-debug.log
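  4. Confirm Persistence: Delete the cache again, run wsl --shutdown from PowerShell, reopen WSL, and check the log timestamp to confirm the service fired at boot. Without deleting the cache first, an unchanged driver produces no new log entry; journalctl -b -u nvidia-cdi-refresh.service shows whether the unit ran this boot either way.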
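Step 3 can be scripted as well. verify_cdi_refresh is a hypothetical helper that checks that the cache holds the expected driver version and that the CDI spec exists and is non-empty. Passing the live nvidia-smi output as the expected version turns it into a check that the last refresh matches the running driver.

```bash
#!/bin/bash
# Hypothetical helper automating appendix step 3.
# Usage: verify_cdi_refresh EXPECTED_VERSION [CACHE_FILE] [CDI_SPEC]
verify_cdi_refresh() {
    local expected="$1"
    local cache="${2:-/var/tmp/nvidia-driver-version.cache}"
    local spec="${3:-/etc/cdi/nvidia.yaml}"
    [ -s "$spec" ] || { echo "FAIL: $spec missing or empty"; return 1; }
    [ -r "$cache" ] || { echo "FAIL: $cache missing"; return 1; }
    # The script writes the cache only after nvidia-ctk succeeds.
    [ "$(<"$cache")" = "$expected" ] \
        || { echo "FAIL: cache does not match $expected"; return 1; }
    echo "OK: cache=$expected, spec=$spec"
}
```

Example: `verify_cdi_refresh "$(/usr/lib/wsl/lib/nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"`. For an end-to-end check, `nvidia-ctk cdi list` should list the generated device names, and `podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L` should print your GPU.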