Iksas/nvidia_containers.md

Using Nvidia GPUs inside Podman / Docker containers

Install the Nvidia drivers as well as the container toolkit.
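
The `AddDevice=nvidia.com/gpu=all` setting used further down relies on a CDI specification being present on the host. The following is a minimal sketch for checking that, assuming the spec lives in one of the usual CDI directories (`/etc/cdi` or `/var/run/cdi`); the helper name `has_cdi_spec` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: succeed if any of the given directories contains
# a non-empty CDI spec file (.yaml or .json).
has_cdi_spec() {
    for dir in "$@"; do
        for f in "$dir"/*.yaml "$dir"/*.json; do
            # An unmatched glob stays literal and fails the -s test.
            [ -s "$f" ] && return 0
        done
    done
    return 1
}
```

A possible use, based on the CDI documentation linked in the Quadlet example: `has_cdi_spec /etc/cdi /var/run/cdi || sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml`. Afterwards, `nvidia-ctk cdi list` should show the `nvidia.com/gpu=all` device.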
Setting up the container

Podman Quadlets

The following settings make Nvidia GPUs available inside Podman Quadlets:
# jellyfin.container

...

[Service]
# Create /dev/nvidia-uvm after rebooting
ExecStartPre=/usr/bin/nvidia-smi

...

[Container]
# Enable access to Nvidia GPUs
# See https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
# See https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
AddDevice=nvidia.com/gpu=all

...

Troubleshooting

The container no longer starts after upgrading the Linux kernel

This can happen when the Nvidia kernel module has not been built (via DKMS) for the new Linux kernel version.

[Optional: Check available drivers]
To check, get your Linux kernel version:
uname -r

List all kernel modules managed by DKMS on your machine (takes a few seconds):
sudo dkms status

Check whether an nvidia entry is listed as installed for your kernel version.
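
The comparison above can be scripted. This is a rough sketch, assuming the `dkms status` output contains the module name, the kernel version and the word `installed` on one line; the exact format differs between DKMS versions, so the helper (a hypothetical name) only does a plain substring match.

```shell
# Hypothetical helper: read `dkms status` output from stdin and succeed
# if an nvidia module is reported as installed for the given kernel.
nvidia_built_for() {
    kernel="$1"
    # -F treats the kernel version as a fixed string, not a regex.
    grep -i 'nvidia' | grep -F -- "$kernel" | grep -q 'installed'
}
```

Usage: `sudo dkms status | nvidia_built_for "$(uname -r)" && echo "driver is built for this kernel"`.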

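Run the following commands to build the driver for the current kernel and regenerate the initramfs (the matching kernel headers must be installed for the build to succeed):
sudo dkms autoinstall
sudo update-initramfs -k all -u
sudo reboot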

If it still does not work, try to nuke and reinstall the driver:
sudo apt purge '*nvidia*'
sudo apt install nvidia-kernel-dkms nvidia-driver firmware-misc-nonfree
sudo reboot

Error: setting up CDI devices

The following error is displayed while starting up the container:
Error: setting up CDI devices: failed to inject devices: failed to stat CDI host device "/dev/nvidia-uvm": no such file or directory

Apparently, some versions of the Nvidia driver do not create the /dev/nvidia-uvm device node automatically at boot.
To fix this, run the nvidia-smi command once after each boot, before starting any containers; this creates the device node. The ExecStartPre line in the Quadlet example above does this automatically:
nvidia-smi


[Alternative solution without nvidia-smi]
The /dev/nvidia-uvm device nodes can also be created without nvidia-smi using the following commands (the nvidia-uvm kernel module must already be loaded so that it appears in /proc/devices):
DEVICE_NUMBER="$(grep nvidia-uvm /proc/devices | awk '{print $1}')"
sudo mknod -m 666 /dev/nvidia-uvm c "$DEVICE_NUMBER" 0
sudo mknod -m 666 /dev/nvidia-uvm-tools c "$DEVICE_NUMBER" 1

[source]
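
The manual steps can be wrapped so that they only run when the device nodes are missing. This is a sketch rather than a tested setup script: the function names are hypothetical, and the minor numbers (0 for nvidia-uvm, 1 for nvidia-uvm-tools) are an assumption about how the driver numbers these devices.

```shell
# Print the major device number registered for nvidia-uvm.
# Reads text in /proc/devices format ("<major> <name>" per line) from stdin
# and exits non-zero if no such entry exists.
uvm_major() {
    awk '$2 == "nvidia-uvm" { print $1; found = 1; exit } END { exit !found }'
}

# Create the device nodes only if they do not exist yet. Must run as root.
ensure_uvm_nodes() {
    major="$(uvm_major < /proc/devices)" || {
        echo "nvidia-uvm is not registered; is the kernel module loaded?" >&2
        return 1
    }
    # Assumed minors: 0 for nvidia-uvm, 1 for nvidia-uvm-tools.
    [ -c /dev/nvidia-uvm ] || mknod -m 666 /dev/nvidia-uvm c "$major" 0
    [ -c /dev/nvidia-uvm-tools ] || mknod -m 666 /dev/nvidia-uvm-tools c "$major" 1
}
```

Matching the second column exactly avoids picking up other entries whose name merely contains `nvidia-uvm`.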

Test if containers can use the GPU

To test, run the following command:
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L

If the GPU driver as well as the container toolkit are correctly installed, the output looks like this:
GPU 0: Quadro P600 (UUID: GPU-AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE)
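
For scripted checks, the same output can be reduced to a GPU count. A minimal sketch, assuming every GPU is reported on its own line starting with `GPU <index>:` as in the example output above; `count_gpus` is a hypothetical helper name.

```shell
# Count the GPUs listed in `nvidia-smi -L` output read from stdin.
# Prints 0 (and returns non-zero) if no GPU line is found.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}
```

Usage: `podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L | count_gpus`.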