Fedora Silverblue/Kinoite/Atomic - Nvidia GPU/CUDA Setup for ML

This guide is partially based on deekej's HackMD post [1]. I added the ML (machine learning) and container parts since the original post doesn't cover them. I tested this on Fedora Kinoite 42.

Initial preparation

Setting the repos

First of all, we need to disable the RPM Fusion Nvidia repo, since its package names (e.g. akmod-nvidia) may conflict with negativo17's Nvidia repo later [2].

sudo sed -i -e 's/enabled=1/enabled=0/g' /etc/yum.repos.d/rpmfusion-nonfree-nvidia-driver.repo

Now add negativo17's Nvidia repo [3] and the Nvidia container toolkit repo [4]. The container toolkit repo is only needed if you want to use CUDA inside a Docker/Podman container later (the full CUDA toolkit is not needed [5]).

EDIT: The Nvidia container toolkit is now also available in the official Fedora repository under the name golang-github-nvidia-container-toolkit. You don't need to add the toolkit repo if you use that package instead.
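If you're not sure the Fedora package is available on your release, you can check first (a quick sketch; rpm-ostree search queries the enabled repos):

# Search the enabled repos for the Fedora-packaged toolkit
rpm-ostree search golang-github-nvidia-container-toolkit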

# Negativo17's Nvidia repo
curl -s -L https://negativo17.org/repos/fedora-nvidia.repo | sudo tee /etc/yum.repos.d/fedora-nvidia.repo
# Nvidia container toolkit repo (optional)
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Refresh the repo metadata
sudo rpm-ostree refresh-md

The advantage of negativo17's Nvidia repo is that you can install only the CUDA driver if you don't need the full display driver. I compared this with the RPM Fusion repo, and a CUDA-driver-only install from RPM Fusion was at least missing some .so files.

Installing layered packages

You can visit the repo websites above to see the available package names (or use rpm-ostree search <package-name>). Since I'm going to use CUDA for ML, I need at least these 3 packages: nvidia-driver, nvidia-driver-cuda, and nvidia-container-toolkit (note: use golang-github-nvidia-container-toolkit instead if you didn't add the toolkit repo).

The docs from negativo17 also say you can install only the CUDA driver [6] (nvidia-driver-cuda nvidia-container-toolkit), and someone on GitHub confirmed it works.
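For reference, a CUDA-only install would look something like this (a minimal sketch based on those docs; I haven't tested this variant myself):

# CUDA driver only, without the display driver
sudo rpm-ostree install nvidia-driver-cuda nvidia-container-toolkit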

And while we're at it, we can also install distrobox [7] (optional). Distrobox is usually used for more casual work (one container with all your tools), while VS Code's Dev Containers is usually used to isolate each project in its own container (though Distrobox can be used this way too).

Finally, the combined install command (in my case) looks like this:

# Install layered packages on top of the system (also with distrobox)
sudo rpm-ostree install nvidia-driver nvidia-driver-cuda nvidia-container-toolkit distrobox
# Alternative if you want to install distrobox locally (not with rpm-ostree)
curl -s https://raw.githubusercontent.com/89luca89/distrobox/release/install | sh -s -- --prefix ~/.local
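If you go with the local install, make sure ~/.local/bin is on your PATH (a common setup, assuming you use bash):

# Make the locally installed distrobox discoverable
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc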

Then append the kernel parameters below to prevent the default nouveau driver from loading. Even if you install only the CUDA driver, these parameters may still be needed.

# Append kernel parameters
sudo rpm-ostree kargs --append=rd.driver.blacklist=nouveau --append=modprobe.blacklist=nouveau
# You can also use `sudo EDITOR=nano rpm-ostree kargs --editor` for more flexible editing

Note that adding nvidia-drm.modeset=1 is not necessary per this statement [8]. Check it with sudo cat /sys/module/nvidia_drm/parameters/modeset if you're not sure [9].

Check if Nvidia is working correctly

After everything is done, restart the system and check whether the Nvidia driver is installed correctly.

# To check current kernel parameters
cat /proc/cmdline
# To check if Nvidia driver is used instead of nouveau
lspci -v
# To check if Nvidia is actually working
nvidia-smi
# To check the installed Nvidia packages (and the dependencies)
rpm-ostree status -v
rpm -qa | grep nvidia
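Since lspci -v prints every device, you may want to narrow the output down to the GPU (a small convenience; the -k variant shows the kernel driver in use):

# Show only the Nvidia entries and their kernel driver
lspci -k | grep -A 3 -i nvidia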

Personal Nvidia issues

Nvidia tools hang after suspend/restart/shutdown

Nvidia tools (e.g. nvidia-smi) sometimes hang indefinitely after leaving the system idle or turning it off, indicating the driver may not be working correctly. After searching for a while, it seems this issue affects many Turing (NV160) generation GPUs [10]. Normally, the solution is to switch to the closed-source kernel module and disable the GSP firmware like below [11]:

# Change from "kernel-open" to "kernel"
sudo sed -i -e 's/kernel-open$/kernel/g' /etc/nvidia/kernel.conf
# Disable the GSP firmware
sudo rpm-ostree kargs --append=nvidia.NVreg_EnableGpuFirmware=0
# Rebuild the kernel module (use `rpm-ostree install <file>` if it fails)
sudo akmods --rebuild

Then reboot and check your current Nvidia settings with cat /proc/driver/nvidia/params, and modinfo -l nvidia to see the kernel module license (an MIT/GPL license means you're using the open-source kernel module).
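For example (assuming the parameter keeps the same name, minus the NVreg_ prefix, in the params file):

# Should print "EnableGpuFirmware: 0" after the karg above
grep EnableGpuFirmware /proc/driver/nvidia/params
# "Dual MIT/GPL" means the open-source kernel module
modinfo -l nvidia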

However, this didn't work for me. It didn't switch to the proprietary kernel module, and the hang still happened. In the end, I gave up, reverted the previous settings, and simply added nvidia-drm.fbdev=0 and/or nvidia-drm.modeset=0 to the kernel parameters instead, which made the Nvidia driver work again.
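For reference, the revert looks roughly like this (a sketch assuming the same kargs as above; also change /etc/nvidia/kernel.conf back to "kernel-open" if you edited it):

# Remove the GSP karg, then disable the DRM framebuffer and/or modesetting
sudo rpm-ostree kargs --delete=nvidia.NVreg_EnableGpuFirmware=0
sudo rpm-ostree kargs --append=nvidia-drm.fbdev=0 --append=nvidia-drm.modeset=0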

nvidia-powerd service blocking shutdown

If you have an older Nvidia GPU (below the Ampere generation), the nvidia-powerd service may not work correctly since it's unsupported on your GPU [12]. Check the list of Nvidia services, and if nvidia-powerd exists, check it for errors (see the commands below).

If you notice errors and/or experience a long shutdown time (caused by waiting for the daemon to be killed), you can disable and then stop the service.
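Putting those commands together:

# Check the Nvidia services and the powerd status
systemctl list-unit-files | grep nvidia
systemctl status nvidia-powerd
# Disable and stop the service if it's broken
sudo systemctl disable nvidia-powerd
sudo systemctl stop nvidia-powerd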

You shouldn't force shutdown the system using the power button, because Fedora also runs the update script before shutdown. A forced shutdown prevents the update service from running/applying, and you'll need to redo whatever changes you made on the next boot.

IDE and container preparation

VS Code setup

I don't think it's a good idea to install yet another layered package (VS Code) on the system, since the overlaying process is already slow enough with the Nvidia driver added. Instead, I'm going to use the Flatpak VS Code [13], despite its known issues [14] (which are mostly irrelevant since I'm using containers anyway).

flatpak install com.visualstudio.code
# Whitelist the /tmp folder to make "Dev Containers" work
flatpak --user override --filesystem=/tmp com.visualstudio.code
# Run VS Code (can also be launched from start menu)
flatpak run com.visualstudio.code
# [Manual] Install "Dev Containers" extension on VS Code

# Add Podman wrapper for Distrobox
mkdir -p ~/.var/app/com.visualstudio.code/data/node_modules/bin
ln -sf /app/bin/host-spawn ~/.var/app/com.visualstudio.code/data/node_modules/bin/bash
ln -sf /app/bin/host-spawn ~/.var/app/com.visualstudio.code/data/node_modules/bin/podman
ln -sf /app/bin/host-spawn ~/.var/app/com.visualstudio.code/data/node_modules/bin/docker-compose
# [Manual] Change the "Dev › Containers: Docker Path" settings on VS Code to "podman"

To make the VS Code Flatpak app recognize our host executables, the easiest way is to link them via host-spawn as above. Don't worry if /app/bin doesn't exist on the host; that's just how Flatpak sandboxing works internally. As for the missing docker-compose, we can install it via Podman Desktop (Flatpak) later.

Container setup

Although Distrobox says you can use the GPU inside the container [15], you actually need to generate a CDI config file first [4] or the GPU won't be detected by the container. If you're using Podman, this is the right command [16]:

# nvidia-ctk is part of nvidia-container-toolkit
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# Check the generated devices
nvidia-ctk cdi list
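To verify the CDI setup with plain Podman before involving Distrobox, something like this should print your GPU (the image here is just an example; CDI injects the driver's nvidia-smi into the container):

# Run nvidia-smi in a throwaway container with the GPU passed through
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable docker.io/library/ubuntu nvidia-smi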

Now create the container using Distrobox (or restart it if you created one before). I recommend using a separate home directory: the chosen directory is shared directly (with read-write access) with the container, so if you used your real home, installing user libraries inside the container would also install them to your host home.

# Create a custom home directory (you can use any directory)
mkdir -p ~/.distrobox/home
# Folders that will be shared with the container
ln -s ~/Documents ~/.distrobox/home/Documents
ln -s ~/Downloads ~/.distrobox/home/Downloads

# Create and download the container (I use Arch btw)
distrobox create --nvidia --name arch --image archlinux:latest --home /home/<user>/.distrobox/home
# Run the container and access the shell (you can also launch from start menu)
distrobox enter arch

If you don't want to use the CLI, you can install GUI apps like DistroShelf and Podman Desktop from Flatpak.

flatpak install flathub com.ranfdev.DistroShelf
flatpak install flathub io.podman_desktop.PodmanDesktop

Also, for non-distro images such as PostgreSQL and Apache software (which is usually the use case when you use Dev Containers later), you may want to set the config below to avoid SELinux and permission issues [17] (for rootless Podman):

# ~/.config/containers/containers.conf

[containers]
env = ["BUILDAH_FORMAT=docker"]
label = false
userns = "keep-id"

Testing the container

To connect to an existing Distrobox container from VS Code, simply select the container from the sidebar, or press Ctrl + Shift + P and choose "Attach to Running Container...". After that, open your actual project directory (e.g. in the Documents folder) once you're inside the Distrobox container.

However, if you don't want to use Distrobox and prefer Dev Containers instead, you need to create a .devcontainer/devcontainer.json file in your project folder. Refer to containers.dev for template examples; it's pretty easy to set up once you get the hang of it.
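As a starting point, here's a minimal sketch of what that file might look like (the name, image, and runArgs here are my own example values, not an official template):

// .devcontainer/devcontainer.json
{
    "name": "my-ml-project",
    "image": "docker.io/library/archlinux:latest",
    // Pass the GPU through rootless Podman via the CDI device generated earlier
    "runArgs": ["--device=nvidia.com/gpu=all", "--security-opt=label=disable"]
}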

I assume you already know the next step. For example, to simply test PyTorch, you can use the commands below (otherwise, I recommend using uv).

# Execute on the VS Code (container) terminal/shell
sudo pacman -Sy python python-pip
# Use a venv to avoid pip's externally-managed-environment error on Arch
python -m venv ~/.venv && source ~/.venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/cu130
python -c 'import torch; print(torch.cuda.is_available())'

If the output is True, the GPU is passed correctly to the container. Without PyTorch, you can also simply run nvidia-smi inside the container.

Footnotes

  1. https://hackmd.io/@deekej/fedora-silverblue-with-nvidia-drivers-from-negativo17

  2. https://rpmfusion.org/Howto/NVIDIA#OSTree_.28Silverblue.2FKinoite.2Fetc.29

  3. https://negativo17.org/nvidia-driver

  4. https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

  5. https://github.com/NVIDIA/nvidia-container-toolkit#getting-started

  6. https://negativo17.org/nvidia-driver/#Configuration_for_CUDA_only_systems

  7. https://distrobox.it

  8. https://negativo17.org/nvidia-driver/#Kernel_modesetting_and_Wayland_support

  9. https://wiki.archlinux.org/title/NVIDIA#DRM_kernel_mode_setting

  10. https://wiki.archlinux.org/title/NVIDIA/Troubleshooting#GSP_firmware

  11. https://discussion.fedoraproject.org/t/heads-up-nvidia-open-kernel-breaks-runtime-d3-on-turing-cards/161643

  12. https://forums.developer.nvidia.com/t/nvidia-powerd-service-fails-to-start/304927/2

  13. https://distrobox.it/posts/integrate_vscode_distrobox/#from-flatpak

  14. https://bentsukun.ch/posts/vscode-flatpak

  15. https://distrobox.it/useful_tips/#using-the-gpu-inside-the-container

  16. https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

  17. https://github.com/flathub/com.visualstudio.code/issues/55
