@jeremy-rutman
Last active May 26, 2025 20:36
tl;dr
use the package manager install (.deb or .rpm files), not the .run file
remove existing CUDA/NVIDIA packages using
sudo apt-get --purge remove 'cuda*'
sudo apt-get --purge -y remove 'nvidia*'
sudo apt-get --purge -y remove 'libnvidia*'
then, using
jeremy@jeremy-Blade:~$ dpkg -l | grep -i nvidia
make sure everything is uninstalled
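The "make sure everything is uninstalled" check can be scripted. This is a minimal sketch that assumes bash and the usual dpkg -l layout, with the status in the first column and the package name in the second. leftover_gpu_pkgs is a hypothetical helper name, not an existing tool. It prints the status and name of every package whose name mentions cuda or nvidia. A status of ii means the package is still installed. A status of rc means the package was removed but its config files remain, and a purge clears those.

```shell
#!/usr/bin/env bash
# leftover_gpu_pkgs (hypothetical helper): read `dpkg -l` output on stdin and
# print "STATUS NAME" for every package whose name contains cuda or nvidia.
# Exit status: 0 if nothing is left, 1 if at least one such package remains.
leftover_gpu_pkgs() {
  awk '
    # Package rows start with a 2-3 letter status such as ii or rc;
    # header rows (Desired=..., ||/, +++-) do not match this test.
    $1 ~ /^[a-zA-Z][a-zA-Z][a-zA-Z]?$/ && tolower($2) ~ /(cuda|nvidia)/ {
      print $1, $2
      found = 1
    }
    END { exit found }
  '
}
```

Usage: dpkg -l | leftover_gpu_pkgs || echo "some CUDA/NVIDIA packages are still present"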
Uninstall steps to avoid the installer warning 'existing runfile installation already found, it is strongly recommended to remove it':
sudo apt-get purge nvidia-current
sudo apt-get remove --purge 'nvidia-*'
This may also help:
sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
but in my case the CUDA 10.1 and 10.2 directories have no bin directory, and
$ sudo /usr/bin/nvidia-uninstall doesn't exist.
On running the installer I get:
jeremy@jeremy-Blade:~/Downloads$ sudo sh cuda_10.2.89_440.33.01_linux.run.1
Installation failed. See log at /var/log/cuda-installer.log for details.
jeremy@jeremy-Blade:~/Downloads$ more /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 440.33.01
[INFO]: Executing NVIDIA-Linux-x86_64-440.33.01.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 440.33.01 failed, quitting
dpkg -l | grep -i nvidia revealed
jeremy@jeremy-Blade:~$ dpkg -l | grep -i nvidia
rc cuda-nvtx-10-1 10.1.243-1 amd64 NVIDIA Tools Extension
rc cuda-nvtx-10-2 10.2.89-1 amd64 NVIDIA Tools Extension
rc libnvidia-compute-415:amd64 415.27-0ubuntu0~gpu18.04.2 amd64 NVIDIA libcompute package
rc libnvidia-compute-418:amd64 418.87.01-0ubuntu1 amd64 NVIDIA libcompute package
rc libnvidia-compute-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA libcompute package
sudo apt-get remove --auto-remove nvidia-cuda-toolkit
didn't work (something else was using dpkg), so I restarted and tried again:
sudo apt-get purge nvidia-cuda-toolkit
or
sudo apt-get purge --auto-remove nvidia-cuda-toolkit
Additionally, delete the /opt/cuda and ~/NVIDIA_GPU_Computing_SDK folders if they are present, and remove the export PATH=$PATH:/opt/cuda/bin and export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cuda/lib:/opt/cuda/lib64 lines from the ~/.bash_profile file.
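The ~/.bash_profile cleanup can also be done with sed instead of by hand. This is a minimal sketch that assumes GNU sed (for in-place editing with -i) and assumes the old entries are plain export lines that mention /opt/cuda. strip_opt_cuda is a hypothetical helper name. It keeps a .bak copy of the file so you can check what was removed.

```shell
#!/usr/bin/env bash
# strip_opt_cuda FILE (hypothetical helper): delete every line that starts
# with "export" and mentions /opt/cuda. The original is kept as FILE.bak.
# Assumes GNU sed (-i with a backup suffix, [[:space:]] classes).
strip_opt_cuda() {
  sed -i.bak -e '/^[[:space:]]*export[[:space:]].*\/opt\/cuda/d' "$1"
}
```

Usage: strip_opt_cuda ~/.bash_profile, then diff ~/.bash_profile.bak ~/.bash_profile to review the change.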
This one removed a lot:
sudo apt-get --purge remove 'cuda*'
sudo apt-get --purge -y remove 'nvidia*'
sudo reboot
Check /var/log/cuda-installer.log;
if that shows problems with the driver, then check
/var/log/nvidia-installer.log,
which shows:
ERROR: You appear to be running an X server; please exit X before installing. For further details, please see the section INSTALLING THE NVIDIA DRIVER in the README available on the Linux driver download page at www.nvidia.com.
so drop out to a text console (Ctrl+Alt+F4 on my Ubuntu 18.04) and try
sudo service lightdm stop
but then it was unable to load nvidia-drm, and there was also a complaint about nouveau.
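You can check both installer logs in one step. This is a small sketch that assumes bash and GNU grep. show_install_errors is a hypothetical helper name. Its pattern only matches the two error prefixes seen in the log excerpts above, [ERROR]: in cuda-installer.log and ERROR: in nvidia-installer.log. If a log is not readable as your user, the helper reports that instead of failing.

```shell
#!/usr/bin/env bash
# show_install_errors LOG... (hypothetical helper): print "file:line:text" for
# every line starting with "[ERROR]:" or "ERROR:" in each readable log.
show_install_errors() {
  local log
  for log in "$@"; do
    if [ -r "$log" ]; then
      # -H prints the file name and -n the line number, so output from
      # several logs stays readable; no match is not treated as a failure
      grep -nH -E '^[[:space:]]*\[?ERROR\]?:' "$log" || true
    else
      echo "cannot read: $log" >&2
    fi
  done
  return 0
}
```

Usage: show_install_errors /var/log/cuda-installer.log /var/log/nvidia-installer.log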
It turns out what I want is the .deb (package) file, not the runfile (*It is recommended to use the distribution-specific packages, where possible.)
sudo nano /etc/profile
export PATH=/usr/local/cuda-10.2/bin:/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
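The ${VAR:+:${VAR}} idiom in these lines prepends a directory without leaving a trailing colon when the variable was empty. This matters because an empty entry in PATH means the current directory. The sketch below wraps the same idiom in a hypothetical helper, prepend_path, which also skips the directory if it is already present. That way, sourcing /etc/profile twice does not make the variable grow.

As far as I know, NVIDIA's post-installation instructions point LD_LIBRARY_PATH at /usr/local/cuda-10.2/lib64 rather than at the stubs directory, whose libraries are meant only for linking. The usage lines at the end of the block assume that layout.

```shell
#!/usr/bin/env bash
# prepend_path DIR CURRENT (hypothetical helper): print DIR${CURRENT:+:${CURRENT}},
# unless DIR is already one of the colon-separated entries of CURRENT.
prepend_path() {
  local dir=$1 current=$2
  case ":${current}:" in
    *":${dir}:"*) printf '%s\n' "$current" ;;            # already present: unchanged
    *) printf '%s\n' "${dir}${current:+:${current}}" ;;  # no trailing ':' if CURRENT is empty
  esac
}

# Example use in /etc/profile (paths assumed, see the note above):
# export PATH="$(prepend_path /usr/local/cuda-10.2/bin "$PATH")"
# export LD_LIBRARY_PATH="$(prepend_path /usr/local/cuda-10.2/lib64 "${LD_LIBRARY_PATH:-}")"
```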
/usr/bin/nvidia-persistenced --verbose
Check /var/crash; I found an error where the module could not be built:
make[1]: *** No rule to make target 'modules'.  Stop.
I tried
jeremy@jeremy-Blade:~$ ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/lib/x86_64-linux-gnu/libnvidia-ml.so
but nvidia-smi still gives me
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system
and nvidia-settings gives me
ERROR: Unable to load info from any available system
INFO
jeremy@jeremy-Blade:~$ lspci |grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GP106M [GeForce GTX 1060 Mobile] (rev a1)
jeremy@jeremy-Blade:~$ uname -a
Linux jeremy-Blade 5.3.0-26-generic #28~18.04.1-Ubuntu SMP Wed Dec 18 16:40:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
jeremy@jeremy-Blade:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/lib/x86_64-linux-gnu/libnvidia-ml.so
and
sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/local/cuda/lib64/libnvidia-ml.so
didn't help
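To see what the loader actually resolves libnvidia-ml to, you can filter the output of ldconfig -p (run sudo ldconfig first so that new symlinks show up). This is a sketch that assumes bash and GNU readlink. check_nvml is a hypothetical helper. It follows symlinks like the ones created above and flags anything that ends up in a stubs directory.

The libraries under .../lib/stubs are link-time stubs shipped with the toolkit. A symlink to one does not give nvidia-smi a working NVML; the real library comes with the driver.

```shell
#!/usr/bin/env bash
# check_nvml (hypothetical helper): read `ldconfig -p` output on stdin and,
# for every libnvidia-ml entry, print "ok PATH" or "stub PATH -> TARGET"
# when the (symlink-resolved) target lives in a stubs directory.
# Returns 1 if no libnvidia-ml entry is found at all.
check_nvml() {
  local line path target status=1
  while IFS= read -r line; do
    # Only rows like: libnvidia-ml.so.1 (libc6,x86-64) => /some/path
    case $line in
      *libnvidia-ml.so*'=> /'*) ;;
      *) continue ;;
    esac
    path=${line##*=> }
    # Follow symlinks; fall back to the raw path if it cannot be resolved
    target=$(readlink -f -- "$path" 2>/dev/null) || target=$path
    case $target in
      */stubs/*) printf 'stub %s -> %s\n' "$path" "$target" ;;
      *)         printf 'ok %s\n' "$path" ;;
    esac
    status=0
  done
  return "$status"
}
```

Usage: ldconfig -p | check_nvml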
Finally solved by installing purely through the package manager, with no manual downloads.
cuDNN
Check where your cuda installation is. For the installation from the repository it is /usr/lib/... and /usr/include. Otherwise, it will be /usr/local/cuda/ or /usr/local/cuda-<version>. You can check it with which nvcc or ldconfig -p | grep cuda
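The choice of destination directories can be written down as a small function. This is a sketch under the assumptions in the paragraph above. A distribution-repository install is assumed to put nvcc at /usr/bin/nvcc, with headers in /usr/include and libraries in /usr/lib/x86_64-linux-gnu. Any other install is assumed to use include/ and lib64/ under the CUDA root. cudnn_dest_dirs is a hypothetical helper, and it only handles those two layouts.

```shell
#!/usr/bin/env bash
# cudnn_dest_dirs NVCC_PATH (hypothetical helper): print "<include dir> <lib dir>"
# for copying cuDNN files, based on where nvcc lives.
cudnn_dest_dirs() {
  local nvcc=$1 root
  case $nvcc in
    /usr/bin/nvcc)
      # repository install: system include dir and multiarch lib dir
      printf '%s %s\n' /usr/include /usr/lib/x86_64-linux-gnu ;;
    */bin/nvcc)
      # e.g. /usr/local/cuda-10.2/bin/nvcc -> /usr/local/cuda-10.2
      root=${nvcc%/bin/nvcc}
      printf '%s %s\n' "$root/include" "$root/lib64" ;;
    *)
      echo "unrecognised nvcc location: $nvcc" >&2
      return 1 ;;
  esac
}
```

Usage: read -r inc lib <<< "$(cudnn_dest_dirs "$(command -v nvcc)")" and then copy into "$inc" and "$lib".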
Copy the files:
Repository installation:
$ cd folder/extracted/contents
$ sudo cp -P include/cudnn.h /usr/include
$ sudo cp -P lib64/libcudnn* /usr/lib/x86_64-linux-gnu/
$ sudo chmod a+r /usr/lib/x86_64-linux-gnu/libcudnn*
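After copying, you can read the installed cuDNN version back from the header. This is a sketch that assumes the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines are in cudnn.h, as in cuDNN 7. I believe cuDNN 8 and later moved them to cudnn_version.h, so pass that file instead for newer releases. cudnn_version is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# cudnn_version HEADER (hypothetical helper): print MAJOR.MINOR.PATCHLEVEL
# from the #define lines of a cuDNN header; exit 1 if CUDNN_MAJOR is missing.
cudnn_version() {
  awk '
    /^#define[ \t]+CUDNN_MAJOR[ \t]/      { major = $3 }
    /^#define[ \t]+CUDNN_MINOR[ \t]/      { minor = $3 }
    /^#define[ \t]+CUDNN_PATCHLEVEL[ \t]/ { patch = $3 }
    END {
      if (major == "") exit 1
      print major "." minor "." patch
    }
  ' "$1"
}
```

Usage: cudnn_version /usr/include/cudnn.h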
@jeremy-rutman
Author

Glad it was of help to you
