Setup LLaMa-2 in Ubuntu 22.04 for Nvidia RTX 4090
1. Check that your Ubuntu install is ready for your GPU card (in my case, an RTX 4090).
```
nvidia-smi
# If you get an error like "Failed to initialize NVML: Driver/library version mismatch. NVML library version: 535.161"
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
## Or, as an alternative, install the driver manually
sudo apt-get install nvidia-driver-535
# Install CUDA (in my case for Ubuntu 22.04). You can check here for other options: https://developer.nvidia.com/cuda-downloads
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.0-550.54.14-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.0-550.54.14-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
```
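A reboot is usually needed after installing a new driver. Afterwards, an optional sanity check confirms that both the driver and the CUDA toolkit are visible; the exports below are an assumption based on the default `/usr/local/cuda` install location.
```
# Optional sanity check (paths assume the default CUDA install location)
export PATH=/usr/local/cuda/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
nvidia-smi       # driver and GPU should be listed, without the NVML mismatch error
nvcc --version   # should report the CUDA 12.4 toolkit
```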
2. Install the basic tools.
```
# Install Git LFS (large file support) and the build toolchain
sudo apt install git-lfs make build-essential python3-pip
git lfs install
# Clone the LLaMA repo
git clone https://github.com/facebookresearch/llama
# Clone llama.cpp (https://github.com/ggerganov/llama.cpp)
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
# If you want to compile with CUDA support you will need
export PATH=/usr/local/cuda/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
make LLAMA_CUDA=1
# Monitor that the GPUs are being used
nvidia-smi pmon
# Install the Python dependencies (from the same folder)
python3 -m pip install -r requirements.txt
```
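To avoid re-exporting the CUDA paths in every new shell, you can persist them and confirm the build produced the binaries used later. A small convenience sketch, assuming the default `/usr/local/cuda` symlink and a bash shell:
```
# Optional: persist the CUDA paths for future shells
echo 'export PATH=/usr/local/cuda/bin:${PATH}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}' >> ~/.bashrc
source ~/.bashrc
# The build should have produced the binaries used in the next steps
ls -l main server
```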
3. While llama.cpp is compiling, go to [Meta](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and download the model weights. Alternatively, you can use Hugging Face:
```
pip install huggingface-hub
huggingface-cli login
huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF llama-2-7b-chat.Q5_K_M.gguf --local-dir . --local-dir-use-symlinks False
```
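The commands in the next step expect the GGUF file under `llama.cpp/models`, so move it there once the download finishes. A minimal sketch, assuming you downloaded into the directory that also contains the `llama.cpp` checkout:
```
# Move the downloaded GGUF to where llama.cpp expects it (see the next step)
mkdir -p llama.cpp/models
mv llama-2-7b-chat.Q5_K_M.gguf llama.cpp/models/
ls -lh llama.cpp/models/   # the Q5_K_M 7B chat file is roughly 5 GB
```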
4. Let's play with the model. Make sure the model is in the folder `llama.cpp/models` and run the following from the `llama.cpp` folder.
```
# CPU only (10 threads)
./main -t 10 -m models/llama-2-7b-chat.Q5_K_M.gguf --color -c 4096 --temp 0.7 --repeat-penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\\n### Response:"
# Using the GPUs
./main -m models/llama-2-7b-chat.Q5_K_M.gguf --n-gpu-layers 10 --split-mode layer --color -c 4096 --temp 0.7 --repeat-penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\\n### Response:"
# More precisely, offloading 24 layers to the GPU
./main -m models/llama-2-7b-chat.Q5_K_M.gguf --repeat-penalty 1.1 --n-gpu-layers 24 --split-mode layer -c 4096 --temp 0.7 --prompt "Explain what is the temperature in a LLM model."
# Now using the server (http://127.0.0.1:8888/)
./server -m models/llama-2-7b-chat.Q5_K_M.gguf --port 8888 --host 0.0.0.0 --ctx-size 10240 --parallel 4 --n-gpu-layers 99 -n 512
```
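Once the server is up you can query it over HTTP; it also serves a small web UI at http://127.0.0.1:8888/. A minimal sketch using `curl` against the server's `/completion` endpoint (field names as documented in the llama.cpp server README; adjust if your build differs):
```
# Send a prompt to the running server and get a JSON completion back
curl --request POST \
  --url http://127.0.0.1:8888/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "### Instruction: Write a story about llamas\n### Response:", "n_predict": 128, "temperature": 0.7}'
```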