nvidia-smi (NVIDIA System Management Interface) is a command-line tool that provides monitoring, management, and diagnostic information for NVIDIA GPU devices.
It communicates directly with the NVIDIA driver and the GPUs it manages, and can:
- Monitor GPU performance, temperature, and utilization
- Manage power, clock speeds, and ECC
- Control persistence mode and compute modes
- Query detailed metrics for automation and monitoring
Running `nvidia-smi` by itself shows a summary table with:
- GPU index, name, and UUID
- Driver & CUDA versions
- GPU & memory utilization
- Power consumption and temperature
- Active processes using the GPU
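To glance at just the header of that table (driver version, CUDA version, and the column headings) without the process list, piping through `head` is enough; the line count below is an arbitrary convenience and may need adjusting on multi-GPU machines:

```bash
# Show only the top of the default summary table (driver/CUDA versions and headers).
nvidia-smi | head -n 12
```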
| Command | Description |
|---|---|
| `nvidia-smi -h` | Show help and usage information |
| `nvidia-smi --version` | Display the version of nvidia-smi |
| `nvidia-smi -L` | List all GPUs with their UUIDs |
| `nvidia-smi -q` | Display detailed information for all GPUs |
| `nvidia-smi -i <index> -q` | Display details for a specific GPU |
| `nvidia-smi -q -d <category>` | Display a specific data category (e.g. TEMPERATURE, PERFORMANCE) |
| `nvidia-smi -x -q` | Output detailed info in XML format |
| `nvidia-smi topo -m` | Display the topology matrix between GPUs (PCIe/NVLink) |
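The listing and per-GPU flags from this table combine naturally in shell loops. A minimal sketch, assuming the usual `GPU <index>: <name> (UUID: ...)` line format from `-L` and the `GPU Current Temp` label in the `-q` output (both can vary slightly across driver versions):

```bash
# Query the temperature section of each GPU reported by `nvidia-smi -L`.
nvidia-smi -L | while read -r line; do
  idx=$(printf '%s\n' "$line" | sed -n 's/^GPU \([0-9][0-9]*\):.*/\1/p')
  [ -n "$idx" ] || continue
  echo "== GPU $idx =="
  nvidia-smi -i "$idx" -q -d TEMPERATURE | grep -m1 "GPU Current Temp"
done
```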
For continuous monitoring, nvidia-smi can refresh itself or stream device-level metrics:

| Command | Description |
|---|---|
| `nvidia-smi -l 5` | Refresh the summary every 5 seconds |
| `nvidia-smi -lms 500` | Refresh every 500 milliseconds |
| `nvidia-smi --filename=/var/log/gpu.log -l 5` | Loop every 5 seconds and write the output to a file |
| `nvidia-smi dmon` | Stream per-device metrics (power, temperature, utilization, clocks) |

Example `nvidia-smi dmon` output:

```
# gpu  pwr  temp  sm  mem  enc  dec  mclk  pclk
    0   85    64  23    5    0    0   405  1110
```
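`dmon` also accepts a sample count and a metric selection, which makes bounded spot checks easy instead of an open-ended stream:

```bash
# Take 10 samples of power/temperature (p), utilization (u), and clocks (c), then exit.
nvidia-smi dmon -c 10 -s puc
```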
Query specific metrics in CSV format:

`nvidia-smi --query-gpu=index,name,uuid,temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv`

Output:

```
index, name, uuid, temperature.gpu, utilization.gpu [%], memory.used [MiB], memory.total [MiB]
0, NVIDIA RTX A6000, GPU-02afcc1a-…, 58, 72 %, 13456 MiB, 49152 MiB
```
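Because the CSV output is machine-readable, it pipes cleanly into standard text tools. As a small worked example, this derives memory usage as a percentage from the raw MiB values (`nounits` strips the `MiB` suffix so awk sees plain numbers):

```bash
# Print per-GPU memory usage as a percentage.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits |
  awk -F', ' '{ printf "GPU %s: %.1f%% memory used\n", $1, 100 * $2 / $3 }'
```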
| Category | Example Fields |
|---|---|
| Identity | index, name, uuid, serial, vbios_version |
| Performance | utilization.gpu, utilization.memory, pstate, clocks.current.graphics |
| Memory | memory.total, memory.used, memory.free |
| Temperature | temperature.gpu, temperature.memory |
| Power | power.draw, power.limit, power.default_limit |
| Fan/Clocks | fan.speed, clocks.gr, clocks.mem, clocks.video |
| Driver | driver_version, cuda_version |
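Any of these fields can drive simple shell-side checks. A minimal sketch; the 80 °C threshold is chosen here purely for illustration:

```bash
# Warn about any GPU whose core temperature exceeds the example 80 C threshold.
nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits |
  while IFS=', ' read -r idx temp; do
    if [ "$temp" -gt 80 ]; then
      echo "WARNING: GPU $idx is at ${temp} C"
    fi
  done
```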
More query examples and per-process monitoring:

| Command | Description |
|---|---|
| `nvidia-smi --query-gpu=memory.used,memory.total --format=csv` | Memory usage only |
| `nvidia-smi --query-gpu=temperature.gpu,power.draw --format=csv,noheader,nounits` | Temperature and power as bare values |
| `nvidia-smi --query-gpu=name --format=csv,noheader` | GPU names only |
| `nvidia-smi pmon -c 1` | One sample of per-process metrics |
| `nvidia-smi pmon` | Continuous per-process monitoring |

Example `nvidia-smi pmon` output:

```
# gpu   pid  type  sm  mem  enc  dec  command
    0  3024     C  23    5    0    0  python3
```
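For scripting, the same per-process information is available through the query interface, which avoids parsing `pmon`'s fixed-width columns:

```bash
# CSV listing of compute processes and their GPU memory usage.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
```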
Terminate a process:
`sudo kill -9 <pid>`

Common management commands (most require root):

| Command | Description |
|---|---|
| `sudo nvidia-smi -pm 1` | Enable persistence mode |
| `sudo nvidia-smi -pm 0` | Disable persistence mode |
| `sudo nvidia-smi -pl 250` | Set the power limit to 250 W |
| `sudo nvidia-smi --lock-gpu-clocks=900,1500` | Lock GPU clocks to the 900 to 1500 MHz range |
| `sudo nvidia-smi --reset-gpu-clocks` | Restore the default GPU clocks |
| `sudo nvidia-smi --lock-memory-clocks=405,1215` | Lock memory clocks to the 405 to 1215 MHz range |
| `sudo nvidia-smi -i 0 --gpu-reset` | Reset GPU 0 |
| `nvidia-smi -q -d ECC` | Show ECC status and error counts |
| `sudo nvidia-smi -e 1` | Enable ECC (takes effect after a reboot) |

Compute modes, set with `sudo nvidia-smi -c <mode>`:

| Mode | Description |
|---|---|
| 0 | Default (multiple contexts allowed) |
| 1 | Exclusive thread |
| 2 | Prohibited |
| 3 | Exclusive process |
Example:
`sudo nvidia-smi -c 3`

Other useful queries:

| Command | Description |
|---|---|
| `nvidia-smi -q -d SUPPORTED_CLOCKS` | List supported clock combinations |
| `nvidia-smi -q -d PERFORMANCE` | Show performance state and throttle information |
| `nvidia-smi -q -d PCI` | Show PCIe details |
| `nvidia-smi -q -d FAN` | Show fan details |
| `nvidia-smi --query-gpu=name,utilization.gpu --format=csv,noheader` | Name and utilization per GPU |
| `nvidia-smi --query-gpu=temperature.gpu --format=csv,nounits` | Temperature without units |
| `nvidia-smi --query-gpu=index,uuid,temperature.gpu,power.draw --format=csv -l 10 --filename=gpu_stats.csv` | Log key metrics to a CSV file every 10 seconds |

Tips for scripting:

- Use `--query-gpu` with `--format=csv,noheader,nounits` in scripts.
- Use GPU UUIDs for consistent identification.
- Combine with `watch`: `watch -n 2 nvidia-smi`
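The same scripting pattern works for the management flags above. A minimal sketch (requires root; the 250 W cap is an arbitrary example and must fall within each card's supported power range):

```bash
# Enable persistence mode and apply an example 250 W power cap on every GPU.
for idx in $(nvidia-smi --query-gpu=index --format=csv,noheader); do
  sudo nvidia-smi -i "$idx" -pm 1
  sudo nvidia-smi -i "$idx" -pl 250
done
```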
A simple monitoring loop that appends one CSV row per GPU every 5 seconds:

```bash
#!/bin/bash
# Append one CSV row per GPU to the log file every 5 seconds.
# Run as root or point LOGFILE at a writable path.
LOGFILE="/var/log/nvidia_smi_monitor.csv"
echo "timestamp,gpu_index,uuid,utilization.gpu,memory.used,memory.total,temperature.gpu,power.draw" > "$LOGFILE"
while true; do
  nvidia-smi --query-gpu=timestamp,index,uuid,utilization.gpu,memory.used,memory.total,temperature.gpu,power.draw --format=csv,noheader >> "$LOGFILE"
  sleep 5
done
```

Common issues and fixes:

| Issue | Solution |
|---|---|
| `NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver` | Ensure the driver is loaded or reinstall the driver. |
| Insufficient permissions | Add `sudo` for management commands. |
| GPU reset not supported | Some models don't allow GPU reset. |
| Compute mode changes not persisting | Reboot after applying. |
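When the driver-communication error appears, a couple of quick checks narrow down whether the kernel module is loaded at all (module and path names assume a standard Linux install):

```bash
# Is the NVIDIA kernel module loaded, and does the client tool run at all?
lsmod | grep -i '^nvidia'
nvidia-smi --version
cat /proc/driver/nvidia/version   # present only when the driver is loaded
```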