When training ML models, you may find that the Linux out-of-memory (OOM) killer terminates the training process partway through, even on high-RAM machines (e.g., 64 GiB of RAM). This happens when the process's memory demand spikes beyond the available physical RAM and there is no swap space to absorb the overflow.
To solve this, consider enabling ZRAM, a Linux kernel feature that provides compressed swap space in RAM, giving fast, efficient memory overflow handling without relying on slow disk-based swap.
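Before changing anything, it is worth confirming that the machine really has little or no swap configured and that the kernel ships the zram module. The quick check below uses standard tools; the assumption that zram is available as a module holds for stock Ubuntu/Debian kernels but may differ on custom builds.

# List currently configured swap devices (often empty on cloud or fresh installs)
swapon --show
free -h

# Confirm the zram kernel module exists
modinfo zram | head -n 3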
Install ZRAM tools
sudo apt update
sudo apt install zram-tools
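If you want to see what the package actually installed, something like the command below works on Debian/Ubuntu; depending on the zram-tools version, the swap is managed by a zramswap systemd unit or an init script, both of which the later systemctl commands can drive.

# List files shipped by the package, including the default config and the service
dpkg -L zram-tools | grep -E 'zramswap|service'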
Configure ZRAM

sudo nano /etc/default/zramswap

Add the following:
ENABLED=true
ALGO=zstd
PERCENT=90
PRIORITY=100

PERCENT=90 uses up to 90% of total RAM (about 58 GiB on a 64 GiB machine) as compressed swap; for many workloads 50% is enough. The ZRAM swap is only used once physical RAM comes under pressure.
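As a point of comparison, a more conservative variant of the same file might look like this; treat the numbers as a starting point to tune against your workload, not as fixed recommendations.

# /etc/default/zramswap - conservative variant
ENABLED=true
ALGO=zstd       # good compression ratio at moderate CPU cost
PERCENT=50      # cap compressed swap at ~50% of RAM (~32 GiB on a 64 GiB machine)
PRIORITY=100    # higher priority than any disk-based swap, so ZRAM is used first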
Start and enable the ZRAM service:
sudo systemctl enable zramswap
sudo systemctl start zramswap

Restart if needed (e.g., after changing the config):
sudo systemctl restart zramswap
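If the service does not come up cleanly, its status and recent logs are the first things to look at; these are generic systemd commands, nothing ZRAM-specific.

# Confirm the zramswap service started without errors
systemctl status zramswap

# Show its last few log lines
journalctl -u zramswap --no-pager -n 20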
Check that ZRAM is active:

swapon --show

or
sudo zramctl
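zramctl can also print selected columns, which makes the compression ratio easier to read off; the column names below are the standard util-linux ones, but check zramctl --help on your system if they differ.

# DATA = uncompressed data stored, COMPR = its compressed size, TOTAL = memory actually used
sudo zramctl --output NAME,ALGORITHM,DISKSIZE,DATA,COMPR,TOTAL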
During training, you can monitor memory and swap usage in real time:

watch -n 1 free -h
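For a record of memory behaviour over a long run rather than a live view, a minimal logging loop such as the sketch below (writing to a hypothetical memory.log file in the current directory) can run in a second terminal:

# Append a timestamped memory/swap snapshot every 30 seconds
while true; do
    echo "=== $(date -Is) ===" >> memory.log
    free -h >> memory.log
    sleep 30
done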