At the start, I have:
- CentOS centos-release-7-7.1908.0.el7.centos.x86_64.
- Slurm
  - Installed to /opt/slurm.
  - Only slurmctld and slurmd are running.
- Munge.
Now I am going to add the slurmdbd accounting daemon, backed by MariaDB.
Install MariaDB via yum.
yum install mariadb-server mariadb-devel

To keep it stable, override the innodb_log_file_size setting.
echo -e "[mysqld]\ninnodb_log_file_size=48M" | tee /etc/my.cnf.d/slurm.cnf

Start the MariaDB server.
systemctl start mariadb
systemctl enable mariadb

If it works fine, you can access the MariaDB command line. Let's check that the innodb_log_file_size value matches what we set in slurm.cnf.
$ sudo mysql -e "SHOW VARIABLES LIKE 'innodb_log_file_size';"
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| innodb_log_file_size | 50331648 |
+----------------------+----------+

Looks good.
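MariaDB reports the value in bytes, while the config file uses the M suffix. As a quick sanity check, and assuming the suffix means MiB (1024 * 1024 bytes), the conversion can be reproduced in the shell. The helper name mib_to_bytes is only an illustration.

```shell
# Convert a size given in MiB to bytes (hypothetical helper name).
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 48   # prints 50331648
```

The result matches the value shown by SHOW VARIABLES, so the override was picked up.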
Next, we need to create the database for accounting data and grant the slurm user full access to it.
mysql -e "create database slurm_acct_db;"
mysql -e "grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'jfh983hjf38hf48f829jhJHG##' with grant option;"

Now create a slurmdbd.conf file and fill it with the corresponding content.
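The attached slurmdbd.conf is not reproduced here. As a rough sketch only, a minimal configuration for this setup might look like what the helper below prints. The helper name print_slurmdbd_conf is hypothetical and the values are assumptions: the database name and user come from the grant statement above, LogFile matches the log file used in this guide, and SlurmUser and PidFile are guesses at common defaults.

```shell
# Print a minimal slurmdbd.conf to stdout (sketch; all values are assumptions).
# $1: password of the 'slurm' MariaDB user
print_slurmdbd_conf() {
    cat <<EOF
AuthType=auth/munge
DbdHost=localhost
SlurmUser=slurm
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=$1
StorageLoc=slurm_acct_db
EOF
}

# Example: print_slurmdbd_conf 'db-password' > slurmdbd.conf
```

Note that Slurm configuration files treat `#` as the start of a comment, so a password containing `#` may need escaping or replacing; the slurmdbd.conf man page has the details.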
cd /opt/slurm/etc/
touch slurmdbd.conf
chmod 600 slurmdbd.conf
vi slurmdbd.conf # Fill it with the content from attached file `slurmdbd.conf`

After that, create an empty log file.
touch /var/log/slurmdbd.log
chmod 600 /var/log/slurmdbd.log

Next, we need to create a systemd service file for slurmdbd.
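The attached unit file is not reproduced here. As a sketch under assumptions, a unit for this layout might resemble what the helper below prints. The helper name print_slurmdbd_unit is hypothetical; the ExecStart path assumes the daemon binary lives under /opt/slurm/sbin, and PIDFile assumes slurmdbd.conf sets PidFile=/var/run/slurmdbd.pid.

```shell
# Print a possible slurmdbd.service unit to stdout (sketch; paths are assumptions).
print_slurmdbd_unit() {
    cat <<'EOF'
[Unit]
Description=Slurm DBD accounting daemon
After=network.target munge.service mariadb.service
ConditionPathExists=/opt/slurm/etc/slurmdbd.conf

[Service]
Type=forking
ExecStart=/opt/slurm/sbin/slurmdbd
PIDFile=/var/run/slurmdbd.pid

[Install]
WantedBy=multi-user.target
EOF
}

# Example: print_slurmdbd_unit > /etc/systemd/system/slurmdbd.service
```

If the binary or the PID file lives elsewhere on your system, adjust ExecStart and PIDFile to match.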
vi /etc/systemd/system/slurmdbd.service # Fill it with the content from attached file `slurmdbd.service`

Reload the systemd daemon configuration.
systemctl daemon-reload

Now systemd can see our slurmdbd daemon. Let's start it.
systemctl start slurmdbd.service
systemctl enable slurmdbd.service

To enable the accounting feature, update the /opt/slurm/etc/slurm.conf file. You need to uncomment and override the following keys.
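Editing the file by hand works fine. For a scripted setup, the uncomment-and-override step could be sketched with a small helper like the one below. The name set_conf_key is hypothetical; it assumes GNU sed (as shipped with CentOS) and that keys and values contain no `|`, `&`, or backslash characters.

```shell
# Set KEY=VALUE in a Slurm-style config file (hypothetical helper).
# An existing line for KEY, commented out or not, is rewritten in place;
# otherwise the setting is appended to the end of the file.
set_conf_key() {
    file=$1 key=$2 value=$3
    if grep -Eq "^[#[:space:]]*${key}=" "$file"; then
        sed -i -E "s|^[#[:space:]]*${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example:
# set_conf_key /opt/slurm/etc/slurm.conf AccountingStorageHost localhost
```

The keys and values for this setup are: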
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2

Restart slurmctld and reload the Slurm daemons on the compute nodes.
systemctl restart slurmctld
scontrol reconfigure

Finally, register the cluster in the accounting database, which creates the additional accounting tables for it. To do that, execute the following command.
sacctmgr -i add cluster <cluster_name>

Where <cluster_name> is the name of your cluster as defined in slurm.conf. There is a shortcut to get it quickly.
$ scontrol show config | grep ClusterName
ClusterName = myclsuter

Done!
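As a sketch, the cluster-name lookup and the sacctmgr registration above can be chained so the name is never typed by hand. The helper name cluster_name_from_config is hypothetical; it reads `scontrol show config` output on stdin and prints the value of the ClusterName line.

```shell
# Print the ClusterName value from `scontrol show config` output on stdin.
cluster_name_from_config() {
    awk -F' *= *' '$1 == "ClusterName" { print $2; exit }'
}

# Example (requires a running slurmctld):
# sacctmgr -i add cluster "$(scontrol show config | cluster_name_from_config)"
```

Splitting on the `=` sign with optional surrounding spaces keeps the helper independent of how many spaces scontrol uses to pad the key column.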
$ sbatch --wrap='sleep 10'
Submitted batch job 111
$ sacct
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
111                wrap    compute                     1    RUNNING      0:0

See the official documentation for details.