Today I found myself needing to set up a minimal SLURM cluster on my laptop for simple testing purposes. As always, there were some challenges along the way, and I had to consult multiple installation guides and Stack Overflow questions to make everything fit together. Here is what worked for me.
Set up munge
$ sudo apt install munge
Test if it works:
$ munge -n | unmunge
STATUS: Success (0)
[...]
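If you want to script this check (for example before starting the SLURM daemons), a minimal sketch is to look for the STATUS line in unmunge's output. The helper names below are hypothetical. The pattern tolerates the column padding that unmunge usually prints after "STATUS:".

```shell
# Hypothetical helper: succeeds if unmunge-style text on stdin reports success.
# The " *" tolerates the padding unmunge prints between "STATUS:" and the value.
unmunge_succeeded() {
  grep -q '^STATUS: *Success (0)'
}

# Hypothetical helper: encode an empty credential and decode it again
# through the local munged daemon.
check_munge() {
  munge -n | unmunge | unmunge_succeeded
}

# Usage: check_munge && echo "munge OK" || echo "munge FAILED"
```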
Set up MariaDB
$ sudo apt install mariadb-server
$ sudo mysql -u root
create database slurm_acct_db;
create user 'slurm'@'localhost';
set password for 'slurm'@'localhost' = password('slurmdbpass');
grant usage on *.* to 'slurm'@'localhost';
grant all privileges on slurm_acct_db.* to 'slurm'@'localhost';
flush privileges;
exit
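The statements from the session above can also be fed to MariaDB in one go, which is handy when rebuilding the setup. This is a sketch under two assumptions: the MariaDB version supports IF NOT EXISTS on CREATE USER (10.1.3 and later), which makes the script safe to rerun, and the password contains no single quotes (it is not escaped). The GRANT USAGE line is left out because CREATE USER already creates the account without privileges. The function name is hypothetical.

```shell
# Hypothetical helper: print the database setup as one idempotent SQL script.
# Assumption: the password ($1, default slurmdbpass) contains no single quotes.
slurm_db_sql() {
  local password="${1:-slurmdbpass}"
  cat <<EOF
CREATE DATABASE IF NOT EXISTS slurm_acct_db;
CREATE USER IF NOT EXISTS 'slurm'@'localhost' IDENTIFIED BY '${password}';
GRANT ALL PRIVILEGES ON slurm_acct_db.* TO 'slurm'@'localhost';
FLUSH PRIVILEGES;
EOF
}

# Usage: slurm_db_sql 'slurmdbpass' | sudo mysql -u root
```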
Set up SLURM
$ sudo apt install slurmd slurm-client slurmctld
Use configurator.html to create the SLURM config file. An online version exists, but it only matches the latest release.
Find out which version you have (mine was 17.11.2):
$ dpkg -l | grep slurm
Go to https://www.schedmd.com/archives.php and download the source package corresponding to your version (I ended up with a small version mismatch, 17.11.10, but it worked out anyway).
Unpack the archive, enter the directory, then build and open the configuration tool:
$ cd slurm-17.11.10
$ ./configure
$ make html
$ xdg-open doc/html/configurator.html
- Fill in all NodeName/Hostname fields with your own hostname (as printed by hostname(1)).
- For testing, fill in root for SlurmUser.
- Make sure that the slurmd and slurmctld PID file paths are the same as those listed in the systemd unit files (/lib/systemd/system/slurmd.service and /lib/systemd/system/slurmctld.service).
- You might want to adjust the Number of CPUs setting to match your machine.
- Copy-paste the result to /etc/slurm-llnl/slurm.conf.
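If you would rather skip the HTML form, a minimal single-node slurm.conf can also be written from a script. This is only a sketch under several assumptions: it uses the 17.11-era ControlMachine key (newer releases call it SlurmctldHost), Ubuntu's slurm-llnl paths for the PID, state and spool files (check them against your systemd unit files as described above), a hypothetical ClusterName of laptop, and root as SlurmUser as in the list. The function name is also hypothetical; the host name and CPU count default to uname -n and nproc.

```shell
# Hypothetical helper: write a minimal single-node slurm.conf to $1.
# Assumptions: Slurm 17.11 key names and Ubuntu's slurm-llnl paths.
write_slurm_conf() {
  local out="$1"
  local host="${2:-$(uname -n)}"   # node name; hostname(1) normally prints the same
  local cpus="${3:-$(nproc)}"      # CPUs available to this process
  cat > "$out" <<EOF
ClusterName=laptop
ControlMachine=${host}
SlurmUser=root
AuthType=auth/munge
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
ReturnToService=1
NodeName=${host} CPUs=${cpus} State=UNKNOWN
PartitionName=debug Nodes=${host} Default=YES MaxTime=INFINITE State=UP
EOF
}

# Usage: write_slurm_conf slurm.conf && sudo cp slurm.conf /etc/slurm-llnl/slurm.conf
```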
Create a file /etc/slurm-llnl/cgroup.conf:
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
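Since /etc/slurm-llnl is owned by root, a common way to create this file from the command line is a heredoc piped into sudo tee: with "sudo cat > file" the shell performs the redirection before sudo runs, as your own user, so it fails. A minimal sketch (the helper name is hypothetical; it only calls sudo when the target directory is not writable). The file is read by Slurm's cgroup plugins, such as proctrack/cgroup and task/cgroup.

```shell
# Hypothetical helper: write the cgroup.conf above to $1
# (default: /etc/slurm-llnl/cgroup.conf), using sudo only if needed.
write_cgroup_conf() {
  local target="${1:-/etc/slurm-llnl/cgroup.conf}"
  local sudo_cmd=""
  [ -w "$(dirname "$target")" ] || sudo_cmd="sudo"
  # tee does the writing, so it is the process that needs root, not the shell.
  $sudo_cmd tee "$target" > /dev/null <<'EOF'
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
EOF
}

# Usage: write_cgroup_conf
```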
Restart daemons
$ sudo systemctl restart slurmctld
$ sudo systemctl restart slurmd
Running sinfo should show no errors:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 1 idle a715
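To turn this visual check into something a script can wait on, one option is to parse the default sinfo output and require every row to be idle. A minimal sketch, assuming the default column layout shown above (STATE is the 5th column); the helper name is hypothetical. Any other state, including a trailing "*" (node not responding), counts as not ready.

```shell
# Hypothetical helper: read default-format sinfo output on stdin and succeed
# only if there is at least one data row and every STATE column is "idle".
all_nodes_idle() {
  awk 'NR == 1 { next }              # skip the PARTITION ... NODELIST header
       { rows++; if ($5 != "idle") bad++ }
       END { exit !(rows > 0 && bad == 0) }'
}

# Usage: sinfo | all_nodes_idle && echo "cluster ready"
```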
Test an actual job
Run sleep 1 on 8 processors:
$ time srun -n8 sleep 1
srun -n8 sleep 1 -- 1,20s (0,01s(u) + 0,00s(s) 6kb 0+49 ctx)
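Because the 8 tasks run in parallel, the wall time stays around 1 second (as in the timing above) rather than 8. The same test can also be submitted as a batch job. A minimal sketch; the helper, job name and file names are hypothetical. Inside the job, srun inherits the --ntasks value, so it again starts N copies of sleep 1.

```shell
# Hypothetical helper: write a batch script to $1 that runs "sleep 1" as
# N parallel tasks (default 8), mirroring the srun -n8 test above.
write_sleep_job() {
  local out="$1" ntasks="${2:-8}"
  cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=sleeptest
#SBATCH --ntasks=${ntasks}
#SBATCH --time=00:01:00
#SBATCH --output=sleeptest-%j.out
srun sleep 1
EOF
}

# Usage: write_sleep_job sleeptest.sh && sbatch sleeptest.sh && squeue
```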
Some useful debugging commands
$ slurmctld -D
$ slurmd -D
$ sinfo
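The -D flag keeps slurmctld and slurmd in the foreground so that errors are printed to the terminal (stop the systemd services first; extra -v flags increase verbosity). A problem that is easy to miss is the PID file mismatch mentioned in the configuration step: if the path in slurm.conf differs from the PIDFile= line of the systemd unit, systemctl may wait for a PID file that never appears and eventually time out. A minimal sketch to compare the two; the helper name is hypothetical, and it assumes plain Key=value lines without quotes or trailing comments.

```shell
# Hypothetical helper: compare a PID file setting in slurm.conf with the
# PIDFile= line of a systemd unit; prints both and succeeds only if equal.
#   $1: slurm.conf path   $2: unit file path
#   $3: slurm.conf key, e.g. SlurmdPidFile or SlurmctldPidFile
pidfile_matches() {
  local conf="$1" unit="$2" key="$3" from_conf from_unit
  # slurm.conf keys are case-insensitive, hence grep -i; take the last match.
  from_conf="$(grep -i "^${key}=" "$conf" | tail -n 1 | cut -d= -f2-)"
  from_unit="$(grep '^PIDFile=' "$unit" | tail -n 1 | cut -d= -f2-)"
  echo "slurm.conf: ${from_conf:-<unset>}"
  echo "unit file:  ${from_unit:-<unset>}"
  # An unset key in slurm.conf is reported as a mismatch.
  [ -n "$from_conf" ] && [ "$from_conf" = "$from_unit" ]
}

# Usage:
# pidfile_matches /etc/slurm-llnl/slurm.conf /lib/systemd/system/slurmd.service SlurmdPidFile
```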
Set up mail (optional)
Install postfix
$ sudo apt install postfix mailutils
Edit /etc/postfix/main.cf and set:
inet_interfaces = loopback-only
Restart Postfix and the SLURM daemons, then test locally:
$ sudo systemctl restart postfix
$ sudo systemctl restart slurmctld
$ sudo systemctl restart slurmd
$ srun --mail-type=ALL hostname
$ grep Subject /var/mail/$(whoami)
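The main.cf change can also be made non-interactively. Postfix ships postconf for this purpose (sudo postconf -e 'inet_interfaces = loopback-only'); if you prefer editing the file directly, a sed-based sketch is below. The helper name is hypothetical, it relies on GNU sed's -i option, and it assumes the setting sits on a single unindented line.

```shell
# Hypothetical helper: set "inet_interfaces = loopback-only" in the main.cf
# given as $1, replacing an existing inet_interfaces line or appending one.
set_loopback_only() {
  local cf="$1"
  if grep -q '^inet_interfaces[[:space:]]*=' "$cf"; then
    sed -i 's/^inet_interfaces[[:space:]]*=.*/inet_interfaces = loopback-only/' "$cf"
  else
    printf '%s\n' 'inet_interfaces = loopback-only' >> "$cf"
  fi
}

# Usage (from a root shell, since sudo cannot call a shell function directly):
#   set_loopback_only /etc/postfix/main.cf && systemctl restart postfix
```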
If everything went well, you should now have your own private computer cluster up and running :)