Job scheduling system (Slurm)

The Simple Linux Utility for Resource Management (SLURM) is the resource management and job scheduling system of the cluster. All jobs on the cluster must be run through SLURM: you submit your job or application to SLURM with a job script.

The most common operations with SLURM are:

Purpose                                           Command
Check what partitions (queues) are available      sinfo
Submit a job                                      sbatch <your_job_script>
View the queue status                             squeue
View the queue status of your own jobs            squeue -u $USER
Cancel a running or pending job                   scancel <your_slurm_jobid>
View detailed information about a job             scontrol show job <your_slurm_jobid>
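
For instance, a typical submit-and-monitor session might look like this (the script name my_job.slurm and job ID 12345 are illustrative placeholders):

sinfo                        # list the partitions you can submit to
sbatch my_job.slurm          # prints e.g. "Submitted batch job 12345"
squeue -u $USER              # watch your jobs in the queue
scontrol show job 12345      # inspect the job's resources and state
scancel 12345                # cancel the job if it is no longer needed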

Job Script Creation

Suppose you have an application directory in your home directory, say $HOME/apps/slurm. You can create the SLURM job script there and submit it with sbatch <your_job_script>.
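
For example, assuming a job script named my_job.slurm (a placeholder name):

mkdir -p $HOME/apps/slurm    # create the application directory if it does not exist
cd $HOME/apps/slurm
sbatch my_job.slurm          # submit the script to SLURM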

Sample scripts are provided below. For other applications, please refer to the software page.

Example 1: create a SLURM script for an MPI application that runs on 2 nodes with 80 CPU cores, using the Intel MPI environment.


#!/bin/bash

# NOTE: Lines starting with "#SBATCH" are active SLURM directives, while
#       lines starting with "#" or "##SBATCH" are treated as comments.  To
#       enable a "##SBATCH" line, remove one "#" so that it starts with
#       "#SBATCH".


#SBATCH -J slurm_job #Slurm job name

# Set the maximum runtime, uncomment if you need it
##SBATCH -t 48:00:00 #Maximum runtime of 48 hours

# Enable email notifications when the job begins and ends; uncomment if you need it
##SBATCH --mail-user=user_name@ust.hk #Update your email address
##SBATCH --mail-type=begin
##SBATCH --mail-type=end

# Choose partition (queue) "cpu" or "cpu-share" or replace with partition you can access
#SBATCH -p cpu-share

# Use 2 nodes and 80 cores
#SBATCH -N 2 -n 80

# Set up the runtime environment if necessary
# For example, set up the Intel MPI environment
module swap gnu8 intel

# Go to the job submission directory and run your application
cd $HOME/apps/slurm
mpirun ./your_mpi_application
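
If your MPI library is built with SLURM integration, the launcher obtains the allocation (2 nodes and 80 tasks here) from the SLURM environment, so no host list or process count needs to be passed to mpirun. As a minimal sketch, assuming a SLURM-aware MPI build, srun can also serve as the launcher:

srun ./your_mpi_application    # starts one rank per allocated task (80 in this example)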


Example 2: create a SLURM script to run 3 applications in parallel (each using only 1 CPU core).


#!/bin/bash

# NOTE: Lines starting with "#SBATCH" are active SLURM directives, while
#       lines starting with "#" or "##SBATCH" are treated as comments.  To
#       enable a "##SBATCH" line, remove one "#" so that it starts with
#       "#SBATCH".


#SBATCH -J slurm_job #Slurm job name

# Set the maximum runtime, uncomment if you need it
##SBATCH -t 48:00:00 #Maximum runtime of 48 hours

# Enable email notifications when the job begins and ends; uncomment if you need it
##SBATCH --mail-user=user_name@ust.hk #Update your email address
##SBATCH --mail-type=begin
##SBATCH --mail-type=end

# Choose partition (queue) "cpu" or "cpu-share" or replace with partition you can access
#SBATCH -p cpu-share

# Use 1 node and 3 cores 
#SBATCH -N 1 -n 3 

# Set up the runtime environment if necessary
# or you can source ~/.bashrc or ~/.bash_profile 

# Go to the job submission directory and run your application 
cd $HOME/apps/slurm 
# Execute applications in parallel 
srun -n 1 myapp1 &    # Assign 1 core to run application "myapp1" 
srun -n 1 myapp2 &    # Similarly, assign 1 core to run application "myapp2" 
srun -n 1 myapp3 
wait
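
The trailing "&" runs the first two srun steps in the background so that all three applications execute concurrently, while the final wait keeps the batch job alive until every background step has finished. For many similar single-core tasks, the same pattern generalizes to a loop; a minimal sketch, where myapp1 to myapp3 are placeholder programs:

for i in 1 2 3; do
    srun -n 1 ./myapp$i &    # one core per background step
done
wait                         # block until all steps complete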


Example 3: create a SLURM script to run 2 applications in parallel (each using 2 CPU cores and 1 GPU device).


#!/bin/bash

# NOTE: Lines starting with "#SBATCH" are active SLURM directives, while
#       lines starting with "#" or "##SBATCH" are treated as comments.  To
#       enable a "##SBATCH" line, remove one "#" so that it starts with
#       "#SBATCH".


#SBATCH -J slurm_job #Slurm job name

# Set the maximum runtime, uncomment if you need it
##SBATCH -t 48:00:00 #Maximum runtime of 48 hours

# Enable email notifications when the job begins and ends; uncomment if you need it
##SBATCH --mail-user=user_name@ust.hk #Update your email address
##SBATCH --mail-type=begin
##SBATCH --mail-type=end

# Choose partition (queue) "gpu" or "gpu-share" or replace with partition you can access
#SBATCH -p gpu-share

# Use 4 CPU cores and 2 GPU devices on one node
#SBATCH -N 1 -n 4 --gres=gpu:2

# Set up the runtime environment if necessary
# or you can source ~/.bashrc or ~/.bash_profile 

# Go to the job submission directory and run your application 
cd $HOME/apps/slurm 
# Execute applications in parallel 
srun -n 2 --gres=gpu:1 myapp1 &    # Assign 2 CPU cores and 1 GPU device to run application "myapp1" 
srun -n 2 --gres=gpu:1 myapp2     # Similarly, assign 2 CPU cores and 1 GPU device to run application "myapp2" 
wait
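
With --gres=gpu, SLURM normally restricts each step to the GPU it was allocated (typically via CUDA_VISIBLE_DEVICES), so the two applications do not contend for the same device. You can verify the assignment from inside a job; a minimal sketch:

srun -n 1 --gres=gpu:1 nvidia-smi -L    # list the GPU(s) visible to one step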

For the full list of #SBATCH options, please consult the sbatch manpage (man sbatch) or its online version.

The standard output of the job will be saved as "slurm-<your_slurm_jobid>.out" in the job submission directory.
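
If you prefer different file names, set the output and error paths in the job script with the -o and -e options; the pattern %j expands to the job ID. For example, with the placeholder name "myjob":

#SBATCH -o myjob-%j.out    # standard output file
#SBATCH -e myjob-%j.err    # standard error file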

For details on the available partitions and their resource limits, please refer to the partition information page.