Job scheduling system (Slurm)

The Simple Linux Utility for Resource Management (SLURM) is the resource management and job scheduling system of the cluster. All jobs on the cluster must be run through SLURM: you submit your job or application to SLURM with a job script.

The most common operations with SLURM are:

Purpose                                             Command
To check what queues (partitions) are available     sinfo
To submit a job                                     sbatch <your_job_script>
To view the queue status                            squeue
To view the queue status of your own jobs           squeue -u $USER
To cancel a running or pending job                  scancel <your_slurm_jobid>
To view detailed information of your job            scontrol show job <your_slurm_jobid>
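
As an illustration, a typical session with these commands might look like the sketch below; the script name my_job.slurm and the job ID 12345 are placeholders for your own job script and the ID reported by sbatch.

sinfo                        # list the available partitions
sbatch my_job.slurm          # submit the job script; sbatch replies "Submitted batch job 12345"
squeue -u $USER              # check the status of your own jobs
scontrol show job 12345      # inspect the job's state and allocated resources
scancel 12345                # cancel the job if it is no longer needed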

Job Script Creation

Suppose your application directory is in your PI group folder, say $HOME/xgpu-scratch/app/. You can create the SLURM job script there and submit it with sbatch <your_job_script>, as sketched below.
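
For example, assuming a job script named my_job.slurm (a placeholder name) in that folder:

mkdir -p $HOME/xgpu-scratch/app/    # create the folder if it does not exist yet
cd $HOME/xgpu-scratch/app/          # go to the job submission directory
sbatch my_job.slurm                 # submit the job script to SLURM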

Sample scripts are provided below. For application-specific examples, please refer to the software page.

Example: create a SLURM script to run 2 applications in parallel, where each application uses 2 CPU cores and 1 GPU device.

#!/bin/bash

# NOTE: Lines starting with "#SBATCH" are SLURM directives, while lines
#       starting with "#" or "##SBATCH" are comments.  To turn a "##SBATCH"
#       line into an active directive, remove one "#" so that the line
#       starts with "#SBATCH".


#SBATCH -J slurm_job #Slurm job name

# Set the maximum runtime, uncomment if you need it
##SBATCH -t 48:00:00 #Maximum runtime of 48 hours

# Enable email notifications when the job begins and ends, uncomment if you need it
##SBATCH --mail-user=user_name@ust.hk #Update your email address
##SBATCH --mail-type=begin
##SBATCH --mail-type=end

# Choose partition (queue) "x-gpu" or "x-gpu-share"
#SBATCH -p x-gpu-share

# Use 4 CPU cores and 2 rtx2080ti GPU devices in one node -- GPU resources can be gpu:rtx2080ti or gpu:rtx6000
#SBATCH -N 1 -n 4 --gres=gpu:rtx2080ti:2

# Set up the runtime environment here if necessary,
# or you can source ~/.bashrc or ~/.bash_profile

# Go to the job submission directory and run your application 
cd $HOME/xgpu-scratch/app/
# Execute applications in parallel 
srun -n 2 --gres=gpu:rtx2080ti:1 myapp1 &    # Assign 2 CPU cores and 1 GPU device to run application "myapp1" 
srun -n 2 --gres=gpu:rtx2080ti:1 myapp2      # Similarly, assign 2 CPU cores and 1 GPU device to run application "myapp2" 
wait

 

For the full list of #SBATCH options, please consult the sbatch manpage (man sbatch) or its online version.

The standard output of the job will be saved as "slurm-<your_slurm_jobid>.out" in the job submission directory.
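
For example, assuming the job ID is 12345 (a placeholder) and the job was submitted from $HOME/xgpu-scratch/app/, you could inspect the output with:

cat $HOME/xgpu-scratch/app/slurm-12345.out      # view the standard output collected so far
tail -f $HOME/xgpu-scratch/app/slurm-12345.out  # or follow it while the job is still running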

For details on the available partitions and their resource limits, please refer here.