The Simple Linux Utility for Resource Management (SLURM) is the cluster's resource management and job scheduling system. All jobs on the cluster must be run through SLURM. To run a job or application, submit it to SLURM with a job script.
The most common operations with SLURM are:
|To check what queues (partitions) are available:||sinfo|
|To submit a job:||sbatch <your_job_script>|
|To view the queue status:||squeue|
|To view the queue status of your job:||squeue -u $USER|
|To cancel a running or pending job:||scancel <your_slurm_jobid>|
|To view detailed information of your job:||scontrol show job <your_slurm_jobid>|
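The commands above are often chained: submit a script, capture the job ID, then inspect the job. The following is a minimal sketch of that workflow, assuming sbatch prints its usual confirmation line of the form "Submitted batch job <jobid>". The helper names jobid_from_sbatch_output and submit_and_inspect are hypothetical and not part of SLURM.

```shell
#!/bin/bash

# Extract the job ID from sbatch's confirmation line,
# e.g. "Submitted batch job 12345" (hypothetical helper name).
jobid_from_sbatch_output() {
    line="$1"
    # Keep only the last space-separated field of the line.
    printf '%s\n' "${line##* }"
}

# Submit a job script, then show its queue entry and detailed
# information (hypothetical helper; requires the SLURM commands).
submit_and_inspect() {
    script="$1"
    out=$(sbatch "$script") || return 1
    jobid=$(jobid_from_sbatch_output "$out")
    echo "Job ID: $jobid"
    squeue -j "$jobid"
    scontrol show job "$jobid"
}
```

On a login node, this could be used as submit_and_inspect <your_job_script> after sourcing the file. The functions only wrap the commands listed in the table and do not change how the job is scheduled.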
Job Script Creation
Suppose your application directory is in your PI group folder, say $HOME/xgpu-scratch/app/. You can create the SLURM job script there and submit it with sbatch <your_job_script>.
Sample scripts are provided below. For other applications, please refer to the software page.
Example: create a SLURM script to run 2 applications in parallel (each application can use 2 CPU cores and 1 GPU device).
#!/bin/bash

# NOTE: Lines starting with "#SBATCH" are valid SLURM commands or statements,
#       while those starting with "#" and "##SBATCH" are comments. To uncomment
#       a "##SBATCH" line, remove one "#" so that the line starts with "#SBATCH"
#       and becomes a SLURM command or statement.

#SBATCH -J slurm_job #Slurm job name

# Set the maximum runtime, uncomment if you need it
##SBATCH -t 48:00:00 #Maximum runtime of 48 hours

# Enable email notifications when job begins and ends, uncomment if you need it
##SBATCH --mail-user=firstname.lastname@example.org #Update your email address
##SBATCH --mail-type=begin
##SBATCH --mail-type=end

# Choose partition (queue) "x-gpu" or "x-gpu-share"
#SBATCH -p x-gpu-share

# To use 4 cpu cores and 2 rtx2080ti gpu devices in a node
# -- gpu resources can be gpu:rtx2080ti or gpu:rtx6000
#SBATCH -N 1 -n 4 --gres=gpu:rtx2080ti:2

# Set up runtime environment if necessary
# or you can source ~/.bashrc or ~/.bash_profile

# Go to the job submission directory and run your application
cd $HOME/xgpu-scratch/app/

# Execute applications in parallel
srun -n 2 --gres=gpu:rtx2080ti:1 myapp1 & # Assign 2 CPU cores and 1 GPU device to run application "myapp1"
srun -n 2 --gres=gpu:rtx2080ti:1 myapp2 # Similarly, assign 2 CPU cores and 1 GPU device to run application "myapp2"
wait
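The sample script works because the two job steps together request no more than the job allocation: 2 steps x 2 cores = 4 cores (matching -n 4) and 2 steps x 1 GPU = 2 GPUs (matching --gres=gpu:rtx2080ti:2). The arithmetic can be checked before submitting with a small sketch like the one below. The function name fits_allocation is hypothetical and not a SLURM command.

```shell
#!/bin/bash

# fits_allocation TOTAL PER_STEP STEPS
# Succeeds (exit status 0) when STEPS concurrent job steps, each using
# PER_STEP units of a resource, fit within TOTAL allocated units.
# (Hypothetical helper for illustration only.)
fits_allocation() {
    [ $(( $2 * $3 )) -le "$1" ]
}

# Values taken from the sample script: 4 cores and 2 GPUs allocated,
# 2 steps that each use 2 cores and 1 GPU.
if fits_allocation 4 2 2 && fits_allocation 2 1 2; then
    echo "steps fit within the allocation"
else
    echo "steps request more than the allocation"
fi
```

If the steps request more than the allocation, some of them typically cannot start until earlier steps release resources, so they would not all run in parallel.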
For the #SBATCH options, please consult the sbatch man page (man sbatch) or its online version.
The standard output of the job will be saved as “slurm-<your_slurm_jobid>.out” in the job submission directory.
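Given the naming rule above, the output file for a job can be derived from its job ID. The following is a minimal sketch, assuming the default output file name is in use (that is, the script does not set its own output path). The function name slurm_output_file is hypothetical.

```shell
#!/bin/bash

# Print the default output file name for a given job ID,
# following the "slurm-<jobid>.out" pattern (hypothetical helper).
slurm_output_file() {
    printf 'slurm-%s.out\n' "$1"
}
```

For example, running tail -f "$(slurm_output_file <your_slurm_jobid>)" in the job submission directory follows the job's output as it is written.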
For details on the available partitions and their resource limits, please refer here.