The Simple Linux Utility for Resource Management (SLURM) is the resource manager and job scheduler for the cluster. All jobs on the cluster must be run through SLURM: to start a job or application, you submit a job script to SLURM.
A SLURM script encompasses three essential aspects:
- Prescribing the resource requirements for the job: the script declares what the job needs to run, such as CPU cores, memory, GPUs, time limits, and any other relevant constraints.
- Setting the environment: the script configures environment variables, module dependencies, paths, and other settings the job depends on.
- Specifying the work to be carried out: the script lists the shell commands to execute, i.e. the actual computation that produces the intended results.
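A minimal sketch of this three-part structure (the partition, account, module, and script names are placeholders to replace with your own):

```bash
#!/bin/bash
# 1) Resource requirements
#SBATCH --job-name=demo
#SBATCH --partition=intel
#SBATCH --account=<projectgroupname>
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00

# 2) Environment setup (site-specific; the module name is an assumption)
# module load python

# 3) Work to be carried out
python my_script.py
```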
Common operations with SLURM
Purpose | Command
---|---
To check which queues (partitions) are available | sinfo
To submit a job | sbatch <your_job_script>
To view the queue status | squeue
To view the queue status of your jobs | squeue -u $USER
To cancel a running or pending job | scancel <your_slurm_jobid>
To view detailed information about your job | scontrol show job <your_slurm_jobid>
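For example, a typical submit-and-monitor cycle looks like this (the job ID 522 is illustrative):

```bash
$ sbatch myjob.slurm
Submitted batch job 522
$ squeue -u $USER      # check the job's state (PD = pending, R = running)
$ scancel 522          # cancel it if something went wrong
```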
Common SLURM Job Directives
Purpose | Option | Example
---|---|---
Job name defined by user | --job-name | --job-name=myjob
Partition the job is allocated to (intel/amd/gpu-a30/gpu-l20) | --partition | --partition=intel (Note: when selecting a GPU partition, you must also use an option such as --gpus-per-node to obtain at least one GPU.)
Account to be charged for resources used | --account | --account=mygroup
Maximum execution time (walltime) | --time=D-HH:MM:SS | --time=1-01:10:30
Number of nodes required | --nodes | --nodes=2
Number of tasks (MPI workers) per node | --ntasks-per-node | --ntasks-per-node=4
Number of CPUs (OpenMP threads) per task | --cpus-per-task | --cpus-per-task=64
Number of CPUs (OpenMP threads) per GPU | --cpus-per-gpu | --cpus-per-gpu=16
GPUs per node | --gpus-per-node | --gpus-per-node=4
GPUs per task | --gpus-per-task | --gpus-per-task=1
Quality of Service | --qos | --qos=debug
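Combining several of these directives, a hypothetical hybrid MPI/OpenMP job might look like the sketch below (the partition and application name are placeholders; the resource values mirror the examples in the table):

```bash
#!/bin/bash
#SBATCH --job-name=hybrid
#SBATCH --partition=amd            # placeholder partition
#SBATCH --account=mygroup
#SBATCH --time=1-01:10:30          # 1 day, 1 hour, 10 minutes, 30 seconds
#SBATCH --nodes=2                  # 2 nodes
#SBATCH --ntasks-per-node=4        # 4 MPI ranks per node
#SBATCH --cpus-per-task=64         # 64 OpenMP threads per rank

srun ./my_hybrid_app               # hypothetical application
```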
You may check additional options in the official SLURM documentation (e.g. man sbatch).
Use case example: GPU (simple Python job)
The demo Python script is shown below for convenience; save it as matrix_inverse.py in your working directory:
```python
import numpy as np

print("N=3")
N = 3
X = np.random.randn(N, N)
print("X =\n", X)
print("Inverse(X) =\n", np.linalg.inv(X))

print("N=10")
N = 10
X = np.random.randn(N, N)
print("X =\n", X)
print("Inverse(X) =\n", np.linalg.inv(X))

print("N=100")
N = 100
X = np.random.randn(N, N)
print("X =\n", X)
print("Inverse(X) =\n", np.linalg.inv(X))
```
Then save the following job script, for example as matrix_inverse.slurm:

```bash
#!/bin/bash
#SBATCH --job-name=matinv             # create a short name for your job
#SBATCH --nodes=1                     # node count
#SBATCH --ntasks-per-node=1           # number of tasks per node (adjust when using MPI)
#SBATCH --cpus-per-task=4             # cpu-cores per task (>1 for multi-threaded tasks; adjust when using OpenMP)
#SBATCH --gpus-per-node=1             # number of GPUs for the task
#SBATCH --time=01:20:00               # total run time limit (HH:MM:SS)
#SBATCH --partition=gpu-a30           # the partition (queue) where you submit
#SBATCH --account=<projectgroupname>  # specify project group account

# Setup environment if necessary, e.g.
# module load python

### Your commands for the tasks
nvidia-smi
python matrix_inverse.py
###############################
```
- The first line of a Slurm script specifies the Unix shell to be used.
- This is followed by a series of #SBATCH directives which set the resource requirements and other parameters of the job.
- --nodes sets the number of nodes for the job.
- --ntasks-per-node sets the number of tasks per node (usually MPI ranks).
- --cpus-per-task sets the number of threads per task (usually OpenMP threads). Some libraries, such as Python's numpy, are affected by this option; see the sketch after this list. For GPU jobs, you may use --cpus-per-gpu instead.
- --gpus-per-node selects the number of GPUs per node. You can also specify the GPU type with an option such as --gpus-per-node=a30:4; however, we currently only have homogeneous machines, meaning all GPUs on a single node are of the same type. Sometimes --gpus-per-task is useful for allocating GPUs to tasks. Using --gres is possible but not recommended.
- --partition selects the partition for submission.
- Any necessary environment changes are made next (for example, loading a Python module, shown commented out in the script above).
- Lastly, the work to be done, the execution of a Python script, is specified in the final lines.
- Run "sbatch <filename>" to submit the job.
- "squeue" lets you check job status.
- "scancel <jobid>" cancels a job.
- "sinfo" shows node availability.
Use case example: CPU (simple Python job)
```bash
#!/bin/bash
#SBATCH --job-name=matinv            # create a short name for your job
#SBATCH --nodes=1                    # node count
#SBATCH --ntasks-per-node=1          # number of tasks per node (adjust when using MPI)
#SBATCH --cpus-per-task=128          # cpu-cores per task (>1 if multi-threaded tasks, adjust when using OpenMP)
#SBATCH --time=01:20:00              # total run time limit (HH:MM:SS)
#SBATCH --partition=intel            # the partition (queue: intel/amd/gpu-a30/gpu-l20) where you submit
#SBATCH --account=<pi_account_name>  # specify project group account

### Your commands for the tasks
python matrix_inverse.py
###############################
```
Result of squeue:
```
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  522     intel   matinv kcalexla  R       5:17      1 cpu01
```
- Use "cat" to print output from slurm-jobid.out
- You may download the following tools for Job script generation generator(https://github.com/BYUHPC/BYUJobScriptGenerator.git)
Sample output (the N=3 portion; your values will differ since the matrix is random):

```
N=3
X =
 [[-0.74054344 -0.33695325 -1.80687036]
 [-0.23310079  0.41634362  2.12752795]
 [-1.43863402 -0.96117331  0.38851044]]
Inverse(X) =
 [[-1.04068167 -0.88078301 -0.01669555]
 [ 1.40075039  1.3615892  -0.94165995]
 [-0.3881393   0.10707253  0.1824476 ]]
```
Additional Slurm script examples
Example 1: create a Slurm script that runs 2 applications in parallel (each application can use 2 CPU cores and 1 GPU device).
```bash
#!/bin/bash
# NOTE: Lines starting with "#SBATCH" are valid SLURM options.
#       Lines starting with "#" or "##SBATCH" are comments.
#       Uncomment a "##SBATCH" line (i.e. remove one "#") to turn the comment
#       into a SLURM option.

#SBATCH --job-name=slurm_job            # Slurm job name
#SBATCH --time=12:00:00                 # Set the maximum runtime
#SBATCH --partition=<partition_to_use>  # Choose partition
#SBATCH --account=<account_name>        # Specify project account

# Resource allocation: 2 tasks, each with 2 CPU cores and 1 GPU
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=2
#SBATCH --gpus-per-node=2

# Uncomment to enable email notifications
# Remember to update your email address
##SBATCH --mail-user=user_name@ust.hk
##SBATCH --mail-type=BEGIN,END,FAIL,REQUEUE  # Feel free to remove any

# Setup runtime environment if necessary
# For example, module load openmpi

# Go to the job submission directory and run your application
cd $HOME/apps/myapp

# Execute applications in parallel: assign 2 CPU cores and 1 GPU device
# to each of "myapp1" and "myapp2"
srun --ntasks=1 --cpus-per-task=2 --gpus-per-task=1 myapp1 &
srun --ntasks=1 --cpus-per-task=2 --gpus-per-task=1 myapp2 &
wait
```
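Note the trailing & on each srun line: it places the job steps in the background so they run concurrently, and the final wait keeps the batch script alive until both steps finish.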
Example 2: create a Slurm script for a GPU application.
```bash
#!/bin/bash
# NOTE: Lines starting with "#SBATCH" are valid SLURM options.
#       Lines starting with "#" or "##SBATCH" are comments.
#       Uncomment a "##SBATCH" line (i.e. remove one "#") to turn the comment
#       into a SLURM option.

#SBATCH --job-name=slurm_job            # Slurm job name
#SBATCH --time=12:00:00                 # Set the maximum runtime
#SBATCH --partition=<partition_to_use>  # Choose partition
#SBATCH --account=<account_name>        # Specify project account

# Resource allocation
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-gpu=16
#SBATCH --gpus-per-node=4

# Uncomment to enable email notifications
# Remember to update your email address
##SBATCH --mail-user=user_name@ust.hk
##SBATCH --mail-type=BEGIN,END,FAIL,REQUEUE  # Feel free to remove any

# Setup runtime environment if necessary
# For example, module load openmpi

# Go to the job submission directory and run your application
cd $HOME/apps/slurm
./your_gpu_application
```
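With --cpus-per-gpu=16 and --gpus-per-node=4, this job is allocated 4 GPUs and 4 × 16 = 64 CPU cores on a single node.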
Interactive job
The basic procedure is:
- Log in to an HPC machine
- Request compute resources using
  - srun (runs in the current terminal once resources are allocated), or
  - salloc (manually ssh into the allocated machine once resources are allocated).
- For example:
```
$ srun --partition=gpu-a30 --gpus-per-node=4 --account=<projectgroupname> --pty bash
```
All #SBATCH options are valid srun/salloc options; --pty bash means "give me an interactive shell". You may also execute a command directly:

```
$ srun --partition=gpu-a30 --gpus-per-node=4 --account=<projectgroupname> ./myapp
```
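For the salloc route, a minimal sketch (the job ID and node name are illustrative; check squeue for your own allocation):

```bash
$ salloc --partition=gpu-a30 --gpus-per-node=1 --account=<projectgroupname>
salloc: Granted job allocation 550
$ squeue -u $USER    # find the node that was allocated
$ ssh gpu01          # ssh into that node to work interactively
```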
- Start your program
```
[user@gpu01 ~]$ python
Python 3.12.4 (main, Jun 24 2024, 22:04:18) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> 1+1
2
>>>
```
Check GPU usage for the job
Use this command to open a shell inside the allocation of a running job:

```
srun --jobid=<jobid> -w <nodelist> --overlap --pty bash -i
```

Replace <jobid> and <nodelist> with those of the job you want to check (squeue shows both), then run nvidia-smi inside the shell:
```
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  548       gpu     bash kcalexla  R       5:13      1 gpu01
```
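With the job above, the concrete commands would be:

```bash
$ srun --jobid=548 -w gpu01 --overlap --pty bash -i
[user@gpu01 ~]$ nvidia-smi
```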
```
Tue Jun 25 09:57:31 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L20                     On  | 00000000:17:00.0 Off |                    0 |
| N/A   28C    P8              23W / 350W |      0MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
|    0    0    0     814970      C                                               6400MiB |
+---------------------------------------------------------------------------------------+
```