Job Allocation Tips

Submitting jobs to partitions with checkpoints

You are advised to enable checkpointing in your code.  Refer to the following webpages for details.
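
As a rough illustration of the checkpointing pattern (not the method described on the referenced webpages), the sketch below saves a step counter after each unit of work and resumes from it on the next run. The function name run_with_checkpoint, the checkpoint file format, and the notion of a "step" are assumptions made for this example; real applications would normally save their own state, such as model weights.

```shell
#!/bin/bash
# Hypothetical helper: run total_steps units of work, recording progress in
# ckpt_file so that a restarted job continues where the previous one stopped.
# The body runs in a subshell "( ... )" so its variables stay local.
run_with_checkpoint() (
  ckpt_file=$1
  total_steps=$2

  # Resume from the saved step if a checkpoint exists; otherwise start at 0.
  step=0
  if [ -f "$ckpt_file" ]; then
    step=$(cat "$ckpt_file")
  fi

  while [ "$step" -lt "$total_steps" ]; do
    # One unit of real work would go here.
    step=$((step + 1))

    # Write to a temporary file and rename it, so an interrupted write
    # never leaves a half-written checkpoint behind.
    printf '%s\n' "$step" > "$ckpt_file.tmp"
    mv "$ckpt_file.tmp" "$ckpt_file"
  done

  echo "completed $step of $total_steps steps"
)

# Example call (the path is a placeholder):
#   run_with_checkpoint "$HOME/myjob.ckpt" 1000
```

If the job stops at the wall-time limit, resubmitting the same script picks up from the last saved step instead of starting over.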

 

Using "sbatch" for jobs with long execution time

Jobs with a long execution time should be submitted as SLURM batch scripts using sbatch.

The default wall time is 8 hours.

The maximum wall time is 3 days.

If you want to run a job for longer than 8 hours, specify the wall time in your SLURM script, for example:

  #SBATCH --time=72:00:00
  Or
  #SBATCH --time=3-00:00:00
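
For context, a complete batch script using this directive might look like the sketch below. The job name, the account "xxx", the partition, and the resource counts are placeholder assumptions and should be replaced with values that apply to your project.

```shell
#!/bin/bash
# Placeholder values: the job name, account, and resource counts are assumptions.
#SBATCH --job-name=long-run
#SBATCH --partition=normal
#SBATCH --account=xxx
#SBATCH --nodes=1
#SBATCH --gpus-per-node=2
# Request the 3-day maximum instead of the 8-hour default.
#SBATCH --time=3-00:00:00

# SLURM sets SLURM_JOB_ID inside a job; "local" is a fallback for dry runs.
job_label="${SLURM_JOB_ID:-local}"
echo "Job ${job_label} started on $(uname -n)"

# Replace the line below with the actual workload.
echo "workload goes here"
```

Saved under a name such as long_job.sh (a hypothetical file name), the script would be submitted with "sbatch long_job.sh".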

 

Interactive jobs with a maximum wall time of 2 hours on the large and normal partitions

For interactive jobs started with the srun and salloc SLURM commands, the SLURM scheduler resets the wall time to 2 hours.  For jobs that need a longer wall time, please use sbatch instead.  Below is a sample interactive job that uses srun to request 2 GPUs.

   srun --partition normal --account=xxx --nodes=1 --gpus-per-node=2 --pty bash

No interactive jobs are allowed on the cpu partition.
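
The limits described above (2 hours for interactive jobs, 3 days maximum, no interactive jobs on the cpu partition) can be summarised in a small helper. This is only a sketch: walltime_to_seconds and suggest_command are hypothetical functions written for this example, not SLURM commands, and they assume the wall time is given as HH:MM:SS or D-HH:MM:SS.

```shell
#!/bin/bash
# Hypothetical helpers; they are not part of SLURM.

# Convert "D-HH:MM:SS" or "HH:MM:SS" to seconds.
walltime_to_seconds() (
  case "$1" in
    *-*) days=${1%%-*}; hms=${1#*-} ;;
    *)   days=0;        hms=$1 ;;
  esac
  # awk treats "08" as decimal 8, which avoids octal surprises in shell arithmetic.
  echo "$days:$hms" | awk -F: '{ print $1 * 86400 + $2 * 3600 + $3 * 60 + $4 }'
)

# Suggest a submission command for a partition and a requested wall time.
suggest_command() (
  partition=$1
  seconds=$(walltime_to_seconds "$2")
  if [ "$seconds" -gt 259200 ]; then
    echo "exceeds-maximum"   # more than the 3-day maximum wall time
  elif [ "$partition" = "cpu" ] || [ "$seconds" -gt 7200 ]; then
    echo "sbatch"            # cpu partition, or longer than 2 hours
  else
    echo "srun"              # short job on an interactive-capable partition
  fi
)

suggest_command normal 01:30:00     # prints "srun"
suggest_command normal 3-00:00:00   # prints "sbatch"
suggest_command cpu 00:30:00        # prints "sbatch"
```

Both 72:00:00 and 3-00:00:00 convert to 259200 seconds, so either form requests the same 3-day wall time.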

Utilizing the right partition for your jobs

  • The CPU partition is recommended for tasks such as data processing or data visualization that require CPUs rather than GPUs.  CPU machines are connected to the HKUST SuperPOD cluster and are accessed through the partition named “cpu”.  However, CPU nodes cannot access scratch space.  To run your jobs in the CPU partition, specify the parameter "--partition cpu" in your command.