The system supports two types of user accounts: project-based accounts and individual accounts for students with an approved UROP.
- Project-based accounts
  - Allow access to more computational resources, with the allocation granted during project approval
  - Computational resources are shared among all group members of the project
  - Usage accounting for computational resources is implemented; details will be announced later
  - Provide shared storage space for the group
- Individual student accounts
  - Computational resources are allocated to each student individually
  - No usage accounting
Resource Request
- Resource requests are counted in GPU Resource Units (GRUs). Each GRU is associated with a different maximum number of CPU cores and amount of system memory, depending on the Slurm partition.
- For the project & large-project partitions, 1 GRU corresponds to
  - One H800 GPU with 80GB GPU memory
  - 14 CPU cores with 28 threads
  - 224GB system memory
- For the student partition, each H800 GPU is partitioned into GPU instances of different sizes using NVIDIA MIG technology, with 1 GRU corresponding to either a 3g.40gb, 4g.40gb or 7g.80gb MIG device
  - For 3g.40gb, 1 GRU is
    - 3/7 of the computational power of one H800 GPU, with 40GB GPU memory
  - For 4g.40gb, 1 GRU is
    - 4/7 of the computational power of one H800 GPU, with 40GB GPU memory
  - For 7g.80gb, 1 GRU is
    - equivalent to a whole H800 GPU in computational power and memory
    - 8 CPU cores with 16 threads
    - 160GB system memory
- For the debug partition, 1 GRU corresponds to
  - One H800 GPU with 80GB GPU memory
  - 14 CPU cores with 28 threads
  - 224GB system memory
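The per-GRU figures above can be turned into a quick calculation of the upper bound a request implies. The following is a minimal sketch, not an official tool: the function name `full_gpu_ceiling` and the constant names are hypothetical, and it assumes the per-GRU maxima scale linearly with the number of GRUs requested, which the text does not state explicitly. The MIG entries record only the compute share and GPU memory listed for each profile.

```python
from fractions import Fraction

# Per-GRU figures quoted for the project, large-project and debug partitions.
GPU_MEMORY_GB_PER_GRU = 80
CPU_CORES_PER_GRU = 14
THREADS_PER_GRU = 28
SYSTEM_MEMORY_GB_PER_GRU = 224

# Student partition: (share of one H800's compute, GPU memory in GB) per MIG profile.
MIG_PROFILES = {
    "3g.40gb": (Fraction(3, 7), 40),
    "4g.40gb": (Fraction(4, 7), 40),
    "7g.80gb": (Fraction(1), 80),
}


def full_gpu_ceiling(grus):
    """Return the maximum resources implied by `grus` whole-GPU GRUs.

    Assumption (not stated in the source): limits scale linearly with GRUs.
    """
    if grus < 1:
        raise ValueError("at least 1 GRU is required")
    return {
        "gpus": grus,
        "gpu_memory_gb": grus * GPU_MEMORY_GB_PER_GRU,
        "cpu_cores": grus * CPU_CORES_PER_GRU,
        "threads": grus * THREADS_PER_GRU,
        "system_memory_gb": grus * SYSTEM_MEMORY_GB_PER_GRU,
    }


if __name__ == "__main__":
    # Upper bound for a hypothetical 4-GRU request on a project partition.
    print(full_gpu_ceiling(4))
    for profile, (share, memory_gb) in MIG_PROFILES.items():
        print(f"{profile}: {float(share):.0%} of one H800, {memory_gb}GB GPU memory")
```

Using `Fraction` keeps the 3/7 and 4/7 shares exact, so the two smaller MIG profiles add up to exactly one whole GPU.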
Partition Table
Slurm Partition | project & large-project | student | debug | cpu |
---|---|---|---|---|
No. of DGX nodes | 52 | 2, with GPUs partitioned using MIG | 1 | 2 CPU nodes |
Who can access | Project-based users only | Non-project-based student users only | All | Project-based users only |
Purpose | Computation | Computation | Compiling, building containers, interactive debugging, code profiling | Data pre-processing for GPU computation |
Max Wall Time | 3 days | 1 day | 2 hours | 12 hours |
Max resources requested | Varies with projects | 1 GRU | 1 GRU | 8 CPU cores (per job) |
Concurrent running jobs quota per user | 8 | 1 | 1 | 28 |
Queuing and running jobs limit per user | 10 | 2 | 1 | 28 |
Usage Accounting | Yes | No | No | No |
Job Preemption | In the large-project partition, jobs from approved projects can preempt other jobs; a job can run for at least 2 hours before being preempted | No | No | No |
Remarks | Resource quotas are per-project unless specified | Resource quotas are per-user instead of per-project | Resource quotas are per-user instead of per-project | Resource quotas are per-project unless specified. No access to the /scratch directory |
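As an illustration of how the limits in the table combine, the sketch below checks a planned job against them before submission. It is only an example under stated assumptions: the names `PartitionLimits`, `LIMITS` and `check_request` are hypothetical, the partition keys are assumed to match the column headings, and the cpu partition is left out because its limit is expressed in CPU cores rather than GRUs. Slurm itself remains the authority on what is accepted.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass(frozen=True)
class PartitionLimits:
    max_wall_time: timedelta
    max_grus: Optional[int]  # None: varies with the project, no fixed value in the table
    max_running_jobs: int
    max_queued_and_running_jobs: int


# Values copied from the partition table; keys are assumed partition names.
_PROJECT = PartitionLimits(timedelta(days=3), None, 8, 10)
LIMITS = {
    "project": _PROJECT,
    "large-project": _PROJECT,
    "student": PartitionLimits(timedelta(days=1), 1, 1, 2),
    "debug": PartitionLimits(timedelta(hours=2), 1, 1, 1),
}


def check_request(partition, wall_time, grus, jobs_already_queued_or_running=0):
    """Return a list of human-readable problems; an empty list means no violation found."""
    limits = LIMITS[partition]
    problems: List[str] = []
    if wall_time > limits.max_wall_time:
        problems.append(
            f"wall time {wall_time} exceeds the {limits.max_wall_time} limit"
        )
    if limits.max_grus is not None and grus > limits.max_grus:
        problems.append(f"{grus} GRUs requested, limit is {limits.max_grus}")
    if jobs_already_queued_or_running >= limits.max_queued_and_running_jobs:
        problems.append(
            "queued-and-running job limit of "
            f"{limits.max_queued_and_running_jobs} already reached"
        )
    return problems


if __name__ == "__main__":
    # A 3-hour job on the debug partition breaks the 2-hour wall time limit.
    for problem in check_request("debug", timedelta(hours=3), grus=1):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a user see every limit a request would hit in a single pass.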