HPC2 Cluster FAQ

General

1. Am I eligible to apply for an account to access the HPC2 cluster for research computing?

The cluster is open to all approved university researchers. The application must be sponsored by a principal investigator (PI), who is a faculty member of the university. The PI also needs to apply for an account if he/she would like to access the cluster. Please refer to the HPC2 cluster website for what resources are available and how to apply for an account.

2. How do I log into the HPC2 cluster?

You can access hpc2.ust.hk, the login node of the cluster, with Secure Shell (SSH) on campus. On the Windows platform, you can use a free SSH client such as PuTTY. You need to be on the campus network or Wi-Fi (SSID: eduroam) to access the login node. If you are off campus, you can access the login node via VPN (Virtual Private Network).
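For example, from a terminal on Linux or macOS (here "username" is a placeholder for your own cluster account name):

ssh username@hpc2.ust.hk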

3. How do I transfer files to or from the cluster?

You can use an SFTP (SSH File Transfer Protocol) client such as FileZilla on Windows to transfer files.
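If you prefer the command line, the standard sftp and scp tools also work; the account name and file names below are placeholders:

sftp username@hpc2.ust.hk
scp mydata.tar.gz username@hpc2.ust.hk:~/    # upload a single file to your home directory
scp username@hpc2.ust.hk:~/results.txt .     # download a file from the cluster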

4. How many jobs and nodes can I run and use in the cluster?

A compute node provides processors, memory and local disk as resources. Resource limits in the cluster are allocated based on CPU cores only, and the usage quota is group based, shared among the members of a PI group. For your usage quota, please refer to the Cluster Resource Limits page; the limits depend on which School/Private PI group you belong to.

5. What is my disk quota?

The default disk quota for each SSCI and SENG PI group is 2 TB, and for other groups it is 500 GB; the quota is shared among the members of a group.
To check your group disk usage, use the command lfs quota -gh /home
To check your own disk usage, use the command lfs quota -uh /home

6. How do I run jobs in the cluster?

You can compile and develop your application on hpc2.ust.hk. To run it, you have to submit a job to SLURM, the resource management and job scheduling system of the cluster. Details can be found on the Job Execution page.
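As a minimal illustration, a batch script (saved as, say, myjob.sh) might look like the following; the job name, partition, core count and program are placeholders, and you should check the Cluster Resource Limits page for the partitions available to your group:

#!/bin/bash
#SBATCH -J myjob              # job name (placeholder)
#SBATCH -p <partition>        # partition to use; see Cluster Resource Limits
#SBATCH -N 1                  # number of nodes
#SBATCH -n 4                  # number of CPU cores
#SBATCH -t 24:00:00           # wall time limit (hh:mm:ss)

./my_program                  # replace with your own executable

Submit it with sbatch myjob.sh and check its status with squeue -u $USER.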

Slurm

7. How do I check my submitted job status?

You can use the command squeue -u $USER to check your job status. The ST (job state) field of the output shows the job status. Typical states are R (running), PD (pending) and CG (completing).
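A typical output looks like the following; the job IDs, partition and node names here are illustrative only:

  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  12345   general    myjob   myuser  R      12:34      1 hnode-01
  12346   general   myjob2   myuser PD       0:00      2 (Resources)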

8. What is the meaning of the quota limits GrpJobs, GrpNodes, GrpsubmitJobs and partition WallTime on the Cluster Resource Limits webpage?

GrpJobs is the total number of jobs that a PI group can have running at any given time in a partition. GrpNodes is the total number of nodes that a PI group can use at any given time in a partition. GrpsubmitJobs is the total number of jobs, running and waiting, that a PI group can have submitted to the system at any given time. The maximum WallTime is the maximum run time limit for jobs in the partition.
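If you want to query these limits yourself, the following standard SLURM commands should show them, although the exact fields available depend on the SLURM version installed on the cluster:

scontrol show partition                                      # the MaxTime field of each partition is its wall time limit
sacctmgr show assoc format=account,user,grpjobs,grpsubmit    # per-group job limits, if you have read access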

9. Why is my job pending while there are idle nodes?

A possible scenario is that one or more higher-priority jobs exist in the partition. For example, if only one node is idle and a job asking for 2 nodes is submitted, that job is pending. If another job asking for 1 node is submitted later, it will also be pending, because a higher-priority pending job (the earlier-submitted 2-node job) is ahead of it. Priority is related to the submission time.
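You can check why a particular job is pending; the reason reported is, for example, Priority (a higher-priority job is ahead of it) or Resources (it is waiting for nodes to free up). Replace <jobid> with your job's ID:

squeue -u $USER              # the last column, NODELIST(REASON), shows the pending reason
scontrol show job <jobid>    # the Reason= field shows the same information in more detail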

Software

10. Can I install software in the cluster?

In general, you can install software in your own home directory or in the group share directory. Please note that you are responsible for the licenses and copyright of the software you install in the cluster. You should also adhere to ITSC’s Acceptable Use Policy.
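As a rough sketch of installing a typical source package into your home directory (the package name and paths below are hypothetical, and the exact build steps depend on the software):

./configure --prefix=$HOME/software/mytool    # install under your own home directory
make
make install
export PATH=$HOME/software/mytool/bin:$PATH   # add it to your PATH, e.g. in ~/.bashrc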

11. The system gcc/g++ version is 4.4.7 and my application needs 4.8.5 or above. Is there a newer version of gcc/g++?

You can switch to gcc v4.9.2. For details, please refer to the section "GCC (GNU Compiler Collection) v4.9.2" on the Software page.
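After following that section, a quick way to confirm which compiler version is active in your shell is:

gcc --version
g++ --version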

12. The system Python version is 2.6.6. Is there a newer version of Python?

You can use Anaconda Python 2.7 or 3.6. For details, please refer to the sections "Anaconda v5.2 with Python 2.7" or "Anaconda v5.2 with Python 3.6" on the Software page.
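As a brief sketch, assuming Anaconda has been set up as described on the Software page (the environment name "myenv" is a placeholder), you can create and use your own environment with:

conda create -n myenv python=3.6
source activate myenv        # older conda versions use "source activate" rather than "conda activate"
python --version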

13. My application needs GPU support and there is no GPU coprocessor on hpc2.ust.hk. How can I do the compilation and testing?

For GPU application development with CUDA and the GPU devices, you can go to hnode-77 from the login node via SSH, e.g., use the command ssh hnode-77. If access to hnode-77 is denied, please email hpcadmin@ust.hk to apply for direct access to hnode-77.
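Once on hnode-77, you can confirm the GPU devices are visible (assuming the NVIDIA driver is installed there) with:

nvidia-smi          # lists the GPU devices and driver version
nvcc --version      # shows the CUDA toolkit version, if nvcc is on your PATH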

14. Can I install tensorflow or tensorflow-gpu in my home directory or the group share directory?

Yes. However, you cannot install tensorflow directly, because the OS of the HPC2 cluster is CentOS 6.x, which does not support tensorflow natively. Instead, we recommend two methods for installing tensorflow:

  • Method 1: Use Anaconda
    e.g., after you have created and activated a conda virtual environment, install tensorflow-gpu in it with (see the fuller sketch after this list):
    conda install -c anaconda tensorflow-gpu
  • Method 2: Use Singularity
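A minimal sketch for Method 1, assuming Anaconda has been set up as described on the Software page (the environment name "tf-gpu" is a placeholder):

conda create -n tf-gpu python=3.6
source activate tf-gpu
conda install -c anaconda tensorflow-gpu
python -c "import tensorflow as tf; print(tf.__version__)"    # quick check that the install works
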
15. My application is not compatible with the operating system (CentOS 6.6) of the cluster and needs a newer version of glibc, equivalent to that of CentOS 7.x. Can I set it up to run in the cluster?

Yes, it is possible to run the application using a Singularity container. For example, pytorch (version > 0.4.1) requires GLIBC 2.14, which is not officially supported by CentOS 6.6. You need to use Singularity to create a container with CUDA and pytorch, then run it in the HPC2 cluster. The general guide can be found here. Assuming you have a Linux-based PC or VM with Singularity installed, follow these steps:

1. On your PC, create a definition file (root or sudo privilege is needed for the build in the next step):

vi pytorch-cuda.txt

A sample definition file can be found here.

2. Build the container:

sudo singularity build --writable pytorch-cuda.img pytorch-cuda.txt

3. Upload the container image (pytorch-cuda.img) to HPC2.
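For example, from your PC (the account name is a placeholder):

scp pytorch-cuda.img username@hpc2.ust.hk:~/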

Then you can test it on hnode-77 of HPC2:

ssh hnode-77
source /usr/local/setup/singularity.sh
singularity exec --nv pytorch-cuda.img python3
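A quick way to verify that pytorch can see the GPU inside the container is the one-liner below; torch.cuda.is_available() should print True if the GPU is usable:

singularity exec --nv pytorch-cuda.img python3 -c "import torch; print(torch.cuda.is_available())"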