HPC4 Cluster

The HPC4 is the fourth-generation high-performance computing cluster implemented and maintained by ITSC.  It was officially rolled out in mid-October 2024 after a pilot testing period.  Hosted in the new High Performance Computing Infrastructure Center (HPCIC), the HPC4 equipment is primarily funded by the University, while also accepting contributions from faculty members. The HPC4 platform is designed to support scientific computations with Intel-based and AMD-based CPU nodes, and it also features Nvidia-based GPU nodes for workloads that do not require AI-centric hardware such as the HKUST SuperPOD.

What's New

  • 4 Nov 2024 - Introduction to HKUST HPC4
        (Video Recording, Presentation Slides, Demo: Login, Slurm, Interactive mode, Spack)

 

HPC4 Highlights

The HPCIC adopts Liquid Immersion Cooling Technology, which offers the following advantages:

• Allows a higher density of computing resources, fitting more hardware into less physical space

• Offers high energy efficiency for sustainability

• Reduces operating costs through energy savings in machine cooling

 

 

The system environment of HPC4 is based on the latest version of Rocky Linux 9.  A secure computing approach is adopted, with regular operating system upgrades and security patches applied to maintain a safe environment for research workloads.

The software environment promotes a Do-It-Yourself installation approach using the Spack tool (https://spack.readthedocs.io/), providing users with the flexibility to customize their software environment to meet their research needs. Additionally, the use of Apptainer (formerly known as Singularity) is supported and encouraged to handle different software packaging and compatibility issues.

 

HPC4 Hardware Specification

The HPC4 cluster is composed of both CPU and GPU nodes:

CPU Nodes

Processor                      No. of Nodes   CPU Cores per Node   Memory per Node   Max Instruction Set
1.9 GHz Intel Emerald Rapids   10             128                  512 GB            AVX-512
2.25 GHz AMD Bergamo           76             256                  768 GB            AVX-512
2.25 GHz AMD Bergamo           16*            256                  1.5 TB            AVX-512

* Contributed servers to be available soon

GPU Nodes

Processor                      No. of Nodes   CPU Cores per Node   Memory per Node   Max Instruction Set   GPUs per Node
2.1 GHz Intel Sapphire Rapids  15             64                   512 GB            AVX-512               4 (A30, 24 GB)
2.5 GHz Intel Emerald Rapids   6              64                   512 GB            AVX-512               4 (L20, 48 GB)

 

Performance Comparison of HPC4 CPUs and GPUs

To give users an idea of how the performance of the CPUs and GPUs in HPC4 compares with other hardware, please check here.

 

HPC4 Software

Modules

Lmod is used to manage installations of most application software. With the module system, users can set up their shell environment to access applications, making it easier to run and compile software. It also allows multiple versions of the same software to co-exist on the system by abstracting away version and OS dependencies.
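
As a quick illustration (the module name and version below are placeholders; run module avail to see what is actually installed on HPC4), a typical module session looks like this:

    module avail                  # list software available through the module system
    module load gcc/12.2.0        # load a specific version (illustrative name)
    module list                   # show currently loaded modules
    module unload gcc/12.2.0      # remove it from the current shell environment
    module purge                  # clear all loaded modules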

Click here for details of the module system.

Spack - User-managed software installation manager

Spack is a package manager that enables users to build software from source code or to install pre-compiled binary packages directly into their computing environments. With Spack, users can easily manage the installation, configuration, and dependencies of a wide variety of scientific and high-performance computing software packages.
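
As a minimal sketch (the setup path and package name are assumptions; follow the Spack page linked below for the HPC4-specific setup):

    source ~/spack/share/spack/setup-env.sh    # activate Spack in the current shell (path is illustrative)
    spack install hdf5 +mpi                    # build HDF5 from source with the MPI variant enabled
    spack find                                 # list packages installed under your Spack tree
    spack load hdf5                            # make the installed package available in the current shell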

Click here for details of Spack.

Use of Apptainer (Singularity)

An Apptainer (formerly known as Singularity) container lets users run applications in a Linux environment of their choice. It encapsulates the operating system and the application stack into a single image file. This file can be modified, copied, and transferred to any system that has Apptainer installed, and run as a user application that integrates native system resources such as the InfiniBand network, GPUs/accelerators, and the resource manager with the container. Apptainer effectively enables BYOE (Bring-Your-Own-Environment) computing on a multi-tenant, shared HPC cluster.
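
As a hedged example (image names and bind paths below are placeholders), a container can be pulled from a public registry and run as follows:

    # Pull an image from Docker Hub into a local .sif file
    apptainer pull ubuntu.sif docker://ubuntu:22.04
    # Run a command inside the container
    apptainer exec ubuntu.sif cat /etc/os-release
    # --nv exposes NVIDIA GPUs and --bind mounts a host directory into the container (paths are illustrative)
    apptainer exec --nv --bind /scratch:/scratch my_app.sif python train.py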

Click here to view details of Apptainer (Singularity).

 

HPC4 Charging Model

Charging for HPC4 services can be justified for several reasons. It ensures efficient resource allocation by prioritizing projects with significant needs and potential impacts, while also helping to recover the substantial costs associated with hardware, maintenance, and energy consumption. This approach supports the sustainability of the service by funding ongoing maintenance and necessary upgrades to keep the infrastructure current and reliable. Additionally, charging encourages accountability and fair usage among researchers and promotes optimization. It also incentivizes researchers to seek external funding, aligning their projects with available grants and enhancing the overall research output of HKUST.

Click here for details on the charging for use of HPC4.

 

HPC4 Contribution Model

Adopting the community cluster model for the HPC4 cluster offers several benefits. It enables resource pooling and sharing, which reduces costs and gives individual research teams access to advanced computational resources that might otherwise be too expensive or unattainable.

Subject to HPCIC resource availability, HPC4 adopts a community cluster model similar to that of HPC3 and accepts hardware contributions from faculty members on a regular basis.  Details of the HPC4 contribution model are available here.

 

HPC4 Account Application

All HKUST faculty members are eligible to apply for an HPC4 account.  To apply, please complete the HPC4 Account Application Form.  Students who wish to use HPC4 should ask their supervisors to support their applications by completing the above application form.

 

Getting Started

How to login to the cluster

Click here to view the instructions on how to get access to the HKUST HPC4 cluster.

Use of SLURM Job Scheduling System

The Simple Linux Utility for Resource Management (SLURM) is the resource management and job scheduling system of the cluster. All jobs on the cluster must be submitted and run through SLURM.
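
As a minimal sketch (the job name, resource requests, and file name are assumptions; check the partition and resource quota information below for actual limits), a basic batch script might look like this:

    #!/bin/bash
    #SBATCH --job-name=hello          # job name shown in the queue
    #SBATCH --nodes=1                 # request a single node
    #SBATCH --ntasks=1                # run one task
    #SBATCH --cpus-per-task=4         # four CPU cores for the task
    #SBATCH --mem=8G                  # memory for the job
    #SBATCH --time=00:10:00           # wall-clock limit (HH:MM:SS)

    srun hostname                     # replace with your own program

Save it as hello.slurm, submit it with sbatch hello.slurm, and monitor it with squeue -u $USER. An interactive session can typically be requested with srun --pty bash, subject to partition policy.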

Click here to learn how to submit your first SLURM job

Click here to view details of using SLURM

Partition and Resource Quota

Click here to view more information on partition and resource quota.

Storage Types

Click here to view more information on different storage types.

Job Priority and Accounting

Not enforced in the pilot stage. Further information will be available in due course.
 

Usage Tips

Specific Use Cases