System Overview and Hardware Configuration

The X-GPU cluster was set up as an in-house high-performance computing (HPC) facility at HKUST in October 2020. It has 38 GPU nodes: 30 nodes each with 10 RTX 2080 Ti cards and 8 nodes each with 10 RTX 6000 cards. All nodes are connected by InfiniBand (IB) at 100 Gbit/s, and the cluster provides 288 TB of raw disk storage. The total number of GPU cards is 380, and the theoretical peak GPU performance is 5339 TFLOPS (FP32).
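The 5339 TFLOPS figure can be reproduced with simple arithmetic: a card's theoretical FP32 peak is its CUDA core count times its boost clock times 2 FLOPs per cycle (one fused multiply-add). The sketch below assumes NVIDIA's published reference specifications (4352 cores at 1.545 GHz for the GeForce RTX 2080 Ti, 4608 cores at 1.770 GHz for the Quadro RTX 6000); these per-card numbers are not stated in this document.

```python
# Sketch: reproduce the cluster's theoretical FP32 peak from per-card specs.
# ASSUMPTION: core counts and reference boost clocks are NVIDIA's published
# specifications, not values given in the X-GPU documentation.

GPU_SPECS = {
    # model: (CUDA cores, reference boost clock in GHz)
    "GeForce RTX 2080 Ti": (4352, 1.545),
    "Quadro RTX 6000": (4608, 1.770),
}

# Inventory from the overview: model -> (number of nodes, GPUs per node)
INVENTORY = {
    "GeForce RTX 2080 Ti": (30, 10),
    "Quadro RTX 6000": (8, 10),
}


def card_peak_tflops(cores: int, boost_ghz: float) -> float:
    """FP32 peak of one card: cores x clock x 2 FLOPs (FMA), GFLOPS -> TFLOPS."""
    return cores * boost_ghz * 2 / 1000.0


def total_gpus() -> int:
    """Number of GPU cards across all node types."""
    return sum(nodes * per_node for nodes, per_node in INVENTORY.values())


def cluster_peak_tflops() -> float:
    """Sum the per-card FP32 peak over every card in the cluster."""
    total = 0.0
    for model, (nodes, per_node) in INVENTORY.items():
        cores, boost_ghz = GPU_SPECS[model]
        total += nodes * per_node * card_peak_tflops(cores, boost_ghz)
    return total


if __name__ == "__main__":
    print(f"GPUs: {total_gpus()}")                          # GPUs: 380
    print(f"FP32 peak: {cluster_peak_tflops():.0f} TFLOPS")  # FP32 peak: 5339 TFLOPS
```

Under these assumptions the 2080 Ti nodes contribute about 4034 TFLOPS and the RTX 6000 nodes about 1305 TFLOPS, matching the quoted total.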

Details

The X-GPU cluster consists of the following components:

Master & login node: The master node has been set up with the OpenHPC cluster management system, which handles cluster management, job scheduling and monitoring. The login node is the entry point where users log in to compile code and submit jobs.
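This document does not name the job scheduler. The sketch below assumes Slurm (one of the resource managers packaged with OpenHPC) and uses a hypothetical helper, make_sbatch_script, to build a single-node GPU batch script that rejects requests larger than one node's 10 GPUs or 20 physical CPU cores. Partition names, QoS settings and software modules on X-GPU are not documented here, so none are set.

```python
# HYPOTHETICAL sketch: assumes the scheduler is Slurm. X-GPU's partitions,
# QoS and modules are not documented, so the script does not set them.

GPUS_PER_NODE = 10   # from the hardware description
CORES_PER_NODE = 20  # two 10-core Xeon Silver 4210 CPUs (physical cores;
                     # the site may count hyper-threads as extra CPUs)


def make_sbatch_script(job_name: str, gpus: int, cpus: int, hours: int, command: str) -> str:
    """Return the text of a single-node Slurm batch script requesting GPUs."""
    if not 1 <= gpus <= GPUS_PER_NODE:
        raise ValueError(f"one node has at most {GPUS_PER_NODE} GPUs")
    if not 1 <= cpus <= CORES_PER_NODE:
        raise ValueError(f"one node has at most {CORES_PER_NODE} physical cores")
    if hours < 1:
        raise ValueError("hours must be at least 1")
    directives = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --gres=gpu:{gpus}",         # generic-resource GPU request
        f"#SBATCH --time={hours:02d}:00:00",  # wall-clock limit, HH:MM:SS
    ]
    return "\n".join(directives + [command]) + "\n"


if __name__ == "__main__":
    script = make_sbatch_script("train", gpus=2, cpus=4, hours=12, command="python train.py")
    print(script, end="")
```

If the cluster does use Slurm, the printed text could be saved as, for example, job.sh and submitted from the login node with sbatch job.sh.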

GPU nodes: 30 nodes each have two 10-core Intel Xeon Silver 4210 processors, 256 GB of physical memory and 10 NVIDIA GeForce RTX 2080 Ti GPU cards. The other 8 nodes each have two 10-core Intel Xeon Silver 4210 processors, 256 GB of physical memory and 10 NVIDIA Quadro RTX 6000 GPU cards.
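The node description translates into cluster-wide totals and per-GPU shares that help when sizing jobs. The sketch below encodes the two node types as a small dataclass; the per-GPU share assumes a node's CPU cores and memory are split evenly among its GPUs, which is an illustration rather than a documented scheduler policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """One homogeneous group of GPU nodes from the hardware description."""
    count: int          # number of nodes of this type
    sockets: int        # CPU sockets per node
    cores_per_cpu: int  # physical cores per CPU
    memory_gb: int      # physical memory per node, GB
    gpus: int           # GPU cards per node
    gpu_model: str


NODE_TYPES = [
    NodeType(30, 2, 10, 256, 10, "GeForce RTX 2080 Ti"),
    NodeType(8, 2, 10, 256, 10, "Quadro RTX 6000"),
]


def cluster_totals(node_types: list[NodeType]) -> dict[str, int]:
    """Sum nodes, physical CPU cores, memory and GPUs over all node types."""
    return {
        "nodes": sum(t.count for t in node_types),
        "cpu_cores": sum(t.count * t.sockets * t.cores_per_cpu for t in node_types),
        "memory_gb": sum(t.count * t.memory_gb for t in node_types),
        "gpus": sum(t.count * t.gpus for t in node_types),
    }


def per_gpu_share(t: NodeType) -> tuple[float, float]:
    """(CPU cores, GB of memory) per GPU if a node's resources are split evenly."""
    return t.sockets * t.cores_per_cpu / t.gpus, t.memory_gb / t.gpus


if __name__ == "__main__":
    print(cluster_totals(NODE_TYPES))
    # {'nodes': 38, 'cpu_cores': 760, 'memory_gb': 9728, 'gpus': 380}
    print(per_gpu_share(NODE_TYPES[0]))
    # (2.0, 25.6)
```

Because both node types share the same CPUs and memory, the even split gives 2 cores and 25.6 GB per GPU on every node.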

File systems: 288 TB of raw disk storage.
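Raw capacity is normally quoted in decimal terabytes (10^12 bytes), while tools such as df -h report binary units, and redundancy and file-system overhead further reduce what users can store. The storage layout is not documented here, so the sketch below only converts units and applies a caller-supplied efficiency factor; the 0.8 in the example corresponds to a hypothetical 8+2 RAID 6 layout, not the cluster's actual configuration.

```python
def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


def usable_tib(raw_tb: float, efficiency: float) -> float:
    """Estimate usable capacity; efficiency is the fraction left after redundancy."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return tb_to_tib(raw_tb) * efficiency


RAW_TB = 288  # raw capacity quoted for X-GPU

if __name__ == "__main__":
    print(f"raw: {tb_to_tib(RAW_TB):.1f} TiB")  # raw: 261.9 TiB
    # HYPOTHETICAL: 8 data + 2 parity disks per RAID 6 group keeps 8/10 of raw space
    print(f"usable (8+2 RAID 6): {usable_tib(RAW_TB, 0.8):.1f} TiB")  # 209.5 TiB
```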

Interconnect: All servers are interconnected with Mellanox EDR InfiniBand in a 1:2 fat-tree topology.
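A 1:2 fat tree is a blocking design. Assuming "1:2" means one uplink for every two node-facing ports on a leaf switch, traffic that leaves a leaf can, in the worst case, get only half of the 100 Gbit/s EDR link rate per node. The sketch below quantifies that effect; the 24-down/12-up split of a 36-port leaf switch is a hypothetical example, and the transfer-time estimate ignores protocol overhead.

```python
EDR_GBPS = 100.0  # EDR InfiniBand 4x link data rate, Gbit/s


def worst_case_offleaf_gbps(link_gbps: float, down_ports: int, up_ports: int) -> float:
    """Per-node bandwidth when every node on a leaf sends off-leaf at once.

    The uplink capacity (up_ports * link_gbps) is shared by down_ports nodes,
    and a node can never exceed its own link rate.
    """
    return link_gbps * min(1.0, up_ports / down_ports)


def transfer_seconds(size_gb: float, gbps: float) -> float:
    """Idealised time to move size_gb gigabytes at gbps gigabits per second."""
    return size_gb * 8 / gbps


if __name__ == "__main__":
    # HYPOTHETICAL leaf switch: 36 ports split 24 down / 12 up (1:2)
    bw = worst_case_offleaf_gbps(EDR_GBPS, down_ports=24, up_ports=12)
    print(f"worst-case off-leaf bandwidth: {bw:.0f} Gbit/s")                  # 50 Gbit/s
    print(f"100 GB at full link rate: {transfer_seconds(100, EDR_GBPS):.0f} s")  # 8 s
    print(f"100 GB at worst case: {transfer_seconds(100, bw):.0f} s")            # 16 s
```

Traffic between nodes attached to the same leaf switch does not cross the uplinks, so the blocking ratio matters mainly for multi-node jobs spread across leaves.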