Updates on HKUST SuperPOD Relocation Back to Campus & Upcoming Briefing Sessions

Date: 24 Oct 2024

Dear Faculty Members and Researchers,

We would like to provide you with an update on the captioned relocation exercise. All the SuperPOD equipment has now been physically moved back to campus. The contractor is now in the final stage of cable installation, rebuilding the network connectivity for both the compute and storage fabrics. This will be followed by a final testing stage to validate the entire configuration.

Given the complexity of the work, we originally planned to resume the HKUST SuperPOD service gradually in three stages, starting from the end of October, with full capacity delivered on Nov 11 (Mon). Following further discussion with the contractor, who has agreed to work overtime to expedite the process, we are now aiming to resume the service at full capacity in one go on Nov 1 (Fri). We understand the importance of HKUST SuperPOD to your research and are doing everything possible to complete the relocation. We really appreciate your patience and understanding during this time.

Please also note that two briefing sessions, with details provided below, will be held for HKUST SuperPOD users. Both sessions will be held in person and via Zoom, and recordings will be provided for future reference. To register, please click on the links provided below.

 

1. Briefing Session: Introduction to HKUST SuperPOD

An introductory session designed to help new users familiarize themselves with HKUST SuperPOD.

Agenda

  • What is HKUST SuperPOD?
  • Partitions, Queues & Job QoS
  • Resource Allocation and Consumption
  • Demos
  • Q&A

Date: 7 Nov 2024 (Thu)
Time: 10:00am to 11:00am
Venue: Classroom 2405 (Lift 17-18) & Zoom

Registration: Classroom 2405 & Online Zoom

 

2. Advanced Session: Scaling ML Workflows for Large-Scale Training on HKUST SuperPOD

An advanced session on scaling ML pipelines to fully utilize the GPU resources of the NVIDIA DGX H800 SuperPOD for LLM training.

Agenda

  • Efficient dataset transfer into SuperPOD
  • Parallel file operations
  • Scaling out data pre-processing
  • Designing your ML model to better leverage SuperPOD
  • Model debugging on SuperPOD
  • Monitoring your production model training
  • Q&A

Date: 19 Nov 2024 (Tue)
Time: 2:30pm to 3:30pm
Venue: Classroom 2465 (Lift 25-26) & Zoom

Registration: Classroom 2465 & Online Zoom

 

For any inquiries, please do not hesitate to reach out to us at spodsupport@ust.hk.

 

Regards,
Kenneth Cheng

Service Manager (Research Computing)
ITSC