HKUST SuperPOD - A TensorFlow Example
Example: running TensorFlow with JupyterLab on SuperPOD:
1. In the first terminal:
- Run an interactive job on a GPU node using srun. Supply your project group name and the partition you are going to use (e.g. normal):
srun --partition normal --account=<yourgroupname> --gres=gpu:2 --pty $SHELL
netid@dgx-26:~$
- Note the DGX node name (dgx-26 in the above example); you will need it in step 6.
2. (Skip if not using a container) Create the TensorFlow image if it is not already available:
apptainer pull tensorflow:23.11-tf2-py3.sif docker://nvcr.io/nvidia/tensorflow:23.11-tf2-py3
3. (Skip if not using a container) Run the TensorFlow image and bind your preferred directory to a mount point in the container. In this example, we map our own scratch space to /scratch in the container (/scratch does not need to already exist in the container):
apptainer run -B /scratch/yournetid:/scratch --nv tensorflow:23.11-tf2-py3.sif
4. Start JupyterLab:
jupyter-lab --allow-root --ip='0.0.0.0'
5. Note the token shown in the JupyterLab output; you will need it in the browser later.
6. Open another terminal and log in to SuperPOD a second time. Set up port forwarding between the compute node and your local host, replacing dgx-xx with the node name noted in step 1 (dgx-26 in our case):
ssh yournetid@superpod -L 8888:dgx-xx:8888
7. Open a browser on your local workstation and go to "http://127.0.0.1:8888/?token=????", where ???? is the token from step 5.
8. Done. As a final sanity check, you can run the short TensorFlow snippet below in a notebook cell.
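The steps above prepare the environment but do not themselves contain any TensorFlow code. The following is a minimal sketch, assuming the TensorFlow 2.x build shipped in the 23.11-tf2-py3 image, that you could paste into a JupyterLab cell to confirm the two GPUs requested with --gres=gpu:2 are visible; the exact device list depends on your allocation.

import tensorflow as tf

# List the GPUs TensorFlow can see; with --gres=gpu:2 two devices are expected.
gpus = tf.config.list_physical_devices('GPU')
print("Visible GPUs:", gpus)

# Run a small matrix multiplication on the first GPU as a quick functional test.
if gpus:
    with tf.device('/GPU:0'):
        a = tf.random.normal([1024, 1024])
        b = tf.random.normal([1024, 1024])
        c = tf.matmul(a, b)
    print("Result computed on:", c.device)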