Apptainer (compatible with Singularity at the image-format and command-line level) lets users run applications in their preferred Linux environment. By encapsulating the operating system and application stack in a single container image file, it makes an environment easy to modify, replicate, and transfer across systems where Apptainer (or Singularity) is installed. The image file can be run as a user application while leveraging the native resources of the host system, such as the high-speed network, GPUs/accelerators, and the resource manager. In essence, Apptainer enables Bring-Your-Own-Environment (BYOE) computing on a multi-tenant, shared High-Performance Computing (HPC) cluster.
The apptainer and singularity commands can be used interchangeably in all examples below.
Workflow
The figure above, taken from the official documentation, describes the typical Apptainer workflow. In general, the workflow has two stages: build the container on a user endpoint and execute the container in a production environment. A user endpoint is typically a local system where you have admin/root privilege, e.g., your desktop or a virtual machine, while a production environment is a shared environment where you only have user privilege, e.g., any HPC cluster.
In the first stage, you build a customized container by installing applications and modifying the configuration as needed. With the prepared container image, you upload it to your home directory on the cluster to start stage 2. In this stage, you can treat the container as an application: like any other user application, you submit a job via SLURM to execute the container on a compute node.
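As an illustration of the first stage, below is a minimal sketch of a definition file and the build/upload commands; the base image, package list, file names, and cluster hostname are placeholders rather than HPC4-specific values.

# mycontainer.def (example definition file)
Bootstrap: docker
From: ubuntu:22.04

%post
    # install whatever your application needs inside the container
    apt-get update && apt-get install -y python3 python3-pip

Build the image on a system where you have root privilege, then copy it to your home directory on the cluster:

% sudo apptainer build mycontainer.sif mycontainer.def
% scp mycontainer.sif username@hpc4.example.org:~/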
Apptainer is available on HPC4 by default; no separate module load is required. An example of its usage is shown below.
Download the container image and convert it into the Singularity Image Format (SIF) if not yet done.
% apptainer pull tensorflow:23.11-tf2-py3.sif docker://nvcr.io/nvidia/tensorflow:23.11-tf2-py3
Suppose you have downloaded the TensorFlow models to your home directory on HPC4 as shown below.
% git clone https://github.com/tensorflow/models.git
Obtain an interactive session on a node with a GPU device using srun, and test the container by training a model on the MNIST handwritten digit data set.
gpu01% apptainer exec --nv tensorflow:23.11-tf2-py3.sif python ./models/tutorials/image/mnist/convolutional.py
The --nv option allows the container to access the GPU devices on a node with the NVIDIA driver installed.
To schedule a job with SLURM, you can put the above statement in your SLURM job submission script.
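For example, a minimal submission script might look like the sketch below; the partition name, GPU request option, and time limit are placeholders and may need to be adjusted for HPC4.

#!/bin/bash
#SBATCH --job-name=tf-mnist
#SBATCH --partition=gpu        # placeholder partition name
#SBATCH --gres=gpu:1           # request one GPU (syntax may differ per cluster)
#SBATCH --time=01:00:00

apptainer exec --nv tensorflow:23.11-tf2-py3.sif python ./models/tutorials/image/mnist/convolutional.py

Submit the script with sbatch:

% sbatch job.sh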
Important Note:
Test your container before job submission, either in interactive mode or in batch mode. It is important to verify that the container works as you expect, especially when your application uses a GPU.
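For example, a quick interactive sanity check that the container can see the GPU could look like the following; the srun options are generic placeholders.

% srun --gres=gpu:1 --pty bash
gpu01% apptainer exec --nv tensorflow:23.11-tf2-py3.sif nvidia-smi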
Reference Website:
Apptainer Quick Start - https://apptainer.org/docs/user/main/quick_start.html