HPC and GPU computing

MPI configuration

In parallel computing, the Process Management Interface (PMI) allows each MPI process to interact with the process manager by adding information to a shared key-value database (“put” operations) and querying information added by the other processes of the application (“get” operations).

PMIs are available in our SLURM installation. To list the available PMIs, run the following command on an interactive server:

% srun --mpi=list

The correlation between the PMI and the required MPI type is shown in the SLURM MPI Users Guide.
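Before choosing a PMI in a job script, it can be useful to check that it really appears in the list. The following is a minimal sketch: the helper name has_mpi_plugin is hypothetical, and it assumes that each plugin name appears as a separate word in the output of srun --mpi=list, which may vary between SLURM versions.

```shell
# Hypothetical helper: succeeds if the plugin name given as $1 appears
# as a whole word in the text read from standard input.
# Assumption: `srun --mpi=list` prints each plugin name as a separate word.
has_mpi_plugin() {
    grep -qwF -- "$1"
}

# Possible usage on an interactive server (2>&1 in case the list is
# printed on standard error):
#   srun --mpi=list 2>&1 | has_mpi_plugin pmix && echo "pmix is available"
```

Matching whole words means that a versioned name such as pmix_v4 is not mistaken for pmix.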

To use MPI in your parallel calculations, add one of the following command sequences to the job.sh script you submit:

Using the Open MPI software installed on our system:

module load openmpi
mpirun -np 4 <path>/<script openmpi>

Using the PMI installed in SLURM (PMIx, for Open MPI):

srun --mpi=pmix -n 4 <path>/<script openmpi>

Using the MPICH software installed on our system:

module load mpich
mpirun -np 4 <path>/<script mpich>

Using the PMI installed in SLURM (PMI2, for MPICH):

srun --mpi=pmi2 -n 4 <path>/<script mpich>
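The pairing used in the commands above (pmix with Open MPI, pmi2 with MPICH) can be captured in a small shell function so that a job script selects the --mpi value from the MPI implementation it was built with. This is only a sketch: the function name mpi_plugin_for is hypothetical, and the mapping covers only the two pairs shown on this page. For other combinations, refer to the SLURM MPI Users Guide.

```shell
# Hypothetical helper: print the srun --mpi value matching an MPI flavor.
# Only the two pairings shown on this page are covered.
mpi_plugin_for() {
    case "$1" in
        openmpi) echo "pmix" ;;
        mpich)   echo "pmi2" ;;
        *)
            echo "unknown MPI flavor: $1" >&2
            return 1
            ;;
    esac
}
```

In job.sh, the result could then be passed to srun as --mpi="$(mpi_plugin_for openmpi)", followed by the usual -n option and the path to your program.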

Using CUDA

CC-IN2P3 provides a complete Nvidia environment (drivers, CUDA, cuDNN and NCCL libraries) and upgrades it on a regular basis. To find out the current CUDA version, run the following command on an interactive GPU server:

% nvidia-smi
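In a script, the version number can be extracted from that output instead of being read by eye. The sketch below assumes that the nvidia-smi header contains a field of the form "CUDA Version: 12.4"; the function name cuda_version_from_smi is hypothetical.

```shell
# Hypothetical helper: read nvidia-smi output on standard input and print
# the first version number that follows "CUDA Version:".
# Assumption: the header line contains a field such as "CUDA Version: 12.4".
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Possible usage on an interactive GPU server:
#   nvidia-smi | cuda_version_from_smi
```

If the header format differs on a given driver release, the function prints nothing, which a calling script can treat as "version unknown".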

As the CUDA libraries keep growing in size, it is not possible to keep every CUDA version on the GPU servers. For this reason, only one version is kept, which ensures that the CUDA libraries are consistent with the drivers.

If you need to use a specific version of CUDA you have the following options:

You may install your own version in your group’s THRONG directory. Then, in your job, specify the path to your CUDA installation. For example (root privileges are not required):

% wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
% sh cuda_12.4.0_550.54.14_linux.run --toolkit --installpath=$THRONG_DIR/cuda-12.4/

In the installer, choose to install only CUDA (uncheck the drivers). The environment variables to define in the job are:

export PATH=$THRONG_DIR/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=$THRONG_DIR/cuda-12.4/lib64:$LD_LIBRARY_PATH
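The same settings can be wrapped in a function that first checks that the toolkit directory exists and that does not leave a trailing colon in LD_LIBRARY_PATH when the variable was previously empty. This is a sketch under assumptions: the function name use_cuda is hypothetical, and the directory passed to it is expected to be the installation path given to the installer above.

```shell
# Hypothetical helper: put a private CUDA toolkit first in PATH and
# LD_LIBRARY_PATH. $1 is the toolkit installation directory.
use_cuda() {
    cuda_root=$1
    if [ ! -d "$cuda_root/bin" ]; then
        echo "no CUDA toolkit found under $cuda_root" >&2
        return 1
    fi
    export PATH="$cuda_root/bin:$PATH"
    # Append the previous value only if it was set and non-empty.
    export LD_LIBRARY_PATH="$cuda_root/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Possible usage in a job, assuming THRONG_DIR is defined:
#   use_cuda "$THRONG_DIR/cuda-12.4" && nvcc --version
```

Checking the directory first makes a mistyped path fail at the start of the job rather than silently falling back to the system CUDA version.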

To compile your CUDA code, connect to an interactive GPU server and then use the nvcc compiler:

% /usr/local/cuda-12/bin/nvcc

Once the code is compiled, we recommend that you exit the interactive server and submit your jobs with sbatch.

Note

To profile GPU jobs, CUPTI (CUDA Profiling Tools Interface) is installed on our computing nodes. Since CUPTI is directly linked to CUDA, its installed version is the same as the CUDA version.