HPC and GPU computing
MPI configuration
In parallel computing, the Process Management Interface (PMI) allows MPI processes to interact with the process manager by adding information to its database (“put” operations) and querying information added by other processes of the same application (“get” operations).
PMIs are available in our SLURM installation. To list the available PMIs, run the following command on an interactive server:
% srun --mpi=list
The correlation between the PMI and the required MPI type is shown in the SLURM MPI Users Guide.
To implement MPI in your parallel calculations, please add one of the following syntaxes to the job.sh script you will submit:
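As an illustration only, a job.sh that selects a PMI plugin through srun might look like the sketch below. The plugin name (pmix), the task count, the job name and the executable ./my_mpi_program are assumptions, not site defaults; use a plugin actually reported by srun --mpi=list and matching your MPI library.

```shell
#!/bin/bash
#SBATCH --job-name=mpi_sketch   # hypothetical job name
#SBATCH --ntasks=4              # assumed number of MPI ranks

# Assumption: "pmix" is one of the plugins listed by `srun --mpi=list`.
MPI_PLUGIN="${MPI_PLUGIN:-pmix}"
# Hypothetical MPI executable, built against an MPI library matching the plugin.
MPI_PROGRAM="${MPI_PROGRAM:-./my_mpi_program}"

LAUNCH_CMD="srun --mpi=${MPI_PLUGIN} ${MPI_PROGRAM}"

if command -v srun >/dev/null 2>&1 && [ -x "${MPI_PROGRAM}" ]; then
    # Inside a SLURM job: start one MPI rank per allocated task.
    ${LAUNCH_CMD}
else
    # Outside SLURM, or without the binary: only show what would be run.
    echo "Would run: ${LAUNCH_CMD}"
fi
```

Keeping the plugin name in a variable makes it easy to switch to another PMI type without editing the launch line.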
Using CUDA
CC-IN2P3 provides a complete Nvidia environment (drivers, CUDA, cuDNN and NCCL libraries) and upgrades it on a regular basis. To find out the current CUDA version, run the following command on an interactive GPU server:
% nvidia-smi
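If only the version number is needed, for instance inside a script, it can be extracted from the nvidia-smi banner. The helper below is a minimal sketch; the function name cuda_version_from_smi is hypothetical, and it assumes the banner contains a "CUDA Version: X.Y" field, as recent drivers print.

```shell
#!/bin/bash
# Hypothetical helper: read nvidia-smi output on stdin and print the
# CUDA version found in the banner (first match only).
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # On a GPU server: print the CUDA version supported by the driver.
    nvidia-smi | cuda_version_from_smi
else
    echo "nvidia-smi not found: run this on an interactive GPU server" >&2
fi
```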
As CUDA libraries keep growing in size, it is not possible to keep every CUDA version on the GPU servers. For this reason, only one version is installed, which ensures that the CUDA libraries are consistent with the drivers.
If you need to use a specific version of CUDA you have the following options:
You may install your own version in your group’s THRONG directory. Then, in your job, you only have to specify the path to your CUDA installation. For example (root access is not required):
% wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
% sh cuda_12.4.0_550.54.14_linux.run --toolkit --installpath=$THRONG_DIR/cuda-12.4/
Then choose to install only CUDA (uncheck the drivers). The environment variables to define in the job must point to the installation path used above:
export PATH=$THRONG_DIR/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=$THRONG_DIR/cuda-12.4/lib64:$LD_LIBRARY_PATH
You may use an Apptainer image with a built-in CUDA installation.
Be aware that you first have to build the image, which requires root permissions (on your workstation, for instance), then copy the image to your storage area, and finally use it in a job.
If you’re using Python, you can also install CUDA via Python packages, or even choose a CUDA version when using frameworks such as PyTorch, JAX, etc.
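For the first option above (a personal CUDA installation), the environment setup can be wrapped in a small function that refuses to modify the environment when the toolkit directory is missing. This is a sketch only: the function name use_cuda is hypothetical and not provided by CC-IN2P3, and the expected layout (bin and lib64 subdirectories) is the one assumed by the export lines of that option.

```shell
#!/bin/bash
# Hypothetical helper: prepend a CUDA toolkit directory to the search paths.
# Usage: use_cuda /path/to/cuda-install
use_cuda() {
    if [ ! -d "$1/bin" ]; then
        echo "CUDA toolkit not found in: $1" >&2
        return 1
    fi
    export PATH="$1/bin:$PATH"
    # Avoid a trailing ':' when LD_LIBRARY_PATH is initially unset or empty.
    export LD_LIBRARY_PATH="$1/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

In a job script, this would be called as use_cuda "$THRONG_DIR/cuda-12.4" before launching the program, so that a wrong path fails early with a clear message.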
Attention
Be aware that in any case you’ll have to follow the drivers/CUDA compatibility matrix, i.e. old versions of CUDA are no longer supported!
Please check the NVIDIA Support Matrix and contact our user support to gather all the necessary information.
To compile your CUDA code, connect to an interactive GPU server and then use the nvcc compiler:
% /usr/local/cuda-12/bin/nvcc
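A typical invocation passes a source file and an output name to nvcc. The sketch below builds such a command line and runs it only when both the compiler and the source file are present; the file names vector_add.cu and vector_add are hypothetical placeholders.

```shell
#!/bin/bash
# Compiler path as given in this documentation; override NVCC if needed.
NVCC="${NVCC:-/usr/local/cuda-12/bin/nvcc}"
# Hypothetical source file and output binary.
SRC="vector_add.cu"
OUT="vector_add"

COMPILE_CMD="${NVCC} -o ${OUT} ${SRC}"

if [ -x "${NVCC}" ] && [ -f "${SRC}" ]; then
    # On an interactive GPU server, with the source file in place: compile.
    ${COMPILE_CMD}
else
    # Elsewhere: only show the command that would be executed.
    echo "Would run: ${COMPILE_CMD}"
fi
```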
Once the code is compiled, we recommend that you exit the interactive server and submit your jobs with sbatch.
Note
To profile GPU jobs, CUPTI (CUDA Profiling Tools Interface) is installed on our computing nodes. Since CUPTI is directly tied to CUDA, its installed version is the same as the CUDA version.