GPU jobs

CC-IN2P3 provides a CentOS 7-based GPU computing platform offering two types of GPUs:
  • K80 with 12 GB of dedicated GDDR5 memory; InfiniBand connection between nodes
  • V100 with 32 GB of dedicated HBM2 memory; no InfiniBand connection

For further details, please refer to the technical characteristics of the computing servers linked on the computing platform page.

The user chooses the GPU type at submission time:

-l GPUtype=K80 or -l GPUtype=V100

The queues available for GPU jobs are the following:

interactive GPU jobs:
  mc_gpu_interactive
multi-core GPU jobs:
  mc_gpu_medium, mc_gpu_long, mc_gpu_longlasting
parallel GPU jobs:
  pa_gpu_long

For the queue limits, please refer to the page Information on scheduling queues.

Attention

With the exception of mc_gpu_interactive, access to all GPU queues is restricted. You must contact your computing czar to obtain access to this type of resource. See the Restricted queues FAQ.

Jobs requesting access to GPU compute servers must use them as efficiently as possible: GPUs monopolized by an inefficient job are wasted. To check whether your job uses the GPUs:

  • nvidia-smi lets you monitor the GPU usage of your interactive jobs in real time, as shown below;
  • the job summary ("cartridge") at the end of the output file gives you metrics about the job efficiency.
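
For example, from within an interactive GPU job, the following commands give a one-shot snapshot and a periodic report of GPU utilization (the query fields shown are standard nvidia-smi options):

    % nvidia-smi
    % nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 5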

Several CUDA environments are available and are updated on a regular basis (the current CUDA driver version is 465.19.01). The following versions are currently available:

  • 10.1
  • 10.2
  • 11.3

These libraries are installed in the following directory trees:

  • /usr/local/cuda-10.1/
  • /usr/local/cuda-10.2/
  • /usr/local/cuda-11.3/

To compile CUDA code, please refer to the dedicated paragraph below. To use software unavailable on the computing platform, or an earlier version of it, CC-IN2P3 offers the Singularity container solution.
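
If you use Singularity for GPU work, the --nv option exposes the host GPUs and drivers inside the container. As a hedged illustration (the image and program names are hypothetical):

    % singularity exec --nv my_cuda_env.sif ./my_gpu_program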

If you need to use CUDA or OpenCL libraries, specify in your script:

  • bash

    if ! echo ${LD_LIBRARY_PATH} | /bin/grep -q /usr/local/cuda-10.1/lib64 ; then
           export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:${LD_LIBRARY_PATH}
    fi
    
  • csh

    if ($?LD_LIBRARY_PATH) then
           setenv LD_LIBRARY_PATH /usr/local/cuda-10.1/lib64:${LD_LIBRARY_PATH}
    else
           setenv LD_LIBRARY_PATH /usr/local/cuda-10.1/lib64
    endif
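
To check that the dynamic linker now resolves the CUDA libraries for your binary (my_program is a hypothetical name), you can inspect it with ldd:

    % ldd ./my_program | grep -i cuda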
    

Interactive GPU jobs

Interactive GPU jobs are started with the qlogin command (see also the page Interactive jobs) and by choosing the mc_gpu_interactive queue, for example:

% qlogin -l GPU=<number_of_gpus> -l GPUtype=<gpu_type> -q mc_gpu_interactive -pe multicores_gpu 4

Example:

% qlogin -l GPU=1 -l GPUtype=V100 -q mc_gpu_interactive -pe multicores_gpu 4

Interactive job submissions must request exactly 4 CPUs (-pe multicores_gpu 4), otherwise they will not run.

Multi-core GPU jobs

To submit a GPU job, you must specify the GPU queue (for example -q mc_gpu_long), the number of GPUs required (for example -l GPU=2; up to 4 GPUs are available per server) and the dedicated multi-core environment (-pe multicores_gpu). In summary, the qsub options are:

% qsub -l GPU=<number_of_gpus> -l GPUtype=<gpu_type> -q <QueueName> -pe multicores_gpu 4 ...

The CUDA_VISIBLE_DEVICES variable is set automatically.

Example of submission:

% qsub -l GPU=2 -l GPUtype=K80 -q mc_gpu_long -pe multicores_gpu 4 ...

Multi-core job submissions must request exactly 4 CPUs (-pe multicores_gpu 4), otherwise they will not run. This request does not actually constrain the job: it may use more or fewer than 4 CPUs, depending on its needs. However, please avoid occupying all the CPUs of a server (16) if you do not use all of its GPUs. A minimal submission script sketch is shown below.
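
As a sketch (the script body and program name are hypothetical), the qsub options above can also be embedded in the script itself as #$ directives, which Grid Engine reads at submission time:

    #!/bin/bash
    #$ -q mc_gpu_long
    #$ -l GPU=2
    #$ -l GPUtype=K80
    #$ -pe multicores_gpu 4

    # Make the CUDA libraries visible (see the snippet above).
    if ! echo ${LD_LIBRARY_PATH} | /bin/grep -q /usr/local/cuda-10.1/lib64 ; then
        export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:${LD_LIBRARY_PATH}
    fi

    # CUDA_VISIBLE_DEVICES is set automatically by the scheduler.
    echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES}"

    ./my_gpu_program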

Parallel GPU jobs

To submit a parallel GPU job, you must specify:

  • the queue: -q pa_gpu_long
  • the number of GPUs desired per server: -l GPU=x, with 1 ≤ x ≤ 4
  • the openmpigpu_4 parallel environment, which determines the number of compute servers requested: -pe openmpigpu_4 x, where x is 4 times the number of servers you want to use
  • the GPU type: only K80 GPUs are available for parallel jobs (-l GPUtype=K80)

Your script must contain some OpenMPI-specific directives (including the mpiexec launch), which are described in the Parallel jobs section.

The options are:

% qsub -l GPU=<number_of_gpus_per_node> -l GPUtype=K80 -q pa_gpu_long -pe openmpigpu_4 <number_of_servers_times_4> ...

Example:

% qsub -l GPU=3 -l GPUtype=K80 -q pa_gpu_long -pe openmpigpu_4 8 my_script_for_2_nodes_and_6_GPU.sh
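
As a hedged sketch of such a script (the program name is hypothetical, and the exact mpiexec invocation depends on the OpenMPI setup described in the Parallel jobs section), the example above runs on 2 servers with 3 GPUs each:

    #!/bin/bash
    #$ -q pa_gpu_long
    #$ -l GPU=3
    #$ -l GPUtype=K80
    #$ -pe openmpigpu_4 8

    # One MPI rank per allocated GPU: 2 servers x 3 GPUs = 6 ranks.
    mpiexec -n 6 ./my_gpu_mpi_program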

Compile in CUDA

To compile your CUDA code, you should connect to the interactive GPU server with the following command line example:

qlogin -l GPU=1 -l GPUtype=K80 -q mc_gpu_interactive -pe multicores_gpu 4

Then you will be connected by SSH to the server, and you will be able to compile your code with the nvcc compiler:

% /usr/local/cuda-10.1/bin/nvcc
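
For example (my_kernel.cu is a hypothetical source file), target the compute capability of the GPU you requested: 3.7 for the K80, 7.0 for the V100:

% /usr/local/cuda-10.1/bin/nvcc -O2 -arch=sm_37 my_kernel.cu -o my_kernel
% /usr/local/cuda-10.1/bin/nvcc -O2 -arch=sm_70 my_kernel.cu -o my_kernel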

Once the code is compiled, we recommend that you exit the interactive session and submit your jobs with qsub.