GPU jobs

CC-IN2P3 provides a CentOS 7-based GPU computing platform made up of two types of GPU servers:
  • 10 Dell C4130 compute servers, each with 4 GPUs and 16 CPU cores
    • 2 Xeon E5-2640v3 (8c @ 2.6 GHz)
    • 128 GB RAM
    • 2 Nvidia Tesla K80 cards → 4 Nvidia GK210 GPUs with 12 GB GDDR5 each
    • InfiniBand between the nodes
  • 6 Dell C4140 compute servers, each with 4 GPUs and 20 CPU cores
    • 2 Xeon Silver 4114 (10c @ 2.2 GHz)
    • 192 GB RAM
    • 4 Nvidia Tesla V100 PCIe → 4 Nvidia GPUs with 32 GB HBM2 each
    • No InfiniBand

The user chooses the GPU type at submission time:

-l GPUtype=K80 or -l GPUtype=V100

The queues available for GPU jobs are the following:

  • interactive GPU jobs: mc_gpu_interactive
  • multi-core GPU jobs: mc_gpu_medium, mc_gpu_long, mc_gpu_longlasting
  • parallel GPU jobs: pa_gpu_long

For the limits of each queue, refer to the page Information on scheduling queues.

Attention

With the exception of mc_gpu_interactive, access to all GPU queues is restricted: contact your czar to request access to this type of resource. See Restricted queues FAQ.

Jobs running on GPU compute servers must use them as efficiently as possible: a GPU held by an inefficient job cannot be used by anyone else and is wasted. To check whether your job actually uses its GPUs:

  • nvidia-smi shows the GPU utilization of your interactive jobs in real time;
  • the summary printed at the end of the job output file gives metrics on the job efficiency.
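As a minimal sketch of the first check, the hypothetical gpu_usage function below wraps the query mode of nvidia-smi (--query-gpu, --format=csv, and -l to refresh every N seconds). Keep in mind that nvidia-smi reports every GPU of the server, not only those allocated to your job; compare its index column with CUDA_VISIBLE_DEVICES, assuming the scheduler numbers the GPUs the same way.

```shell
# Hypothetical helper: print index, name, utilization and memory use of
# the GPUs of the current server. nvidia-smi lists every GPU of the
# server, not only the ones allocated to the job.
# Usage: gpu_usage [refresh_interval_in_seconds]
gpu_usage() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: run this on a GPU server" >&2
        return 1
    fi
    local fields=index,name,utilization.gpu,memory.used,memory.total
    if [ -n "$1" ]; then
        # Refresh the table every $1 seconds until interrupted (Ctrl-C)
        nvidia-smi --query-gpu="$fields" --format=csv -l "$1"
    else
        nvidia-smi --query-gpu="$fields" --format=csv
    fi
}
```

For example, gpu_usage 5 refreshes the table every 5 seconds. A utilization.gpu value staying near 0 % while your job runs is a sign that the GPUs are not being used.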

The CUDA 10.1 and OpenCL 1.2 environments are available in /opt/cuda-10.1. To compile CUDA code, see the section Compile in CUDA below.

CUDA is updated regularly, but the previous version (n-1) is kept for specific needs: currently, CUDA 9.2 is available in /opt/cuda-9.2. To use software that is not installed on the computing platform, or an older version, CC-IN2P3 offers the Singularity container solution.

If your job uses the CUDA or OpenCL libraries, add the following to your script:

  • bash

    # Prepend the CUDA 10.1 libraries unless they are already in the path
    if ! echo "${LD_LIBRARY_PATH}" | /bin/grep -q /opt/cuda-10.1/lib64 ; then
           export LD_LIBRARY_PATH=/opt/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    fi
    
  • csh

    if ($?LD_LIBRARY_PATH) then
           setenv LD_LIBRARY_PATH /opt/cuda-10.1/lib64:${LD_LIBRARY_PATH}
    else
           setenv LD_LIBRARY_PATH /opt/cuda-10.1/lib64
    endif
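The grep test in the bash snippet also matches any longer path that merely contains /opt/cuda-10.1/lib64. A slightly stricter sketch, for bash or any POSIX-compatible shell, compares whole colon-separated components and also adds the nvcc directory to PATH. prepend_path and cuda_version are hypothetical names; setting cuda_version=9.2 selects the older toolkit in /opt/cuda-9.2, assuming it uses the same bin/ and lib64/ layout.

```shell
# Prepend directory $2 to the colon-separated variable named $1 (PATH,
# LD_LIBRARY_PATH, ...) unless it is already one of its components.
# Only call it with a plain variable name: $1 is passed to eval.
prepend_path() {
    local current
    eval "current=\${$1:-}"
    case ":${current}:" in
        *":$2:"*) ;;                      # already present: nothing to do
        *) eval "$1=\"\$2\${current:+:\$current}\""
           export "$1" ;;
    esac
}

# CUDA version to use: 10.1 (current) or 9.2 (previous version kept)
cuda_version=10.1
prepend_path PATH "/opt/cuda-${cuda_version}/bin"
prepend_path LD_LIBRARY_PATH "/opt/cuda-${cuda_version}/lib64"
```

Calling the function twice with the same directory leaves the variable unchanged, so the snippet can safely be sourced several times.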
    

Interactive GPU jobs

Interactive GPU jobs are started with the qlogin command on the mc_gpu_interactive queue (see also the page Interactive jobs). The general syntax is:

% qlogin -l GPU=<number_of_gpus> -l GPUtype=<gpu_type> -q mc_gpu_interactive -pe multicores_gpu 4

Example:

% qlogin -l GPU=1 -l GPUtype=V100 -q mc_gpu_interactive -pe multicores_gpu 4

Interactive job submissions must request exactly 4 CPUs (-pe multicores_gpu 4) to be accepted.
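Once the session is open, a quick sanity check is to look at CUDA_VISIBLE_DEVICES. The sketch below assumes that the scheduler also sets this variable in interactive sessions, as a comma-separated list of GPU indices such as 0,1; count_visible_gpus is a hypothetical helper.

```shell
# Hypothetical helper: count the GPUs listed in CUDA_VISIBLE_DEVICES,
# assumed to be a comma-separated list of indices such as "0,1".
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devices" ]; then
        echo 0
        return 0
    fi
    # One index per line, then count the non-empty lines
    printf '%s\n' "$devices" | tr ',' '\n' | grep -c .
}

echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
echo "GPUs allocated: $(count_visible_gpus)"
```

With -l GPU=1, the count should be 1; any other value means the session does not see the GPUs you requested.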

Multi-core GPU jobs

To submit a multi-core GPU job, specify the GPU queue (for example -q mc_gpu_long), the number of GPUs required (for example -l GPU=2; each server has up to 4 GPUs), the GPU type (-l GPUtype=K80 or -l GPUtype=V100) and the dedicated multi-core environment (-pe multicores_gpu 4). In summary, the qsub options are:

% qsub -l GPU=<number_of_gpus> -l GPUtype=<gpu_type> -q <QueueName> -pe multicores_gpu 4 ...
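To catch mistakes before submitting, the options can be assembled by a small helper. This is only a sketch: mc_gpu_opts is a hypothetical function that encodes the rules stated on this page (1 to 4 GPUs, type K80 or V100, one of the three multi-core GPU queues, exactly 4 slots).

```shell
# Hypothetical helper: print the qsub options of a multi-core GPU job,
# rejecting values outside the limits documented on this page.
# Usage: mc_gpu_opts <number_of_gpus> <gpu_type> <queue>
mc_gpu_opts() {
    case "$1" in
        1|2|3|4) ;;
        *) echo "number of GPUs must be between 1 and 4" >&2; return 1 ;;
    esac
    case "$2" in
        K80|V100) ;;
        *) echo "GPUtype must be K80 or V100" >&2; return 1 ;;
    esac
    case "$3" in
        mc_gpu_medium|mc_gpu_long|mc_gpu_longlasting) ;;
        *) echo "not a multi-core GPU queue: $3" >&2; return 1 ;;
    esac
    # multicores_gpu must always be requested with exactly 4 slots
    printf '%s\n' "-l GPU=$1 -l GPUtype=$2 -q $3 -pe multicores_gpu 4"
}
```

For example, qsub $(mc_gpu_opts 2 K80 mc_gpu_long) my_job.sh, where my_job.sh stands for your own script. The command substitution is left unquoted on purpose so that the options split into separate words.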

The CUDA_VISIBLE_DEVICES variable is set automatically to the GPUs allocated to the job.

Example of submission:

% qsub -l GPU=2 -l GPUtype=K80 -q mc_gpu_long -pe multicores_gpu 4 ...

Multi-core job submissions must request exactly 4 CPUs (-pe multicores_gpu 4) to be accepted. This request does not constrain the job, which can use more or fewer than 4 CPUs depending on its needs; however, avoid occupying all the CPUs of a server (16 on K80 servers, 20 on V100 servers) if you do not use all of its GPUs.
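The same options can also be written into the job script itself, since Grid Engine's qsub reads lines starting with #$ as submission options. A minimal sketch, assuming this mechanism is available at CC-IN2P3 and with my_cuda_program as a placeholder for your executable; NSLOTS is the standard Grid Engine variable holding the number of allocated slots.

```shell
#!/bin/bash
# Lines starting with "#$" are read by qsub as submission options
#$ -q mc_gpu_long
#$ -l GPU=2
#$ -l GPUtype=K80
#$ -pe multicores_gpu 4

# Make the CUDA 10.1 libraries visible to the program
export LD_LIBRARY_PATH=/opt/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Set by the scheduler: GPUs allocated to the job and number of slots
echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
echo "Slots: ${NSLOTS:-unknown}"

# my_cuda_program is a placeholder for your own executable
if [ -x ./my_cuda_program ]; then
    ./my_cuda_program
else
    echo "./my_cuda_program not found" >&2
fi
```

Saved as, for example, my_gpu_job.sh (a hypothetical name), the script is then submitted with qsub my_gpu_job.sh without repeating the options on the command line.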

Parallel GPU Jobs

To submit a parallel GPU job, you must specify:

  • the queue: -q pa_gpu_long
  • the number of GPUs per server: -l GPU=x, with 1 ≤ x ≤ 4
  • the openmpigpu_4 parallel environment, which determines the number of compute servers: -pe openmpigpu_4 y, with y = 4 × the number of servers you want to use
  • the GPU type, which must be K80 (the only type available for parallel jobs): -l GPUtype=K80

Your script must also contain the OpenMPI-specific directives (including the mpiexec launch) described in the section Parallel jobs.

The options are:

% qsub -l GPU=<number_of_gpus_per_node> -l GPUtype=K80 -q pa_gpu_long -pe openmpigpu_4 <number_of_servers_times_4> ...

Example:

% qsub -l GPU=3 -l GPUtype=K80 -q pa_gpu_long -pe openmpigpu_4 8 my_script_for_2_nodes_and_6_GPU.sh
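The slot count is simple arithmetic: 4 slots per server, so 2 servers give -pe openmpigpu_4 8, and with -l GPU=3 the job gets 2 × 3 = 6 GPUs in total. The hypothetical pa_gpu_opts helper below sketches this computation and reproduces the options of the example above.

```shell
# Hypothetical helper: print the qsub options of a parallel GPU job.
# Usage: pa_gpu_opts <number_of_servers> <gpus_per_server>
pa_gpu_opts() {
    case "$1" in
        ''|*[!0-9]*|0*)
            echo "number of servers must be a positive integer" >&2; return 1 ;;
    esac
    case "$2" in
        1|2|3|4) ;;
        *) echo "GPUs per server must be between 1 and 4" >&2; return 1 ;;
    esac
    # openmpigpu_4 counts 4 slots per server; only K80 is available
    printf '%s\n' "-l GPU=$2 -l GPUtype=K80 -q pa_gpu_long -pe openmpigpu_4 $(( 4 * $1 ))"
}
```

qsub $(pa_gpu_opts 2 3) my_script_for_2_nodes_and_6_GPU.sh is then equivalent to the example above.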

Compile in CUDA

To compile your CUDA code, first open an interactive session on a GPU server, for example:

% qlogin -l GPU=1 -l GPUtype=K80 -q mc_gpu_interactive -pe multicores_gpu 4

You are then connected to the server by SSH and can compile your code with the nvcc compiler:

% /opt/cuda-10.1/bin/nvcc
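When compiling, you can target the architecture of the GPUs that will run the code: the K80 (GK210) has compute capability 3.7 and the V100 has 7.0, both supported by CUDA 10.1 through the nvcc option -gencode arch=compute_XX,code=sm_XX. The sketch below, with the hypothetical helper nvcc_arch and the placeholder source file my_kernel.cu, builds a single binary containing code for both GPU types.

```shell
# Hypothetical helper: print the nvcc -gencode option for a GPU type.
# K80 (GK210) has compute capability 3.7, V100 has 7.0.
nvcc_arch() {
    case "$1" in
        K80)  printf '%s\n' "-gencode arch=compute_37,code=sm_37" ;;
        V100) printf '%s\n' "-gencode arch=compute_70,code=sm_70" ;;
        *)    echo "unknown GPU type: $1" >&2; return 1 ;;
    esac
}

NVCC=/opt/cuda-10.1/bin/nvcc
# my_kernel.cu is a placeholder; build one binary for both GPU types.
# $(nvcc_arch ...) is left unquoted so that it splits into two words.
if [ -x "$NVCC" ] && [ -f my_kernel.cu ]; then
    "$NVCC" -O2 $(nvcc_arch K80) $(nvcc_arch V100) -o my_kernel my_kernel.cu
fi
```

Building for both architectures lets the same binary run on either queue type, at the cost of a slightly longer compilation.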

Once the code is compiled, exit the interactive session and submit your jobs with qsub.