GPU jobs

GPU jobs have restricted usage: you should contact your czar if you want to run GPU jobs.
Jobs requesting access to GPU machines must use them with the highest possible efficiency:
GPUs monopolized by an inefficient job are lost for the jobs that really need them.
To check that your job is really using the GPUs:

  • nvidia-smi lets you visualize the efficiency of your interactive jobs in real time
  • the footer written at the end of the job (in the output file) provides metrics on its mean efficiency over its whole duration.
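For the interactive check, a minimal sketch of the nvidia-smi call (the guard and fallback branch are only there so the sketch also runs on hosts without the NVIDIA driver; on a GPU worker you could add "-l 5" to refresh every 5 seconds):

```shell
# One-shot snapshot of GPU utilization, memory and per-GPU processes.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
    status="ok"
else
    echo "nvidia-smi not found (run this on a GPU worker)"
    status="no-gpu-host"
fi
```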

The GPU jobs are run on CentOS 7 and the current platform consists of 2 types of GPU machines:

  • 10 Dell C4130 with 4 GPUs and 16 CPU cores per machine:
    • 2 Xeon E5-2640v3 (8c @2.6 GHz)
    • 128 GB RAM
    • 2 Nvidia Tesla K80 → 4 Nvidia GK210 GPUs with 12 GB GDDR5 each
    • InfiniBand between nodes
  • 6 Dell C4140 with 4 GPUs and 20 CPU cores per machine:
    • 2 Xeon Silver 4114 (10c @2.2 GHz)
    • 4 Nvidia Tesla V100 PCIe → 4 Nvidia GPUs with 32 GB HBM2 each
    • no InfiniBand

CUDA 10.1 and OpenCL 1.2 environments are available in /opt/cuda-10.1.

CUDA is updated regularly, but the n-1 version is always kept available to cover specific needs. At the moment, CUDA 9.2 is available in /opt/cuda-9.2.
To use software unavailable on the computing farm, or to run an earlier version, CC-IN2P3 offers the Singularity container solution.

If you need the CUDA or OpenCL libraries, you should add to your script:

  • bash:
 if ! echo ${LD_LIBRARY_PATH} | /bin/grep -q /opt/cuda-10.1/lib64 ; then
     export LD_LIBRARY_PATH=/opt/cuda-10.1/lib64:${LD_LIBRARY_PATH}
 fi
  • csh:
 if ($?LD_LIBRARY_PATH) then
     setenv LD_LIBRARY_PATH /opt/cuda-10.1/lib64:${LD_LIBRARY_PATH}
 else
     setenv LD_LIBRARY_PATH /opt/cuda-10.1/lib64
 endif
This page focuses only on the submission syntax needed for GPU jobs. For complete documentation on how to use the GE batch system, follow this link.

To submit a multi-core GPU job, you need to specify the GPU queue (for instance “-q mc_gpu_long”), the number of GPUs needed (for instance “-l GPU=2”; up to 4 GPUs are available per machine) and the dedicated multi-core environment (“-pe multicores_gpu 4”).
You must also specify the type of GPU you want to use with the GPUtype complex. The possible values for this complex are K80 and V100 (for example: “-l GPUtype=K80”).

In summary the qsub options are:

> qsub -l GPU=<number_of_gpus> -l GPUtype=<gpu_type> -q <QueueName> -pe multicores_gpu 4 ...  
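The same options can also be embedded in the submission script itself using GE's "#$" directive comments; a minimal sketch, with illustrative file and program names:

```shell
#!/bin/bash
# Sketch of a multi-core GPU submission script.  The "#$" lines are
# Grid Engine directives, equivalent to passing the options to qsub.
#$ -q mc_gpu_long
#$ -l GPU=2
#$ -l GPUtype=K80
#$ -pe multicores_gpu 4

# GE sets CUDA_VISIBLE_DEVICES on the worker; the fallback "0,1" is only
# here so the sketch also runs outside the batch system
gpus=${CUDA_VISIBLE_DEVICES:-"0,1"}
echo "running on GPU(s): $gpus"
# ./my_gpu_program          # your actual CUDA binary (hypothetical name)
```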

The available batch queues are:

  • mc_gpu_medium,
  • mc_gpu_long,
  • mc_gpu_longlasting (restricted access).

The “CUDA_VISIBLE_DEVICES” variable is set automatically. Example:

> qsub -l GPU=2 -l GPUtype=K80 -q mc_gpu_long -pe multicores_gpu 4 ... 
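Inside the job, the script sees only the GPUs it was granted. A minimal sketch of reading that variable (the "0,1" fallback only stands in for what GE would set on the worker):

```shell
# CUDA_VISIBLE_DEVICES is a comma-separated list of GPU indices set by GE
CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-"0,1"}
ngpu=$(echo "$CUDA_VISIBLE_DEVICES" | awk -F',' '{print NF}')
echo "granted $ngpu GPU(s): $CUDA_VISIBLE_DEVICES"
```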

Submissions for multicore jobs must request exactly 4 CPUs (-pe multicores_gpu 4) to be runnable. However, this request does not cap the job: it can actually use more or fewer than 4 CPUs depending on its needs; please avoid occupying all of a machine's CPUs (16) if you do not use all of its GPUs.
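One way to keep the CPU footprint near the granted slots, assuming an OpenMP code (NSLOTS is set by GE on the worker; the default here is only so the sketch runs standalone):

```shell
# limit OpenMP threads to the number of granted slots rather than
# the machine's full core count
export OMP_NUM_THREADS=${NSLOTS:-4}
echo "using $OMP_NUM_THREADS CPU thread(s)"
```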

To submit a parallel GPU job you need to specify:

  • the queue:
    -q pa_gpu_long 
  • the number of GPUs intended for each node:
    -l GPU=x, with 1 ≤ x ≤ 4
  • the openmpigpu_4 environment, which determines the number of allocated nodes:
    -pe openmpigpu_4 x, with x = (4 * the number of nodes you wish to use)
  • the GPU type, which must be K80 (the only type allowing parallel jobs):
    -l GPUtype=K80

Your script must contain some OpenMPI-specific settings (including the MPIEXEC launch), described in the parallel jobs section.
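A minimal sketch of such a launch line, with a hypothetical binary name (NSLOTS is set by GE; the exact mpiexec options depend on your MPI setup, see the parallel jobs section):

```shell
# NSLOTS is the number of slots granted by GE (4 per node with
# openmpigpu_4); defaulted here so the sketch runs standalone
NSLOTS=${NSLOTS:-8}
launch="mpiexec -np ${NSLOTS} ./my_mpi_gpu_program"
echo "$launch"
```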

In summary the qsub options are:

> qsub -l GPU=<number_of_gpus_per_node> -l GPUtype=K80 -q pa_gpu_long -pe openmpigpu_4 <number_of_machines_times_4> ...

The available batch queue is: pa_gpu_long (restricted access, authorization granted following your czar's request).


Example:

> qsub -l GPU=3 -l GPUtype=K80 -q pa_gpu_long -pe openmpigpu_4 8

This example requests 2 machines (8 slots / 4) with 3 GPUs each, i.e. 6 GPUs in total.

Interactive GPU jobs are started with the qlogin command (with the same options as a multi-core qsub), choosing the mc_gpu_interactive queue.

In summary the qlogin options are:

> qlogin -l GPU=<number_of_gpus> -l GPUtype=<gpu_type> -q mc_gpu_interactive -pe multicores_gpu 4 


Example:

> qlogin -l GPU=1 -l GPUtype=V100 -q mc_gpu_interactive -pe multicores_gpu 4

Submissions for interactive jobs must also request exactly 4 CPUs (-pe multicores_gpu 4) to be runnable.

To compile your CUDA source code, connect to a GPU interactive machine using, for example, the following command line:

> qlogin -l GPU=1 -l GPUtype=K80 -q mc_gpu_interactive -pe multicores_gpu 4

You will then have an SSH session on the machine, where you can compile your code with the nvcc compiler:
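For example, a minimal compile line (file names are illustrative; -arch=sm_37 matches the K80's compute capability, sm_70 for the V100). The guard only makes the sketch runnable on machines without the CUDA toolkit:

```shell
nvcc=/opt/cuda-10.1/bin/nvcc
if [ -x "$nvcc" ]; then
    "$nvcc" -arch=sm_37 hello.cu -o hello
else
    echo "nvcc not found at $nvcc (run this inside the GPU interactive job)"
fi
```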


Once your code is compiled, we suggest you disconnect from the interactive machine and submit your jobs with qsub from a cca machine.

  • Last modified: 2019/05/09 12:24
  • by David BOUVET