Submission examples

Standard job

Job submission through command line needs to adhere to the following syntax:

% sbatch -t 0-00:30 -n 1 --mem 2G job.sh
sbatch: INFO: Account: ccin2p3
sbatch: INFO: Submission node: cca001
sbatch: INFO: Partition set to: htc
sbatch: INFO: Partition limited to one node per job.
sbatch: INFO: Time limit set to: 0-00:30:00 (30 minutes)
Submitted batch job 936607
-t <j-hh:mm>

specifies the estimated time limit. All accepted formats are explained in the Essential sbatch options paragraph.

-n <number>

specifies the number of tasks requested (in this syntax equivalent to the number of cores). For a multi-core job, <number> will be larger than 1.

--mem <number>

specifies the amount of memory requested.

job.sh

your executable task script.

The three parameters above must always be specified at submission. The -c parameter (number of CPUs per task) can override -n (see Required parameters limits).

Upon submission, information is returned on standard output: the computing group (Account, here ccin2p3), the submission node (here cca001), the partition in which the job will run (here the default, htc), and the job identifier (here 936607), which is available inside the job through the environment variable SLURM_JOB_ID.

To access a resource that requires declaration, use the -L option:

% sbatch -t 0-00:30 -n 4 --mem 2G -L sps,matlab job.sh
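For reference, job.sh stands for your own executable script. A minimal sketch, with a hypothetical workload line, could look like the following; note that the job identifier mentioned above is available inside the job through SLURM_JOB_ID:

#!/bin/bash

# Print where the job runs; SLURM_JOB_ID is set by Slurm inside the job.
echo "Job ${SLURM_JOB_ID} running on $(hostname)"

# Hypothetical workload; replace with your own commands.
./my_program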

Submission script

It is possible to define submission options directly in a batch script, adapting them to your computing requirements. The batch script is then submitted as simply as:

% sbatch my_batch_script.sh
#!/bin/bash

# SLURM options:

#SBATCH --job-name=serial_job_test    # Job name
#SBATCH --output=serial_test_%j.log   # Standard output and error log

#SBATCH --partition=htc               # Partition choice (htc by default)

#SBATCH --ntasks=1                    # Run a single task
#SBATCH --mem=2000                    # Memory in MB by default
#SBATCH --time=1-00:00:00             # Time limit (max. allowed: 7 days)

#SBATCH --mail-user=<email address>   # Where to send mail
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)

#SBATCH --licenses=sps                # Declaration of storage and/or software resources

# Commands to be submitted:

module load python
python my_python_script.py

In this example, we set up a Python environment using modules in order to run my_python_script.py, which needs the SPS storage resource. All the required Slurm options have been given using #SBATCH directives.

Interactive job

An interactive job is submitted with the srun command. Use the -L option for resources requiring declaration.

HTC interactive session
% srun -t 0-08:00 -n 4 --mem 2G --pty bash -i
Running an HTC executable interactively
% srun -t 0-08:00 -n 4 --mem 2G job.sh
In the case of an interactive GPU job, the appropriate partition must be specified:
% srun -p gpu_interactive -t 0-08:00 --mem 2G --gres=gpu:v100:1 --pty bash -i
-p

to select the partition,

--gres=

allows you to declare the use of a GPU resource, and to define its parameters,

--pty

allows interactivity with the open session (see previous note).

To quit the interactive session:

% exit

Job array

A job array allows the same script to be executed multiple times in parallel. This can be useful, for example, to run the same simulation many times in order to quickly increase statistics (a script sketch using the array variables is given after the list below).

% sbatch -t 0-00:30 -n 1 --mem 2G --array=0-3 job.sh

Array jobs will have additional environment variables set:

SLURM_ARRAY_JOB_ID

will be set to the first job ID of the array,

SLURM_ARRAY_TASK_ID

will be set to the job array index value,

SLURM_ARRAY_TASK_COUNT

will be set to the number of tasks in the job array (in our example: 4),

SLURM_ARRAY_TASK_MAX

will be set to the highest job array index value (in our example: 3),

SLURM_ARRAY_TASK_MIN

will be set to the lowest job array index value (in our example: 0).
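As an illustration, a batch script submitted with --array=0-3 could use these variables to select one input file per array task; the executable and file names below are hypothetical:

#!/bin/bash
#SBATCH --time=0-00:30
#SBATCH --ntasks=1
#SBATCH --mem=2G
#SBATCH --array=0-3                   # Array indices 0 to 3

# Each array task processes a different input file, chosen by its index.
INPUT="input_${SLURM_ARRAY_TASK_ID}.dat"
echo "Array ${SLURM_ARRAY_JOB_ID}: task ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_COUNT}"

# Hypothetical simulation executable; replace with your own program.
./my_simulation "${INPUT}"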

Parallel job (MPI)

These are jobs executing parallel operations, possibly on different computing servers, using an MPI-type interface over an InfiniBand connection. The hpc partition must be specified (a batch-script equivalent is sketched after the note below):

% sbatch -p hpc -t 0-02:00 -n 8 --mem 2G -N 2 job.sh
-N <number>

specifies the number of required computing servers (nodes)

Note

If the number of computing servers (-N option) is 1, it is not necessary to indicate it, nor to specify the partition.
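As a sketch, a batch-script equivalent of the command above could look like the following; the Open MPI module name and the MPI executable are assumptions to be adapted to your environment:

#!/bin/bash
#SBATCH --partition=hpc               # HPC partition (InfiniBand interconnect)
#SBATCH --time=0-02:00
#SBATCH --ntasks=8                    # 8 MPI ranks in total
#SBATCH --nodes=2                     # spread over 2 computing servers
#SBATCH --mem=2G

# Assumed module name; check "module avail" for the MPI stacks actually installed.
module load openmpi

# srun launches one MPI rank per task across the allocated nodes.
srun ./my_mpi_program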

GPU job

These are jobs that run on computing servers equipped with GPUs. Two syntaxes are allowed:

% sbatch -t 0-01:00 -n 4 --mem 2G --gpus 1 job.sh
% sbatch -t 0-01:00 -n 4 --mem 2G --gres=gpu:v100:1 job.sh # specific request of a GPU type

Here, we request the allocation of a single GPU, but it is possible to use several, up to the number of GPUs available in the corresponding node. The limit on the number of tasks -n <N> is explained in Required parameters limits.

A single type of GPU is available at CC-IN2P3: the Nvidia V100, labelled by the keyword v100.
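For illustration, the second syntax above can also be expressed in a batch script; nvidia-smi simply lists the allocated GPU, and the application name is hypothetical:

#!/bin/bash
#SBATCH --time=0-01:00
#SBATCH --ntasks=4
#SBATCH --mem=2G
#SBATCH --gres=gpu:v100:1             # request one V100 GPU

# Show which GPU has been allocated to the job.
nvidia-smi

# Hypothetical GPU application; replace with your own executable.
./my_gpu_application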

Using CUDA

CC-IN2P3 provides a complete Nvidia environment (drivers, CUDA, cuDNN and NCCL libraries) and upgrades it on a regular basis.

Important

The current version of the Nvidia drivers is 530.30.02-1; the associated CUDA version is 12.1.1-1, the cuDNN version is 8.9.1.23 and the NCCL version is 2.18.1.

If you need to use a previous Nvidia environment, CC-IN2P3 provides Apptainer images. These container images are available from the container repository:

% ls -lsah /cvmfs/singularity.in2p3.fr/images/HPC/GPU
centos7_cuda10-0_cudnn7-4-2_nccl2-4-2.simg
centos7_cuda10-0_cudnn7-6-5_nccl2-5-6.sif
centos7_cuda10-1_cudnn7-5_nccl2-4-2.simg
centos7_cuda10-1_cudnn7-6_nccl2-7-6.sif
centos7_cuda11-3_cudnn8-2-0_nccl2-9-8.sif
centos7_cuda12-1_cudnn8-9-1_nccl2-18-1.sif

More detailed documentation about how to use these containers is available in the CC-IN2P3 GitLab.

For the syntax required to submit a GPU job in a container, please refer to the related documentation.

To compile your CUDA code, you should connect to an interactive GPU server and then use the nvcc compiler:

% /usr/local/cuda-12/bin/nvcc
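For example, assuming a hypothetical source file my_kernel.cu, a compilation command could look like:

% /usr/local/cuda-12/bin/nvcc -O2 -o my_kernel my_kernel.cu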

Once the code is compiled, we recommend exiting the interactive server and submitting your jobs with sbatch.

Note

To profile GPU jobs, CUPTI (CUDA Profiling Tools Interface) is installed on our computing nodes. Since CUPTI is directly tied to CUDA, the installed version matches the CUDA version.

Daemon job and recursivity

For long-running jobs with low resource consumption (to monitor or orchestrate other jobs), choose the htc_daemon partition (see Required parameters limits):

% sbatch -p htc_daemon -t 90-00:00 -n 1 --mem 1G job.sh

More generally, the computing time limit can be circumvented with a recursive job script which re-submits itself; the new job remains queued until the current job ends. The command line below should be written inside the script itself, preferably at the beginning of the script.

% sbatch --dependency=afterany:$SLURM_JOBID my_batch_script.sh

where $SLURM_JOBID is the identifier of the current job, from which the new job is submitted.
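A sketch of such a self-resubmitting script (named my_batch_script.sh as above, with a hypothetical workload) could look like:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --time=7-00:00

# Re-submit this script first: the new job stays pending until the current one ends.
sbatch --dependency=afterany:$SLURM_JOBID my_batch_script.sh

# Hypothetical long-running workload.
./my_long_task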

Building a job pipeline

It is possible to build a job pipeline using the Slurm dependency option:

% sbatch --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> -t 0-01:00 -n 2 --mem 2G job.sh

Using the following dependency types (a short pipeline example follows the list):

after:jobid[:jobid...]

job can begin after the specified jobs have started

afterany:jobid[:jobid...]

job can begin after the specified jobs have terminated

afternotok:jobid[:jobid...]

job can begin after the specified jobs have failed

afterok:jobid[:jobid...]

job can begin after the specified jobs have run to completion with an exit code of zero

singleton

job can begin execution after all previously launched jobs with the same name and user have ended. This is useful to collate results of a swarm or to send a notification at the end of a swarm.
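As an example, a two-step pipeline can be built by capturing the first job identifier with the --parsable option of sbatch; the step1.sh and step2.sh scripts are hypothetical:

% JOB1=$(sbatch --parsable -t 0-01:00 -n 2 --mem 2G step1.sh)
% sbatch --dependency=afterok:$JOB1 -t 0-01:00 -n 2 --mem 2G step2.sh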

A more detailed example of such a pipeline is given on this page.