Submission examples

Submission script

It is possible to define submission options directly in a batch script using the following syntax:

#!/bin/sh

# SLURM options:

#SBATCH --job-name=serial_job_test    # Job name
#SBATCH --output=serial_test_%j.log   # Standard output and error log

#SBATCH --partition=htc               # Partition choice
#SBATCH --ntasks=1                    # Run a single task (by default, 1 task = 1 CPU)
#SBATCH --mem=3000                    # Memory request (in MB by default)
#SBATCH --time=1-00:00:00             # Time limit (here 1 day; the default is 7 days on the htc partition)

#SBATCH --mail-user=<email address>   # Where to send mail
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)

# Commands to be submitted:

module load python
python my_python_script.py

In this example, we set up a Python environment using modules in order to run my_python_script.py. All the required Slurm options have been given with #SBATCH directives, so this batch script can be submitted as simply as:

% sbatch my_script.sh
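
On success, sbatch prints the identifier assigned to the job; the number below is only an illustration:

Submitted batch job 123456

This identifier is the one used later for job monitoring and for the dependency options described at the end of this page.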

Multiple tasks script

Multiple srun tasks may be launched from the script, provided that the resources declared in the #SBATCH lines are not exceeded. For example:

#!/bin/sh

#SBATCH --job-name=multiple_jobs

#SBATCH --ntasks=2
#SBATCH --output=multiple_jobs_%j.log
#SBATCH --licenses=sps

####################################
srun -n 1 --exclusive -L sps script_sps.sh &
srun -n 1 --exclusive script_software.sh &
srun -n 1 --cpus-per-task 2 script_software.sh
wait

The --exclusive option ensures that a task does not share its allocated resources with the other tasks. The bash syntax used here means:

  • &: run tasks in parallel; if a task does not find an available resource, it stays pending and a warning message is written in the output,
  • wait: wait for all tasks to complete before exiting the script.

If the submitted task is a script running multiple parallel processes, the number of CPUs used per task must be specified:

#! /bin/bash

#SBATCH --ntasks=4          # run at most 4 parallel tasks
#SBATCH --cpus-per-task=4   # allocate 4 CPUs per task

my_parallel_script.sh

otherwise, all sub-processes launched by my_parallel_script.sh will run on a single CPU.
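
As an illustration, a minimal sketch of what my_parallel_script.sh could look like is given below; the sub-commands are hypothetical, the point being that the sub-processes started in the background run concurrently on the CPUs allocated through --cpus-per-task:

#!/bin/bash

# Hypothetical example: launch 4 independent sub-processes in the background,
# then wait for all of them to finish. They run concurrently on the allocated CPUs.
./process_chunk.sh 1 &
./process_chunk.sh 2 &
./process_chunk.sh 3 &
./process_chunk.sh 4 &
wait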

Single-core job

A single-core job submission is done with the following command:

% sbatch my_script.sh

The job will then be executed in the default partition, i.e. htc, and with the default QoS, i.e. normal.

To access a resource requiring declaration, use the -L option:

% sbatch -L sps my_script.sh
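
The equivalent declaration can also be placed inside the batch script itself, as in the multiple-task example above:

#SBATCH --licenses=sps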

Multi-core job

A multi-core job submission is done as before, but specifying the required number of cores:

% sbatch -n 8 my_script.sh
-n <number>
specifies the number of allocated CPUs for my_script.sh
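
The same request can also be written inside the script itself:

#SBATCH --ntasks=8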

Interactive job

An interactive job submission is done with the srun command, specifying the appropriate interactive partition.

% srun -p htc_interactive --pty bash -i         # single-core

% srun -p htc_interactive -n <N> --pty bash -i  # multi-core (N tasks)

% srun -p gpu_interactive --gres=gpu:[type]:1 --pty bash -i      # GPU
-p
to select the partition
--gres=
allows you to declare the use of a GPU resource, and to define its parameters.
--pty
allows interactivity with the open session (see previous note).
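
Once the session is open, commands run directly on the allocated resources; for instance, in a GPU interactive session the allocated card can be inspected with the standard nvidia-smi tool:

% nvidia-smi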

To quit the interactive session:

% exit

Parallel job (MPI)

These are jobs executing parallel operations, possibly on different computing servers, using an MPI-type interface over an InfiniBand connection. The hpc partition must be specified:

% sbatch -p hpc -n 8 -N 2 my_script.sh
-N <number>
specifies the number of required computing servers
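
The same request can be expressed as a batch script; the job name and program below are placeholders, and srun is used here as the usual Slurm launcher for the MPI processes:

#!/bin/sh

#SBATCH --job-name=mpi_job       # illustrative job name
#SBATCH --partition=hpc          # MPI jobs run on the hpc partition
#SBATCH --ntasks=8               # 8 MPI processes
#SBATCH --nodes=2                # spread over 2 computing servers

# my_mpi_program is a placeholder for your MPI executable
srun ./my_mpi_program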

Note

If the number of computing servers (-N option) is 1, it is not necessary to indicate it, nor to specify the partition.

GPU job

These are jobs that run on computing servers equipped with GPUs. They can be multi-core, parallel, and interactive. The submission of such jobs must be done on the dedicated partition, i.e. gpu:

% sbatch -p gpu --gres=gpu:1 my_script.sh

Here, we ask for the allocation of a single GPU (any type), but it is possible to use several.

A single type of GPU is available at CC-IN2P3: the Nvidia V100, labelled by the keyword v100. The type may be requested directly as a --gres parameter:

--gres=gpu:v100:1
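
For instance, a batch script requesting two V100 GPUs could look like the following sketch; the job name and the program are illustrative:

#!/bin/sh

#SBATCH --job-name=gpu_job          # illustrative job name
#SBATCH --partition=gpu             # dedicated GPU partition
#SBATCH --gres=gpu:v100:2           # request two V100 GPUs

# my_gpu_program is a placeholder for your GPU application
./my_gpu_program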

Using CUDA

If you need to use specific CUDA libraries, add the following to your script:

  • bash

    if ! echo ${LD_LIBRARY_PATH} | /bin/grep -q /usr/local/cuda-11.3/lib64 ; then
           export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64:${LD_LIBRARY_PATH}
    fi
    
    
  • csh

    if ($?LD_LIBRARY_PATH) then
           setenv LD_LIBRARY_PATH /usr/local/cuda-11.3/lib64:${LD_LIBRARY_PATH}
    else
           setenv LD_LIBRARY_PATH /usr/local/cuda-11.3/lib64
    endif
    

Important

Several CUDA environments are available, and are updated on a regular basis. The current version of the Nvidia drivers is 465.19.01; the default CUDA version is 11.3 and the cuDNN version is 8.2.0.

To compile your CUDA code, you should connect to an interactive GPU server and then use the nvcc compiler:

% /usr/local/cuda-11.3/bin/nvcc
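
For example, compiling a single CUDA source file into a binary (file names are illustrative):

% /usr/local/cuda-11.3/bin/nvcc -o my_program my_program.cu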

Once the code is compiled, we recommend exiting the interactive server and submitting your jobs with sbatch.

Recursive job

For long-duration jobs with low resource consumption (daemon-type jobs), the 7-day limit may be overcome with a recursive job that re-submits itself: the new submission stays in the queue until the current job terminates. The command line below should be written inside the script of the job itself, preferably at the beginning:

sbatch --dependency=afterany:$SLURM_JOBID job.sh

where $SLURM_JOBID is the identifier of the current job, from which the new job is submitted.
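
As a sketch, such a self-resubmitting script (here job.sh, with a placeholder payload) could be organized as follows:

#!/bin/sh

#SBATCH --job-name=daemon_job       # illustrative job name

# Re-submit this same script; the new copy stays pending
# until the current job terminates (afterany dependency).
sbatch --dependency=afterany:$SLURM_JOBID job.sh

# Long-running, low-consumption work (placeholder):
./my_daemon_task.sh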

Building a job pipeline

It is possible to build a job pipeline using the Slurm dependency option:

% sbatch --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> my_script.sh

Using the following dependency types:

after:jobid[:jobid...]
job can begin after the specified jobs have started
afterany:jobid[:jobid...]
job can begin after the specified jobs have terminated
afternotok:jobid[:jobid...]
job can begin after the specified jobs have failed
afterok:jobid[:jobid...]
job can begin after the specified jobs have run to completion with an exit code of zero
singleton
jobs can begin execution after all previously launched jobs with the same name and user have ended. This is useful to collate results of a swarm or to send a notification at the end of a swarm.

A more detailed example of such a pipeline is given on this page.
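
As a quick illustration, a minimal two-step chain could be built as follows; the --parsable option makes sbatch print only the job identifier, and the script names are illustrative:

#!/bin/sh

# Submit the first step and capture its job identifier
JOBID=$(sbatch --parsable step1.sh)

# Submit the second step; it will start only if step1 completes with exit code 0
sbatch --dependency=afterok:$JOBID step2.sh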