Submit a job

To submit a job on the computing platform, you must have a computing account and be connected to an interactive server. Three commands allow job submission.

sbatch

allows the submission of a batch script. For interactive jobs, please refer to the srun command described below.

% sbatch my_slurm_script.sh
sbatch: slurm_job_submit: Submission node: cca001
sbatch: slurm_job_submit: Set partition to: htc
sbatch: slurm_job_submit: Info : Partition limited to one node per job.
Submitted batch job 936607

Upon submission, the command outputs some information: the submission machine (here cca001), the partition where the job will run (htc), and the job id (936607).
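For reference, the content of my_slurm_script.sh in this example could be as simple as the following sketch, where my_program stands for the application to run and the requested values are purely illustrative:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=3G
#SBATCH --time=0-01:00:00

my_program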

srun
allocates the specified resources and runs a program or an application. Furthermore, srun allows the execution of parallel tasks and is generally called in a script that sets up different commands or tasks to run in parallel (parallel jobs).

The srun and sbatch commands accept the same set of parameters, but unlike sbatch, an srun execution is not associated with an interactive shell. The main consequence is that errors potentially encountered during the execution of srun will not be reported in an output file (as is the case for sbatch). To associate the command with a shell, use the --pty option.
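For instance, to open an interactive shell on a worker node, a possible sketch (the partition, time and memory values are illustrative) is:

% srun -p htc -t 0-01:00:00 --mem=4G --pty bash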

Note

The job submission directory is the working directory. TMPDIR may be used to store large amounts of data, but it is not the working directory. You may force Slurm to use this space, or any other space, as your working directory with the -D | --chdir= option.
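For example, to use a directory of your choice as the working directory (the path below is a placeholder):

% sbatch -D /sps/<group>/<user>/workdir my_slurm_script.sh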

Attention

If any option is not declared upon submission, SLURM will allocate by default the following parameters to the job:

--partition=htc
--ntasks=1
--time=7-00:00:00

To facilitate computing farm management, we suggest declaring an approximate estimate of the job's execution time.
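For example, to declare an estimated run time of two hours:

% sbatch -t 0-02:00:00 my_slurm_script.sh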

Essential sbatch options

You may find below a brief description of the essential sbatch options; an example script combining several of them follows the list. To browse all the available options, please refer to the command help (sbatch -h).

-n | --ntasks=
states the maximum number of parallel tasks launched by the job. By default, it corresponds to the number of allocated CPU cores. If this option is used with srun, the task will be repeated n times.
-c | --cpus-per-task=
states the number of cores per process. This option must be specified if a parent process launches parallel child processes.
--mem=
states the amount of memory needed, e.g. 5G
-N | --nodes=
states the number of computing servers (nodes) needed
-L | --licenses=
states the types of storage and software resources needed by the job
-p | --partition=
selects the partition to be used
-t | --time=
sets a limit on the total run time of the job allocation. Acceptable formats: “hours:minutes:seconds”, “days-hours:minutes:seconds”.
-A | --account=
states the group to be charged with the resources used by the job
-a | --array=
describes a job array; use , as the separator to define a list, and - to define an interval, e.g. --array=0,6,16-32
-J | --job-name=
defines the job name
-D | --chdir=
sets the working directory of the batch script before it is executed. The path can be specified as a full or relative path
-o | --output=
states the file name for the standard output, by default slurm-<jobid>.out
-e | --error=
states the file name for the error messages
--mail-user=
states the email address that will receive the chosen notifications, such as job state changes
--mail-type=
states the type of event triggering a notification (e.g. END, FAIL)
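For reference, these options may also be set directly inside the batch script through #SBATCH directives, as in the following sketch (the job name, resource values, program name and email address are placeholders):

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=htc
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=0-06:00:00
#SBATCH --licenses=sps
#SBATCH --output=my_job_%j.out
#SBATCH --error=my_job_%j.err
#SBATCH --mail-user=<your.address@example.org>
#SBATCH --mail-type=END,FAIL

# %j in the file names above expands to the job id;
# my_program stands for the application to run.
srun my_program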

Environment and limitations

By default, the TMPDIR environment variable is set to /tmp. During the job, this folder is mounted on a private per-user folder, /scratch/slurm.<jobid>.<taskid>/tmp. Temporary data and the mapping are deleted at the end of the job, ensuring both privacy and security.

Important

It is recommended to use TMPDIR for performance when reading a large volume of data, as the /scratch partition is large and local to the worker node.
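A possible sketch of this usage inside a batch script (the /sps paths and the my_analysis program are placeholders):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=8G
#SBATCH -L sps

# Copy the input data to the job-local scratch space (TMPDIR),
# process it there, then copy the results back to permanent storage.
cp /sps/<group>/<user>/input.dat "$TMPDIR"/
cd "$TMPDIR"
my_analysis input.dat > results.dat
cp results.dat /sps/<group>/<user>/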

Each computing group is limited in the number of slots concurrently occupied by running jobs; the limit depends on the group's yearly resource request. To know this limit for a given group (account):

% sacctmgr show assoc where account="<group>" user= format="Account,Grpcpus"
   Account  GrpCPUs
---------- --------
   <group>     2000

Once this limit is reached, subsequent submissions will remain pending in the queue until the completion of running jobs frees the required number of slots. However, if you receive the following error upon submission:

sbatch: error: AssocMaxSubmitJobLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

or if, while monitoring a waiting job, you get the following squeue output:

% squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            936551       htc littlene     user PD       0:00      1 (AssocMaxJobsLimit)

you have probably been blocked: please check your inbox or contact the user support team.

CPU and memory requests

Without any specification upon submission, the number of cores allocated to the job will be 1, and the associated memory will be 3 GB. Slurm keeps this ratio between the number of cores and the memory by correcting (over-approximating) the allocated resource values. Examples:

% sbatch -n 2 --mem=4G my_script.sh
Slurm will allocate the 2 requested CPU cores, since the resulting associated memory (2 x 3 = 6 GB) covers the requested memory.
% sbatch -n 2 --mem=8G my_script.sh
Slurm will allocate here 3 CPU cores to allow the 8G request to be accepted (3 x 3 = 9 GB).

CPU cores and memory are limited by the physical resources available on the worker nodes. For instance, for the default partition htc, the maximum number of available CPU cores is 64, and the associated memory is 192 GB.
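To check the resources actually allocated to a job, you may inspect the TRES field reported by scontrol, which lists the allocated CPU, memory and node counts (the job id below is the one from the earlier example):

% scontrol show job 936607 | grep TRES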

Note

It is not necessary to specify both the memory and the number of CPU cores required. You just have to specify one of the two resources; the other will be set accordingly.

Important

  • In the case of the htc_highmem partition, 37 GB per CPU core will be allocated;
  • In the case of the gpu partition, the necessary CPU and memory resources will be automatically allocated according to the number of requested GPUs.

Storage and licence resources

The storage systems accessed by your jobs, as well as any software licenses used, must be declared upon submission. This is carried out using the -L option of the sbatch command:

% sbatch -L sps,matlab my_script.sh

In order to know the resource limits, use the scontrol command:

% scontrol show lic
  • For an exhaustive list of resources requiring declaration upon submission:

    % scontrol show lic | grep default
    
  • To find out the resource limitations of a <group>:

    % scontrol show lic | grep -A1 <group>
    

Attention

Omit the _default or _<group> suffix on the submission line.

Please refer to the MATLAB page to know the correct declaration syntax.