Submit a job¶
sbatch allows the submission of a batch script. For interactive jobs, please refer to the srun command described below.
% sbatch my_slurm_script.sh
sbatch: slurm_job_submit: Submission node: cca001
sbatch: slurm_job_submit: Set partition to: htc
sbatch: slurm_job_submit: Info : Partition limited to one node per job.
Submitted batch job 936607
Upon submission, the command outputs some information: the submission machine (here cca001), the partition where the job will be run (here htc), and the job id.
srun allows the allocation of the specified resources, and runs a program or an application. Furthermore, srun allows the execution of parallel tasks, and is generally called in a script setting different commands or tasks to run in parallel (parallel jobs).
The srun and sbatch commands admit the same set of parameters, but the srun execution, unlike sbatch, is not associated with an interactive shell. The main consequence is that errors potentially encountered during the execution of srun will not be reported in the output file (which is the case for sbatch). To associate the command to a shell, use the corresponding srun option.
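As a sketch (the script contents and values below are illustrative), a parallel job is typically submitted with sbatch, and the script calls srun, which launches the command once per allocated task:

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=0-00:10:00

# srun runs the command once per task of the allocation made by sbatch,
# here 4 instances of hostname in parallel
srun hostname
```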
The job submission directory is the working directory. TMPDIR may be used for storing large amounts of data, but it is not the working directory. You may force Slurm to use this space, or any other space, as your working directory with the -D | --chdir= option.
If any option is not declared upon submission, SLURM will allocate by default the following parameters to the job:
--partition=htc --ntasks=1 --time=7-00:00:00
To facilitate the computing farm management, we suggest declaring an approximate estimate of the job's execution time.
Essential sbatch options¶
You may find below a brief description of the essential
sbatch options. To browse all the available options, please refer to the command help
-n | --ntasks=
- states the maximum number of parallel tasks launched by the job. By default, it corresponds to the allocated CPU number. If this option is used with srun, the task will be repeated as many times as specified, in parallel
-c | --cpus-per-task=
- states the number of cores per process. This option must be specified if a parent process launches parallel child processes.
--mem=
- states the amount of needed memory, ex. 5G
-N | --nodes=
- states the number of computing servers needed
-L | --licenses=
- states the types of storage and software resources needed by the job
-p | --partition=
- selects the partition to be used
-t | --time=
- sets a limit on the total run time of the job allocation. Acceptable formats: “hours:minutes:seconds”, “days-hours:minutes:seconds”.
-A | --account=
- states the group to be charged with the resources used by the job
-a | --array=
- allows the description of a job array; , is the separator to define a list, and - defines an interval, ex. --array=1,3,5 or --array=0-15
-J | --job-name=
- defines the job name
-D | --chdir=
- sets the working directory of the batch script before it is executed. The path can be specified as full path or relative path
-o | --output=
- states the file name for the standard output; by default slurm-%j.out, where %j is the job id
-e | --error=
- states the file name for the error messages
--mail-user=
- states the email address to receive the chosen notifications, such as job state changes
--mail-type=
- states the type of alert to be notified
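Putting the essential options together, a submission script could look like the following sketch; the job name, resource values and file names are placeholders to adapt:

```shell
#!/bin/bash
#SBATCH --job-name=my_analysis     # -J: job name
#SBATCH --partition=htc            # -p: partition
#SBATCH --ntasks=1                 # -n: number of tasks
#SBATCH --cpus-per-task=4          # -c: cores per task
#SBATCH --mem=8G                   # memory needed by the job
#SBATCH --time=0-02:00:00          # -t: 2 hours maximum
#SBATCH --output=%x_%j.out         # -o: standard output (%x: job name, %j: job id)
#SBATCH --error=%x_%j.err          # -e: error messages

echo "Job running on $(hostname)"
```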
Environment and limitations¶
The TMPDIR environment variable is set to /tmp. During the job, this folder is mounted on a private per-user folder in /scratch/slurm.<jobid>.<taskid>/tmp. Temporary data and the mapping are deleted at the end of the job, ensuring both privacy and security.
It is recommended to use TMPDIR for performance when reading a large volume of data, as the /scratch partition is large and local.
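A typical pattern is to stage data into TMPDIR, work on the local copy, and bring the results back before the job ends, since TMPDIR is wiped when the job finishes. In the sketch below, big_input.dat and the sort step are illustrative placeholders:

```shell
#!/bin/bash
#SBATCH --partition=htc
#SBATCH --time=0-01:00:00

# stand-in for your real input data
printf 'b\na\nc\n' > big_input.dat
# stage the input into the fast local space
workdir="${TMPDIR:-/tmp}"
cp big_input.dat "$workdir/"
# work on the local copy (the sort is a placeholder for the real processing)
( cd "$workdir" && sort big_input.dat > result.dat )
# copy the results back before TMPDIR is cleaned up at the end of the job
cp "$workdir/result.dat" "${SLURM_SUBMIT_DIR:-.}/"
```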
Each computing group is limited in the number of slots concurrently occupied by running jobs; the limit depends on the group's yearly resources request. To know this limit for a given group (account):
% sacctmgr show assoc where account="<group>" user= format="Account,Grpcpus"
   Account  GrpCPUs
---------- --------
   <group>     2000
Once this limit is reached, subsequent submissions will remain pending in the queue until the completion of running jobs frees the required number of slots. However, if you receive the following error upon submission:
sbatch: error: AssocMaxSubmitJobLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
or, when monitoring a waiting job, you see the following:
% squeue
 JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
936551       htc littlene  user PD  0:00     1 (AssocMaxJobsLimit)
you have probably been blocked. Please check your inbox or contact the user support.
CPU and memory requests¶
Without any specification upon submission, the number of cores allocated to the job will be 1, and the associated memory will be 3 GB. Slurm maintains this ratio between the number of cores and the memory, correcting (over-approximating) the allocated resource values if needed. Examples:
% sbatch -n 2 --mem=4G my_script.sh
- Slurm will allocate the 2 CPU cores, since the resulting associated memory (
2 x 3 = 6 GB) covers the requested memory.
% sbatch -n 2 --mem=8G my_script.sh
- Slurm will here allocate 3 CPU cores to allow the 8G request to be accepted (3 x 3 = 9 GB).
CPU cores and memory are limited by the physical resources available on the worker nodes. For instance, for the default partition htc, the maximum number of available CPU cores is 64, and the associated memory is 192 GB.
It is not necessary to specify both the memory and the number of CPU cores required: specifying one resource sets the value of the other.
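The over-approximation described above can be sketched as a quick calculation. The helper below is hypothetical (not a Slurm tool) and assumes the 3 GB-per-core ratio of the htc partition:

```shell
#!/bin/sh
# Hypothetical helper: given -n <tasks> and --mem=<GB>, compute how many
# cores Slurm will actually allocate on the htc partition (3 GB per core).
cores_for_request() {
  tasks=$1
  mem_gb=$2
  ratio=3
  # ceiling division: cores needed to cover the memory request
  mem_cores=$(( (mem_gb + ratio - 1) / ratio ))
  # Slurm keeps the larger of the task count and the memory-derived count
  if [ "$mem_cores" -gt "$tasks" ]; then
    echo "$mem_cores"
  else
    echo "$tasks"
  fi
}

cores_for_request 2 4   # -n 2 --mem=4G -> 2 cores (2 x 3 = 6 GB covers 4 GB)
cores_for_request 2 8   # -n 2 --mem=8G -> 3 cores (3 x 3 = 9 GB covers 8 GB)
```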
- In the case of the htc_highmem partition, 37 GB per CPU core will be allocated,
- In the case of the gpu partition, the necessary CPU and memory resources will be automatically allocated according to the number of requested GPUs.
Storage and licence resources¶
The storage systems accessed by your jobs, as well as any software licenses used, must be declared upon submission. This is carried out using the -L option of the sbatch command:
% sbatch -L sps,matlab my_script.sh
In order to know the resource limits, use the command:
% scontrol show lic
For an exhaustive list of resources requiring declaration upon submission:
% scontrol show lic | grep default
To find out the resource limitations of a given group:
% scontrol show lic | grep -A1 <group>
For group-specific resources, use the syntax <resource>_<group> on the submission line.
Please refer to the MATLAB page to know the correct declaration syntax.