Job monitoring
Job submission status
The squeue command displays information about jobs. Among other things, it reports the elapsed execution time, the current state (ST column, where R means running and PD means pending), the job name, and the partition in which the job runs:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
465 multiseq hello user R 0:01 1 ccwtbslurm01
The main options to squeue are:
-t [running|pending]
- display only jobs in the running or pending state
[[-v] -l] -j
- display a specific job, with -l for long output and -v for verbose output
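As a minimal sketch of how the squeue output can be post-processed, the helper below counts jobs per state. The function name count_states is hypothetical, and the sketch assumes the default column layout shown above (ST in the fifth column) and job names without spaces.

```shell
# Count jobs per state from squeue-style output read on stdin.
# Assumption: default layout, where ST is the 5th whitespace-separated column.
# The first line (column headers) is skipped.
count_states() {
    awk 'NR > 1 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use on a cluster where Slurm is available:
#   squeue -u "$USER" | count_states
#   squeue -u "$USER" -t pending      # list only the pending jobs
```

Reading from stdin keeps the helper independent of the squeue options used upstream.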
For more information about this command and its output, please refer to the official Slurm documentation.
Job efficiency
The seff command displays the resources used by a specific job and calculates its efficiency:
% seff <job number>
Job ID: <job number>
Cluster: ccslurmlocal
User/Group: <user>/<group>
State: CANCELLED (exit code 0)
Cores: 1
CPU Utilized: 00:12:50
CPU Efficiency: 98.59% of 00:13:01 core-walltime
Job Wall-clock time: 00:13:01
Memory Utilized: 120.00 KB
Memory Efficiency: 0.00% of 0.00 MB
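To use the efficiency figure in a script, the relevant line can be extracted from the seff output. The following is a sketch only: the function name cpu_efficiency is hypothetical, and it assumes the "CPU Efficiency:" line has the format shown above.

```shell
# Print the CPU efficiency percentage (without the % sign)
# from seff output read on stdin.
# Assumption: the line looks like "CPU Efficiency: 98.59% of ... core-walltime".
cpu_efficiency() {
    awk -F': ' '/^CPU Efficiency:/ { split($2, a, "%"); print a[1] }'
}

# Typical use on a cluster where Slurm is available:
#   seff <job number> | cpu_efficiency
```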
Job hold and alteration
The scontrol command is used to manage jobs. Its hold, update and release options respectively hold a pending job (it stays in the queue but is not scheduled), modify it, and make it eligible for scheduling again:
% scontrol [hold|update|release] <jobs id list>
For more information about this command, please refer to its built-in help: scontrol -h.
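The hold, update and release sequence can be sketched as a small wrapper. The function name retime_job and the DRY_RUN variable are hypothetical. The example changes the job time limit through the JobId and TimeLimit fields of scontrol update. On many clusters regular users can only lower a time limit, so treat this as an illustration rather than a recommended procedure.

```shell
# Hold a pending job, change its time limit, then release it.
# Set DRY_RUN=1 to print the commands instead of running them.
retime_job() {
    job_id=$1
    new_limit=$2
    run=${DRY_RUN:+echo}   # expands to "echo" only when DRY_RUN is set and non-empty
    $run scontrol hold "$job_id" &&
    $run scontrol update JobId="$job_id" TimeLimit="$new_limit" &&
    $run scontrol release "$job_id"
}

# Typical use:
#   DRY_RUN=1; retime_job <job number> 00:30:00    # preview the commands
```

Chaining with && stops the sequence as soon as one step fails, so a job that could not be updated is not released silently.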
Job deletion
The scancel command deletes one or more jobs:
% scancel <job number>
Or all of a specific user’s jobs:
% scancel -u <user id>
For more information about this command, please refer to its built-in help: scancel --help.
Ended job status
The sacct command displays accounting information about jobs, including finished ones, such as the state, the partition and the account of each job:
% sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1377 stress.sh multiseq ccin2p3 8 CANCELLED+ 0:0
1381 stress.sh multiseq ccin2p3 8 COMPLETED 0:0
1381.batch batch ccin2p3 8 COMPLETED 0:0
The output format can be customized for a single invocation with the --format option:
% sacct --format="Account,JobID,NodeList,CPUTime,MaxRSS"
Account JobID NodeList CPUTime MaxRSS
---------- ------------ --------------- ---------- ----------
ccin2p3 1523 ccwslurm0001 00:10:14
ccin2p3 1523.batch ccwslurm0001 00:10:14
ccin2p3 1524 ccwslurm0001 00:10:14
Alternatively, set the SACCT_FORMAT environment variable to define a new default output format:
% export SACCT_FORMAT=Account,JobID,NodeList,CPUTime,MaxRSS
% sacct
Account JobID NodeList CPUTime MaxRSS
---------- ------------ --------------- ---------- ----------
... ... ... ... ...
To display the complete list of available fields:
% sacct -e
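For scripting, the sacct output is easier to parse in its pipe-delimited form (--parsable2, with --noheader to drop the header line). The sketch below lists the jobs that are not in the COMPLETED state. The function name not_completed_jobs is hypothetical, and the field order JobID, JobName, State, ExitCode is an assumption made for this example.

```shell
# Print the IDs of jobs whose state is not COMPLETED.
# Input (stdin): pipe-delimited lines "JobID|JobName|State|ExitCode".
# Note: running or pending jobs are also reported, since they are not COMPLETED.
not_completed_jobs() {
    awk -F'|' '$3 != "COMPLETED" { print $1 }'
}

# Typical use on a cluster where Slurm is available
# (-X restricts the output to job allocations, without individual steps):
#   sacct -X --parsable2 --noheader \
#         --format=JobID,JobName,State,ExitCode | not_completed_jobs
```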
For more information about this command, please refer to its built-in help: sacct -h.