Job monitoring

Job submission status

The squeue command displays information about the jobs in the queue. Among other things, it gives the execution time, the current state (ST column, for example R for running and PD for pending), the name of the job, and the partition in which the job runs:

% squeue
JOBID PARTITION     NAME     USER      ST       TIME      NODES NODELIST(REASON)
465   multiseq      hello    user      R        0:01      1     ccwtbslurm01

The main options to squeue are:

-t [running|pending]
display only the jobs in the running or pending state
[[-v] -l] -j <job id>
display a specific job; -l gives a long output, -v a verbose output
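
As an illustration, the ST column of the squeue output can be summarized with a small shell function. This is only a sketch: count_states is a hypothetical helper, not a Slurm command, and it assumes the default column layout shown above and job names without spaces.

```shell
# Count jobs per state from squeue-style output read on stdin.
# Hypothetical helper: assumes a header line containing an "ST" column
# and whitespace-separated fields (job names without spaces).
count_states() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "ST") col = i; next }
        col     { n[$col]++ }
        END     { for (s in n) print s, n[s] }
    ' | sort
}

# Possible use on the cluster:
#   squeue -u "$USER" | count_states
```

Each output line gives a state code followed by the number of jobs in that state.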

For more information about this command and its output, please refer to the official documentation (man squeue).

Job efficiency

The seff command displays the resources used by a specific job and calculates their efficiency.

% seff <job number>
Job ID: <job number>
Cluster: ccslurmlocal
User/Group: <user>/<group>
State: CANCELLED (exit code 0)
Cores: 1
CPU Utilized: 00:12:50
CPU Efficiency: 98.59% of 00:13:01 core-walltime
Job Wall-clock time: 00:13:01
Memory Utilized: 120.00 KB
Memory Efficiency: 0.00% of 0.00 MB
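
The CPU efficiency reported by seff is the CPU time used divided by the core-walltime, that is, the number of cores multiplied by the wall-clock time. The sketch below reproduces that arithmetic. The names to_seconds and cpu_efficiency are hypothetical helpers, and durations are assumed to be in [hh:]mm:ss form without a day prefix.

```shell
# Convert an [hh:]mm:ss duration to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# CPU efficiency in percent: CPU time used / (cores * wall-clock time).
# Arguments: <CPU utilized> <wall-clock time> [number of cores, default 1]
cpu_efficiency() {
    eff_cpu=$(to_seconds "$1")
    eff_wall=$(to_seconds "$2")
    eff_cores=${3:-1}
    awk -v c="$eff_cpu" -v w="$eff_wall" -v n="$eff_cores" \
        'BEGIN { printf "%.2f\n", 100 * c / (w * n) }'
}
```

With the values of the example above, cpu_efficiency 00:12:50 00:13:01 1 computes 770 s / 781 s and prints 98.59.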

Job hold and alteration

The scontrol command is used to manage jobs. Its hold, update and release options respectively hold a pending job (it stays in the queue but is not scheduled), modify it, and release it so that it can be scheduled again:

% scontrol [hold|release] <job id list>
% scontrol update JobId=<job id list> <field>=<value>

For more information about this command, please refer to its built-in help: scontrol -h.
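
The hold, update and release sequence can be sketched as follows. The helpers run and retime_job are hypothetical, and the DRY_RUN variable is an assumption of this sketch used to print the commands instead of executing them. TimeLimit is one of the fields accepted by scontrol update; whether a given change is permitted depends on the site configuration.

```shell
# Print the command when DRY_RUN=1, run it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hold a pending job, change its time limit, then release it.
# Arguments: <job id> <new time limit, e.g. 02:00:00>
retime_job() {
    run scontrol hold "$1" &&
    run scontrol update JobId="$1" TimeLimit="$2" &&
    run scontrol release "$1"
}
```

Calling retime_job 465 02:00:00 with DRY_RUN=1 only prints the three scontrol commands, which makes it possible to review them before running them for real.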

Job deletion

The scancel command cancels one or more jobs:

% scancel <job number>

Or all of a specific user’s jobs:

% scancel -u <user id>

For more information about this command, please refer to its built-in help: scancel --help.
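
The user filter can be combined with the -t option of scancel, which restricts the cancellation to jobs in a given state (PENDING, RUNNING or SUSPENDED). The sketch below only builds and prints the command line so that it can be checked first; cancel_by_state is a hypothetical helper, not part of Slurm.

```shell
# Build a scancel command line for one user's jobs in a given state.
# Hypothetical helper: it prints the command and does not run it.
# Arguments: <user> <state: PENDING|RUNNING|SUSPENDED>
cancel_by_state() {
    case "$2" in
        PENDING|RUNNING|SUSPENDED) ;;
        *) echo "unsupported state: $2" >&2; return 1 ;;
    esac
    echo "scancel -u $1 -t $2"
}
```

For instance, cancel_by_state <user id> PENDING prints a command that would cancel only the jobs of that user still waiting in the queue, leaving the running ones untouched.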

Ended job status

The sacct command displays accounting information for jobs, including those that have ended, such as their state, partition and account:

% sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1377          stress.sh   multiseq    ccin2p3          8 CANCELLED+      0:0
1381          stress.sh   multiseq    ccin2p3          8  COMPLETED      0:0
1381.batch        batch               ccin2p3          8  COMPLETED      0:0

The output format can be customized with the --format option:

% sacct --format="Account,JobID,NodeList,CPUTime,MaxRSS"
   Account        JobID        NodeList    CPUTime     MaxRSS
---------- ------------ --------------- ---------- ----------
   ccin2p3 1523            ccwslurm0001   00:10:14
   ccin2p3 1523.batch      ccwslurm0001   00:10:14
   ccin2p3 1524            ccwslurm0001   00:10:14

Alternatively, set the SACCT_FORMAT environment variable to define the default output format:

% export SACCT_FORMAT=Account,JobID,NodeList,CPUTime,MaxRSS
% sacct
   Account        JobID        NodeList    CPUTime     MaxRSS
---------- ------------ --------------- ---------- ----------
       ... ...                      ...        ...        ...

To display the complete list of available fields:

% sacct -e

For more information about this command, please refer to its built-in help: sacct -h.
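
For scripting, the sacct output is easier to process in pipe-separated form. The sketch below lists the jobs whose state is not COMPLETED. The function not_completed is a hypothetical helper, and it assumes input lines of the form JobID|State|ExitCode, such as those produced with the -X (allocations only), -n (no header) and -P (pipe-separated) options of sacct.

```shell
# List jobs whose state is not COMPLETED.
# Input on stdin: pipe-separated lines "JobID|State|ExitCode".
not_completed() {
    awk -F'|' '$2 != "COMPLETED" { print $1, $2, $3 }'
}

# Possible use on the cluster:
#   sacct -X -n -P --format=JobID,State,ExitCode | not_completed
```

Jobs that are still pending or running are also listed, since the filter only excludes the COMPLETED state.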