Submit a job

To submit jobs on the computing platform, you must have a computing account and connect to an interactive server.

Depending on whether you want to launch a job interactively or through the job scheduler (see the page Types of jobs), a job is submitted with the UGE command qlogin or qsub respectively, using the following syntax:

% qsub -P P_<myproject> [options] [scriptfile | - [script args]]

The -P P_<myproject> option is used to declare your project, [options] are the qsub command options, and [scriptfile] is the script to be executed, followed by its arguments [script args].

Note

Options can also be specified inside the script: submission script example for a single-core job.
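
As an illustration (the project name P_myproject, the job name, the resource values and the program are placeholders to adapt to your case), a minimal submission script with the options given on #$ lines could look like this:

#! /bin/bash
#$ -P P_myproject              # project declaration
#$ -N myjob                    # job name
#$ -cwd                        # run in the submission directory
#$ -j y                        # merge stdout and stderr
#$ -l sps=1,h_cpu=01:00:00     # resources needed by the job (see Resource declaration)

./my_program                   # commands executed by the job

Such a script is then submitted simply with % qsub <scriptfile>.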

The myproject name is normally the same as your group name, but it can also be a subgroup of your group if subgroups have been created. Using this option is strongly recommended. To find out your default project, type:

% qconf -suser <loginname>

Your loginname appears in this list once you have submitted your first job. If it does not appear, you can look for the name of your project among the defined projects. To view all defined projects, type:

% qconf -sprjl

Note

If you belong to more than one group, use the newgroup command to switch to the group corresponding to the project. See the dedicated FAQ for more details.

To view the configuration of a specific project, type:

% qconf -sprj <projectname>

Useful submission options

Below are several useful options to keep in mind for any job type; an example combining some of them is given after the list. For more specific options, please refer to the page Types of jobs.

-C <prefix>
allows you to change the prefix used in a script to define the job scheduler options (by default #$ is used)
-N <jobname>
allows you to specify the name of the job; by default, the job scheduler uses the name of the script
-S <shell path>
to specify the shell used when submitting the job
-cwd
to work in the submission directory; by default, the outputs will be in the user’s $HOME
-e <path>
sets the location of the stderr file
-o <path>
sets the location of the stdout file
-j <y | n>
merges the stdout and stderr outputs into a single file; default n (= do not merge)
-r <y | n>
specifies whether the job will be restarted if it fails due to a system failure; default y, except for interactive jobs
-a <date>
submits a job requesting its execution at a specific date, in the format [[CC]YY]MMDDhhmm[.SS]
-M <emailaddress>
sends an e-mail to this address when events specified by the -m option occur
-m <b, e, a, s, n>
b when the job starts execution, e when the job ends, a when the job is aborted, s when the job is suspended, n no mail is sent (default). Note that this option can overload the mail system in case of mass mailing.
-q <queuename>
to define the execution queue; to find out the list of queues and their settings, see the paragraph Queues.
-pe
lets you specify the parallel environment and number of cores to use, see the pages Multi-core jobs and Parallel jobs
-l <resource1=value,resource2=value,...>
to request resources for a job, see the paragraph Resource declaration
-V
to transfer all your environment variables
-hold_jid <jobid>
submits the job and puts it “on hold”, waiting for job <jobid> to complete its execution
-p <priority>
reduces the priority of the job (0 by default; negative values are possible)
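
For instance (the project name, job name, log file and script name are placeholders to adapt), several of these options can be combined on a single command line:

% qsub -P P_myproject -N myjob -cwd -j y -o myjob.log -M <emailaddress> -m be job.sh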

For more details on the different options of the qsub command, see:

% man submit
# or
% man qsub

Note

If no file name is provided, the qsub command moves the cursor to the beginning of the next line and waits for you to enter commands manually. These commands, entered one by one, will constitute the executed job. To end the input and submit the job, press Ctrl-D.
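
For instance (the commands typed here are arbitrary placeholders), a small job can be submitted this way:

% qsub -P P_myproject -N stdin_job
echo "Hello from the worker node"
date

The two lines typed after the qsub command constitute the job; pressing Ctrl-D ends the input and submits it.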

Resource declaration

Storage and licenses

Storage systems accessed by your jobs, as well as the software licenses used, must be declared upon submission. This declaration is done with the -l option of the qsub command. Storage resources: dcache, hpss, irods, mysql, oracle, sps, xrootd. License resources: idl, matlab, etc. For example:

% qsub -l hpss=1,sps=1 test.sh
% qsub -l matlab=1 test.sh

If you do not use a resource, you do not need to specify it, the default being 0 (zero).

Computing resources

If you do not declare the computing resources (CPU, memory, disk), the scheduler allocates the maximum limits of the requested queue. These limit values evolve over time, depending on the configuration of the computing servers. Exceeding a limit value will cause the job scheduler to stop the execution of your job.

Note

To find out the CPU time, memory and disk consumed by your job, you can run short tests on an interactive server (cca). Information about the resources used is also available in the banner at the end of the log file that you get when the job is finished.
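
For a finished job, the generic UGE accounting command qacct can also report the consumed resources (its availability may depend on the platform configuration):

% qacct -j <jobid>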

Computing resources are declared as a “hard” (h_xxx) or a “soft” (s_xxx) limit. When the hard limit is exceeded, the job is interrupted by a SIGKILL signal and the job scheduler immediately kills your job (exit_status 137).

If you want your job to be notified so that it can finish normally before being killed, specify the soft limit instead. When the soft limit is exceeded, a SIGXCPU or SIGXFSZ signal, which can be caught by the job, is sent (exit_status 152 for example). If your job exceeds the limit for longer than the NOTIFY attribute of the queue where it runs, it will be killed. To find out the hard/soft limits and the attributes, type:

% qconf -sq "*" | egrep "qname|s_xxx|h_xxx"

# where xxx = the needed resource code

or check the page Information on scheduling queues.
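
As an illustration of the soft limit mechanism (a minimal sketch only: the program name, the output file and the /sps destination are placeholders, and the s_cpu resource is described in the next paragraph), a job script can trap the SIGXCPU signal to save partial results before the job is killed:

#! /bin/bash
#$ -l s_cpu=01:40:00                 # soft CPU limit: SIGXCPU is sent when it is exceeded

# hypothetical clean-up: stop the workload and save partial results
cleanup() {
    kill "$PID" 2>/dev/null
    cp partial_results.dat /sps/myexperiment/
    exit 0
}
trap cleanup XCPU

./my_long_program &                  # placeholder workload, run in the background
PID=$!
wait "$PID"                          # returns when SIGXCPU arrives, then the trap runs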

CPU resource

To specify the CPU time needed for your job, use the h_cpu resource. For example, to ask for 1 hour and 40 minutes, write:

-l h_cpu=6000          # duration in seconds (hard limit)
# or
-l s_cpu=6000          # duration in seconds (soft limit)
# or
-l s_cpu=01:40:00      # duration in hh:mm:ss format (soft limit)
# or
-l h_cpu=01:40:00      # duration in hh:mm:ss format (hard limit)

Memory resource

To specify the maximum resident memory required for your job, use the h_rss or s_rss resource. For example, to ask for 2 GB, write:

-l h_rss=2G             # hard resource
# or
-l s_rss=2G             # soft resource

You can also specify the maximum virtual memory requirement in the same way as above, using vmem instead of rss.

The default unit is the byte; the other possible units are K(ilo), M(ega) and G(iga). The requested memory must be at least 64M. If you submit multi-core jobs, the memory requirement must be specified per core.
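
For example, for a multi-core job (the parallel environment name multicores and the number of cores are only an illustration; see the page Multi-core jobs for the exact syntax), asking for 2 GB per core on 8 cores amounts to 16 GB for the whole job:

% qsub -pe multicores 8 -l h_rss=2G test.sh      # 2G per core, i.e. 16 GB in total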

Disk resource

When running a job, a local disk space on the computing server is allocated to you. You can read and write data in this space, which is accessible to your job through the $TMPDIR environment variable. This space is cleaned by the job scheduler at the end of your job, so you must copy the data that you want to keep to an appropriate storage space. To specify the maximum size of the files that the job can create, use the h_fsize or s_fsize resource. For example, if you need a maximum of 4096 MB (i.e. 4 GB), specify:

-l h_fsize=4096M             # hard resource
# or
-l s_fsize=4096M             # soft resource

The default unit is the byte; the other possible units are K(ilo), M(ega) and G(iga). The requested h_fsize must be at least 64M.

Attention

The amount of data written to a single file cannot exceed the h_fsize limit, even on a remote storage space. Otherwise, the job will receive a SIGXFSZ signal and will be killed.
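
As an illustration (the paths under /sps, the file names and the program are placeholders), a typical job copies its input to $TMPDIR, works on the local disk, then copies the results back to a permanent storage space before the end of the job:

#! /bin/bash
#$ -l sps=1,h_fsize=4096M                        # sps access and maximum file size

cp /sps/myexperiment/input.dat "$TMPDIR"/        # copy the input to the local disk space
cd "$TMPDIR"
"$HOME"/bin/my_program input.dat -o output.dat   # placeholder program working on the local disk
cp output.dat /sps/myexperiment/results/         # save the results before $TMPDIR is cleaned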

Queues

The job scheduler uses the concept of queues to distinguish between different types of jobs. A job is always submitted to a queue. An execution queue defines default values for disk space, CPU time, memory, etc.

When submitting a job, you normally do not have to specify the execution queue: the scheduler looks for an optimal queue given the CPU, memory and disk space required by your job. The queue must be declared if you want to use a specific queue for parallel, multi-core or daemon jobs, or to access a restricted queue. The queue is specified with the -q option:

% qsub -q <queuename> ...

To view the list of available queues, type:

% qconf -sql

To view the properties of a particular queue, type:

% qconf -sq <queuename>

Or have a look at the page Information on scheduling queues. If, in the properties of a queue, the user_lists attribute has a value other than NONE, access to this queue is restricted.
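
For example, to quickly check whether a given queue is restricted:

% qconf -sq <queuename> | grep user_lists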

Attention

For multi-core and parallel queues, the CPU time and memory are expressed per core; disk space is global for the job.

There are queues with restricted access (GPU, parallel, multi-core, daemon, longlasting, huge; for their use, check the page Types of jobs), for which you must make a request validated by your czar. When you use these queues, the scheduler automatically checks whether you have the rights; you do not have to declare the queue.