Computing best practices
How to write your script
A submission script lets you formalize your job configuration. Please use this method rather than declaring parameters on the command line.
If you rename the output and error files (stdout, stderr), use a shared directory and include %j in the path or name (see Main sbatch options),
manage script errors with the set bash command, as explained in the submission script examples,
prefer several small jobs, which will run faster than a single larger job. Use srun and Slurm steps only when really needed (parallel computing or multiple independent tasks),
if a file or set of files is accessed multiple times, copy everything locally to TMPDIR,
the /scratch local space (accessible via TMPDIR) is shared: if it overflows, other workers and jobs can be affected (for more details, see the Computing environment page). Plan to clean up TMPDIR at the end of your job.
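The recommendations above can be combined in a single script. The following is a minimal sketch, not an official template: the job name, the /path/to/shared/dir location, the resource values, and the stand-in input file are all placeholders, and the set options shown are one common choice that may differ from the submission script examples referenced above.

```shell
#!/bin/bash
# All names, paths and resource values below are placeholders to adapt.
#SBATCH --job-name=my_job
# %j is replaced by the job ID, so each job writes to its own files.
#SBATCH --output=/path/to/shared/dir/my_job_%j.out
#SBATCH --error=/path/to/shared/dir/my_job_%j.err
#SBATCH --time=01:00:00
#SBATCH --mem=2G
#SBATCH --cpus-per-task=1

# Stop at the first failing command, unset variable, or failing pipeline stage.
set -euo pipefail

# TMPDIR points to the local scratch space inside a job;
# /tmp is only a fallback so the sketch also runs outside Slurm.
scratch_root="${TMPDIR:-/tmp}"
work_dir="$(mktemp -d "${scratch_root}/my_job.XXXXXX")"

# Stand-in for a file on shared storage that the job reads many times.
shared_input="$(mktemp)"
printf 'a\nb\nc\n' > "${shared_input}"

# Remove everything this job created, even if a later command fails.
cleanup() {
    rm -rf "${work_dir}" "${shared_input}"
}
trap cleanup EXIT

# Copy the input once, then work on the local copy only.
cp "${shared_input}" "${work_dir}/input.dat"
line_count="$(wc -l < "${work_dir}/input.dat")"
echo "local copy has $((line_count)) lines"
```

Working in a dedicated subdirectory of TMPDIR keeps the cleanup limited to what the job itself created, and the trap runs the cleanup whether the script ends normally or stops on an error.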
Test and parameterize your job
The actions suggested below will help you evaluate the amount of resources your production jobs need to run successfully. If, even after these evaluation steps, you find a configuration error in a running job, please suspend and modify the job rather than cancel it.
Use the flash and/or *interactive partitions to test the job. This step is particularly important before submitting a large number of jobs,
if necessary, profile your job to study its exact behavior,
from the above steps, check that the job is using most of the requested resources and adjust the job settings (time, number of CPUs per task, and memory) to bring the resource request closer to the estimated usage,
declare the storage and software resources required for your job,
once the submission parameters have been validated, integrate them into the script header,
when launching similar jobs (which differ in one or more parameters specific to the computation), use array jobs to limit the load on the computing platform.
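As an illustration of the last point, a job array can map each task index to one parameter value. This is a sketch under assumptions: the job name, the output path, the index range, and the parameter values are hypothetical, and the fallback index 0 is only there so the script can be tried outside Slurm.

```shell
#!/bin/bash
# Names, paths and values below are placeholders to adapt.
#SBATCH --job-name=param_scan
# Four tasks, with indices 0 to 3.
#SBATCH --array=0-3
# Each array task has its own job ID, so %j keeps the files distinct.
#SBATCH --output=/path/to/shared/dir/param_scan_%j.out

set -euo pipefail

# Map an array index to the parameter value this task should use.
pick_param() {
    local index="$1"
    case "${index}" in
        0) echo "0.01" ;;
        1) echo "0.1" ;;
        2) echo "1.0" ;;
        3) echo "10.0" ;;
        *) echo "unknown index: ${index}" >&2; return 1 ;;
    esac
}

# Slurm sets SLURM_ARRAY_TASK_ID for each task of the array.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
param="$(pick_param "${task_id}")"
echo "task ${task_id} runs with parameter ${param}"
```

A single submission then replaces four separate ones, and the scheduler handles the tasks as one array.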
When the job ends
Check the User Portal to review CPU and memory efficiency,
in case of error, check the output files.
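The User Portal remains the reference for these figures. To make the CPU efficiency number easier to interpret, here is a small sketch assuming the usual definition: CPU time consumed divided by (elapsed time × allocated CPUs). The function name and the sample values are hypothetical; real values could be read, for example, from the Elapsed, TotalCPU and AllocCPUS fields of sacct.

```shell
#!/bin/bash
# Usage: cpu_efficiency ELAPSED_SECONDS TOTAL_CPU_SECONDS ALLOCATED_CPUS
# Prints the CPU efficiency as an integer percentage (rounded down).
cpu_efficiency() {
    local elapsed_s="$1" total_cpu_s="$2" n_cpus="$3"
    echo $(( 100 * total_cpu_s / (elapsed_s * n_cpus) ))
}

# Hypothetical job: 1 hour of wall time, 2 hours of CPU time, 4 CPUs allocated.
cpu_efficiency 3600 7200 4   # prints 50
```

In this example half of the allocated CPU capacity was idle, which suggests requesting fewer CPUs per task for similar jobs.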
Submit a ticket
Information to provide in the ticket (depending on the case, not all of it is necessary). Please use attachments to share multiple lines of code:
The ID of a job affected by the issue. If possible, also provide the ID of a job that ran successfully,
the submission parameters (command line and header of the script),
the output and error files,
the entire submission script,
if the job remains pending, please copy the message in the REASON field from the squeue output.
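One possible way to display only the job ID, state and reason for a given job is sketched below. The job ID is a placeholder, and the guard is only there so the snippet does nothing harmful on a machine without Slurm.

```shell
#!/bin/bash
# Placeholder job ID: pass the real one as the first argument.
job_id="${1:-12345678}"

# squeue format fields: %i = job ID, %T = job state, %r = reason.
squeue_format="%i %T %r"

if command -v squeue >/dev/null 2>&1; then
    squeue --jobs="${job_id}" --format="${squeue_format}"
else
    echo "squeue not found: run this on a submission host" >&2
fi
```

The last column of the output is the text to copy into the ticket.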