Computing resources

Interactive development

CC-IN2P3 provides every user with a computing account a JupyterLab platform to script, execute, and document their work interactively. For details and usage constraints, please refer to the section below.

Attention

The Jupyter Notebooks Platform does not allow users to submit jobs to the computing platform. If that is your goal, please refer to the following paragraphs.

The computing platform

The computing platform, accessible through the job scheduler, is composed of Linux CentOS 7 (64-bit) computing servers. It is organized around three main use cases (usage examples are given on the Types of jobs page):

  • the HTC platform (High-Throughput Computing)
  • the HPC platform (High-Performance Computing)
  • the GPU platform

The HTC platform is suitable for running most traditional single-core or multi-core HEP applications; it accounts for most of the computing power made available at CC-IN2P3.
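
As an illustration, a multi-core HTC job might be submitted along the following lines. This is only a sketch: the parallel environment name ("multicores") and the resource names and values given with -l are assumptions to be checked against the site documentation.

    # Hypothetical multi-core HTC submission: request 8 cores through a
    # parallel environment (the name "multicores" is a placeholder) and
    # attach illustrative CPU-time and memory limits.
    qsub -pe multicores 8 -l h_rt=24:00:00,h_vmem=3G my_analysis.sh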

The HPC platform, of smaller capacity, is designed to accommodate parallel computations. It is composed of a set of servers interconnected with InfiniBand, which allows efficient inter-server communication through MPI libraries.
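
As a sketch only, an MPI job spanning several servers could be submitted with a script such as the one below; the parallel environment name "openmpi", the slot count and the run-time limit are assumptions, and the actual names depend on the site configuration.

    #!/bin/bash
    # Hypothetical MPI job script. Lines starting with "#$" carry
    # scheduler options; "openmpi" is a placeholder parallel
    # environment name and 32 the number of requested MPI slots.
    #$ -pe openmpi 32
    #$ -l h_rt=12:00:00
    # Launch the application on the slots allocated by the scheduler
    # across the InfiniBand-connected servers; $NSLOTS is set by the
    # scheduler to the number of granted slots.
    mpirun -np $NSLOTS ./my_mpi_application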

Finally, the GPU platform is composed of servers equipped with graphics cards to accommodate applications relying on vector computation.
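
The exact way to request a GPU is defined by the site configuration; as a purely illustrative sketch, assuming a consumable resource named GPU and a dedicated queue, a submission could resemble:

    # Hypothetical GPU request: the resource name "GPU", its value and
    # the queue name "gpu_queue" are placeholders; check the site
    # documentation for the actual names.
    qsub -l GPU=1 -q gpu_queue my_gpu_job.sh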

Note

For the technical characteristics of the compute servers, please refer to the Compute servers configuration page.

The job scheduler

In general, a job is a task (or set of tasks) that the user wants to run on the servers of the computing platform. This task can be an executable file, a set of commands, a script, and so on. A job can be developed and tested on the interactive servers before being submitted at scale to the computing platform.
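
A job is typically wrapped in a small shell script; a minimal sketch is shown below (the program and file names are placeholders):

    #!/bin/bash
    # Minimal job script sketch. Lines starting with "#$" are options
    # read by the scheduler at submission time: the job name, the files
    # receiving standard output and error, and the working directory.
    #$ -N my_job
    #$ -o my_job.out
    #$ -e my_job.err
    #$ -cwd
    # The actual task (placeholder program and input file).
    ./my_program input.dat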

Univa Grid Engine (UGE) is the job scheduler. The scheduler is the single entry point, common to all users, for submitting jobs to the computing platform. Its role is to receive the jobs submitted by users, schedule them, and dispatch them for execution to an appropriate and available computing server.
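
Jobs are handed to the scheduler with the qsub command, for instance to submit the script sketched above; the resource names and values passed with -l are illustrative only.

    # Submit the job script with illustrative CPU-time and memory
    # requests; qsub prints the identifier assigned to the job.
    qsub -l h_rt=06:00:00,h_vmem=2G my_job.sh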

The main goal is to use the computing resources (memory, disk space, CPU) as efficiently as possible; sharing all resources among all users allows optimal use of the entire computing platform.

A job is always submitted to an execution queue. Each execution queue has default values for disk space, CPU time, and memory. Several queues, intended for jobs that need large amounts of resources (CPU, memory, multiple processors), are restricted: for a job to run on such a queue, the user must be in the list of authorized users (see the restricted queues FAQ).
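
To target a specific queue explicitly, its name is passed with the -q option; the queue name below is a placeholder, and restricted queues additionally require the user to be authorized.

    # Request a specific execution queue ("long" is a placeholder name).
    qsub -q long my_job.sh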

Once submitted, the job is automatically checked to see if it is allowed to run in a particular queue:

  • if the user requests a queue:
    • if the user is authorized, the scheduler checks whether the hardware requirements requested by the job match the resources provided by the queue;
      • if they match and resources are available, the job is executed;
      • if they do not match, the job remains pending.
    • if the user is not authorized, the job remains pending.
  • if the user does not request a queue:
    • the scheduler takes the first queue that meets the requirements;
    • if no queue meets the requirements, the job remains pending (a way to check why a job is pending is sketched below).
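
When a job stays pending, the detailed view reported by qstat includes scheduling information that usually indicates which requirement could not be satisfied; the job identifier below is a placeholder.

    # Display detailed information on a given job, including the
    # "scheduling info" section explaining why it is still pending.
    qstat -j 1234567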

All queues allow the simultaneous execution of many jobs. The system always tries to launch new jobs in the least loaded and most appropriate queue.
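
The state of one's own jobs, and the queues they were dispatched to, can be checked at any time, for instance:

    # List the current user's jobs: state "r" means running,
    # "qw" means waiting in a queue.
    qstat -u $USER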