Dask usage
The package daskslurm may be found in the GitLab project DaskSlurm (authentication to GitLab is necessary to access the project page). It enables a CC-IN2P3 Jupyter Notebook Platform user to run one or more tasks on the CC-IN2P3 computing platform.
Important
While it opens a connection to the computing servers, Dask IS NOT an alternative tool to submit jobs on the computing platform. Please refer to the dedicated section if this is your objective.
The computing resources used by Dask tasks are deducted from the HTC resources allocated annually to your computing group.
Prerequisites
To enable this feature, daskslurm needs to be installed in the currently used Python environment.
Simple examples of Dask usage can be found on the GitLab project page.
Installation in your Python environment
The daskslurm package is installed in the two default scientific kernels provided. If you are using a custom kernel, you have to install the package yourself. Please refer to the custom kernel page if you need to build such a custom kernel.
To install the daskslurm wheel:
% export daskslurm_package_registry="https://__token__:gldt-Jziuo-bHMVzJcrPV-8Wz@gitlab.in2p3.fr/api/v4/projects/36268/packages/pypi/simple"
% python -m pip install --extra-index-url "${daskslurm_package_registry}" daskslurm=='0.1.3'
Note
Please refer to the GitLab project page to check the available versions and to select the latest one.
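Once installed, the package and its version can be checked with pip, for example:
% python -m pip show daskslurm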
Features
The dask-workers will be running on our computing platform. The dask-client, running in your notebooks server, will be connected to the dask-scheduler.
Attention
HPC jobs will not be allowed through the Dask functionality.
Getting a dask-client
To get a dask-client, you will have to wait for the dask-worker jobs to reach the RUNNING status.
import daskslurm
from daskslurm.cluster import DaskSlurmCluster
my_daskslurm_cluster = DaskSlurmCluster()
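If the DaskSlurmCluster object behaves like a standard Dask cluster (an assumption, to be confirmed against the package documentation), a dask.distributed Client can typically be attached to it once the workers are running:
# Sketch only: assumes my_daskslurm_cluster exposes the dask-scheduler
# like a standard Dask cluster object
from dask.distributed import Client
my_client = Client(my_daskslurm_cluster)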
Cancelling computing tasks
Computing tasks are automatically cancelled when the notebooks server is stopped by selecting File > Log Out.
Nevertheless, it is also possible to cancel tasks from your Jupyter notebook via the close() method.
my_daskslurm_cluster = DaskSlurmCluster()
...
my_daskslurm_cluster.close()
Default parameters for the compute tasks (which can be updated)
- dask_worker_jobs: default is 1 job, max value: 3000;
- dask_worker_memory: default is 3G, max value: 32G;
- dask_worker_cores: default is 1 core, max value: 32;
- dask_worker_time: default is 02:00:00 (2 hours), max value: 48:00:00 (2 days).

See the constructor docstring DaskSlurmCluster(...) for the complete list of parameters.
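As a sketch, the defaults could be overridden by passing the parameters listed above as keyword arguments to the constructor (the exact signature should be checked in the docstring):
# Sketch: keyword names taken from the list above, values chosen as an example
my_daskslurm_cluster = DaskSlurmCluster(
    dask_worker_jobs=4,           # number of dask-worker jobs
    dask_worker_memory="8G",      # memory per dask-worker
    dask_worker_cores=2,          # cores per dask-worker
    dask_worker_time="04:00:00",  # maximum run time per job
)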
Logging messages
The jobs' STDOUT and STDERR are written in the log/ directory of the current working area, with one file per job. Since logging is useful for debugging, a ready-to-use logger is available via the get_logger() function.
# get_logger available as function
from daskslurm.cluster import get_logger
my_logger = get_logger()
my_logger.info("Processing data files ...")
When used in Python code executed on the dask-workers, the messages are written to the job STDOUT files.
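For illustration, a function executed on the dask-workers could use this logger as follows (a sketch; the function name and the data file are only examples, and the way the function is submitted to the workers depends on your Dask client usage):
from daskslurm.cluster import get_logger

def process_file(path):
    # Runs on a dask-worker: these messages end up in the job STDOUT file under log/
    logger = get_logger()
    logger.info("Processing %s ...", path)
    # ... actual processing of the file ...

# e.g. submitted through a dask.distributed Client (hypothetical):
# future = my_client.submit(process_file, "data_001.h5")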
scale() method
The scale() method allows increasing or decreasing the number of dask-workers without having to recreate a DaskSlurmCluster.
my_daskslurm_cluster = DaskSlurmCluster()
# Scale to 10 dask-worker job(s)
my_daskslurm_cluster.scale(jobs=10)
Limitations
Maximum number of dask-worker jobs
This number is mainly limited by the number of available slots on the computing platform, which depends on several factors such as the group resource allocation, the number of simultaneous users, the current global load of the computing platform, …
Internal tests were carried out with 2500 jobs (= 2500 dask-workers) for a single user.