Essential questions

This page provides additional information on the essential questions (marked with an asterisk *) on the DMP portal.

These questions are important because they help define the data life cycle of the collaboration’s projects and establish a guide for managing the collaboration’s storage (both short and long term) optimally with CC-IN2P3 resources. Throughout the life of the project, this document will be reviewed each year at the time of the resource request, and either modified or confirmed as is, depending on how the project has developed.

General

The purpose of this section is to gather general information about the project for which the DMP will be established: subject, field, schedule. The main project partners should be identified, and the specific requirements of the scientific discipline can be outlined.

Topic

What is the main research question of the project?

The purpose of this question is to briefly set out the project context, role and mission.

Research field

To which research field(s) does this project belong?

Please select your project’s research field from the drop-down list.

The green button [+ Discipline] allows you to mention more than one field.

Project schedule

When does the project start? / When does the project end?

Indicate the dates on which the project begins and ends. If the end date is not yet known, an estimate is welcome.

Important

Before the beginning of the project, an initial meeting between the project managers and the involved CC-IN2P3 staff should be scheduled to discuss the project data life cycle.

Project coordination

Which persons or institutions are responsible for the project coordination?

Please enter the names of the project coordinators.

The green button [+ Entry] allows you to dedicate a line to each new person or institution involved.

Project partners

This page allows you to define the different project partners, if necessary. Once a partner has been added using the [+ Project partner] button, it is essential to answer the following question:

Who is/are the contact person(s) for data management questions?

Please enter the contact details of the project’s data managers.

These contacts will act as the points of contact within the partnership for the following operations on the data stored by the project at CC-IN2P3:

  • access rights,

  • transfer of ownership,

  • departure of collaborators,

  • request for storage space,

  • change of organization,

  • request for revision of data management policy.

The green button [+ Entry] allows you to dedicate a line to each contact.

Note

If no project partners are defined, the project coordinators act as the points of contact.

Content classification

The purpose of this section is to provide a short description of the datasets collected or generated by the collaboration. An input should be provided for each dataset. The [+ Dataset] button will open a new tab for each new dataset.

You will be able to describe the characteristics of each dataset: character, discipline, volume, collection and/or creation method, and data flow management.

You can also mention all the data (including existing data) that will be (re)used for other projects, and define their reproducibility level.

Attention

For the DMP to be validated, please answer each of the questions in this section for each dataset.

Datasets

What kind of dataset is it?

Please describe briefly the type of data and/or the method used to create or collect the data.

For each type of data, please specify the character (raw, reduced, etc.), the creation method, the discipline, and so on.

The purpose is to define the life cycle of all project data according to the use that will be made of it (immediate and/or future), in order to implement the appropriate services and prepare for long-term preservation if necessary.

Technical classification

The purpose of this section is to briefly describe how the datasets (as defined in Content classification and available on the tabs) will be collected, and on what schedule. An estimate of the volume of each collected and/or produced dataset should also be provided.

Descriptions should include the type and content of each dataset, as well as data flow management. The tools used to compute the data, as well as those considered in case of versioning, should also be included.

Attention

For the DMP to be validated, please answer each of the questions in this section for each dataset.

Data collection

When does data collection or creation start / end?

Enter the date on which the project starts collecting data and the date after which no new data will be produced. These dates are used to set up the services that will allow the data to be stored and accessed.

The end date may become the starting point for new operations on the data side (migration, data archiving) or on the processing side.

The data profile previously described will be taken into account to define the appropriate services.

When does data analysis start / end?

Enter the dates on which data processing is scheduled to start and end.

This period marks a new stage in the data life cycle, during which additional storage services may be needed.

Data size

What is the current or expected size of the dataset?

For each dataset, select from the choices given an estimate of the total associated volume over the project lifetime.

This information will enable the CC-IN2P3 staff to plan purchases to provide the necessary storage for the project.

How much data is produced per year?

The answer to this question is necessary if the volume of the dataset is at or above the TB scale. This will allow the CC-IN2P3 storage management to draw up a multi-year plan.
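As an illustration of the kind of estimate that supports a multi-year plan, the sketch below accumulates yearly production figures into a running total. It is only a minimal example: the function name, the start year and the 50 TB per year figure are hypothetical and do not come from the DMP portal.

```python
def cumulative_volume_tb(start_year, yearly_tb):
    """Return (year, cumulative volume in TB) pairs for a list of yearly productions."""
    plan = []
    total = 0.0
    for offset, produced in enumerate(yearly_tb):
        total += produced
        plan.append((start_year + offset, total))
    return plan


# Hypothetical project producing 50 TB per year over four years.
for year, total in cumulative_volume_tb(2025, [50, 50, 50, 50]):
    print(f"{year}: {total:.0f} TB stored")
```

Giving one figure per year, rather than a single average, makes it easier to see in which year the storage need grows fastest.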

Tools

Which tools, software, technologies or processes are used to generate or collect the data?

List the tools that you know will be used to generate or collect the data within the project.

This information will enable CC-IN2P3 staff to plan the deployment of the requested tools, or to offer you an equivalent service.

Which software, processes or technologies are required to use the data?

List the tools that you know will be needed to work with the project data.

This information will enable CC-IN2P3 staff to plan the deployment of the requested tools, or to offer you an equivalent service.

Is documentation about relevant software needed to use the data?

Answer Yes or No to indicate whether the software needed to use the data requires dedicated documentation (this is the case for “in-house” or customized tools).

Data usage

The purpose of this section is to describe how datasets (as defined in Content classification and available on the tabs) will be used and accessed in order to highlight their life cycle and plan their organization within storage infrastructures.

Indicate how data will be organized during the project: naming conventions for directories and files, version control, etc.

You will also need to explain how the data are checked, and to document the consistency and quality of the data collected.
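A naming convention is easier to document when it can be expressed as a rule. The sketch below shows one possible convention, not one prescribed by CC-IN2P3: the directory layout, the `.dat` extension and the names `myproject` and `calib` are all hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def dataset_path(project, dataset, kind, run_date, version):
    """Build <project>/<dataset>/<kind>/v<NN>/<dataset>_<YYYYMMDD>.dat (hypothetical layout)."""
    filename = f"{dataset}_{run_date:%Y%m%d}.dat"
    return PurePosixPath(project) / dataset / kind / f"v{version:02d}" / filename


# Example: second version of the raw "calib" dataset taken on 14 March 2025.
print(dataset_path("myproject", "calib", "raw", date(2025, 3, 14), 2))
```

Putting the data character (raw, reduced) and the version in the directory names keeps each stage of the life cycle in its own area, which can then be given its own access rights or backup policy.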

Attention

For the DMP to be validated, please answer each of the questions in this section for each dataset.

Usage scenarios

How often will this dataset be used?

This question helps to identify hot data (accessed frequently) and cold data (accessed infrequently) in order to define their destination in the different CC-IN2P3 storage systems.

To what extent will infrastructure resources be required?

Define, from the given choices, whether working with this dataset will require specific infrastructure resources.

This information will enable CC-IN2P3 staff to evaluate the request, and check whether it can be met.

Data organisation

Where is the dataset stored during the project?

Define where the dataset will be stored. This information makes it possible to plan data backups.

If precious data (raw data) or important data (reduced data, analysis data) are stored only at CC-IN2P3, several copies will have to be planned in order to protect their integrity.
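When several copies of a dataset are kept, one common way to check that a copy is intact is to compare checksums. The sketch below illustrates this general technique with Python's standard `hashlib` module. It is not a description of the procedure used at CC-IN2P3, and the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """Return True when both files have the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(copy)
```

For example, `copies_match("run042.dat", "backup/run042.dat")` (hypothetical file names) returns `True` only if the two files are byte-for-byte identical.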

Data storage and security

Who is allowed to access the dataset?

Specify how data is shared within the project.

As a general rule, certain actions are restricted to the person in charge of the collaboration, and all other members have the same permissions. Some project areas will allow members to modify contents, while others are strictly read-only.

How and how often will backups of the data be created?

Specify the data for which backups are required, the areas concerned and the frequency (daily/weekly/monthly).

Who is responsible for backups?

Please enter the names of the people in charge of backups. They will be CC-IN2P3’s main contacts in case of any backup issue, as well as for discussing changes to the backup procedures defined in the previous question.

The green button [+ Entry] allows you to dedicate a line to each contact.

Data sharing and re-use

Will this dataset be published or shared?

Define, from the given choices, how the dataset will be shared and published.

Storage and long-term preservation

The purpose of this section is to clarify the process of long-term data management. Not all data is intended to be retained forever.

That is why it is important to determine how long data should be retained. For data with a limited lifespan, a clear data management policy concerning its deletion enables more efficient use of available storage space, and reduces the volume of associated metadata.

This also reduces the time needed to locate data of interest.

Digital data needs to be actively managed so that it is always available, usable and accessible for the preservation time required by the project.

Selection

What are the criteria/rules for the selection of the data to be archived?

Define how the data to be archived will be selected after the project has ended. This information is important to ensure that all data to be archived is correctly indexed by the storage systems.

Who selects the data to be archived?

Please enter the names of the people responsible for this selection. They will be CC-IN2P3’s main contacts in the event of any issues with, or updates to, the data selection procedure.

The green button [+ Entry] allows you to dedicate a line to each contact.

Long-term preservation

Please answer each question on this page for each dataset. Datasets must have been defined in Content classification and are available on the tabs.

Does this dataset have to be preserved for the long-term?

Answer Yes or No to indicate whether the dataset must be kept for the long term so that it can be accessed after the end of the project.

Some of the data could be reused, within the community or outside it, for related research.

What are the reasons this dataset has to be preserved for the long-term?

Define, from the given choices, the reasons for preserving the dataset.

How long will the data be stored?

Evaluate how long the data must be preserved.

How long is it intended that the data remains re-usable?

Evaluate how long the data will be used after the end of the project, in order to maintain the tools for reading and exploiting it.

Attention

The storage and reuse durations given here are not binding on CC-IN2P3 beyond the commitments made by IN2P3 to the collaboration.

Where will the data be stored or archived after the end of the project?

Define, from the given choices, the site and system in which the data will be stored after the end of the project.

Shall there be an embargo period before the data are made available?

Answer No, or indicate how you plan to restrict data access so that it remains exclusive to project members before the data are shared or published.

By when will the data be archived?

Specify the estimated date from which the data will be archived, in other words the date after which it will no longer be used directly by the project.