Data Management Plan

1. Summary

The goal of the UPC Dynamical Systems Group is to advance both theoretical and applied aspects of dynamical systems. Data are usually generated in one of two ways. First, theoretical advances are typically supported by intensive computer simulations, which generate a significant amount of data and computer code, both of which will be released. Second, applied projects additionally require real-world data, usually drawn from public databases and obtained from open data portals or on request. These data are typically processed to obtain secondary datasets.

Data formats are mainly based on open specifications:

Openly accessible tabular data are usually stored as UTF-8 encoded comma-separated values (CSV) text files with column headers or, in some cases, as JSON files. Derived data obtained from these sources will use the same formats.
Data from computer simulations are usually stored in plain-text (ASCII) files that plotting programs can read directly to produce figures.
Computer code is written in open-source programming languages such as C, C++, Julia, Python, or R and, in some cases, for proprietary software packages such as MATLAB or Maple.
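As an illustration of the tabular format described above, the following Python sketch writes and reads a UTF-8 CSV file with a header row using only the standard library. The function names, the column names (`t`, `x`, `y`) and the file name are hypothetical examples, not the group's actual conventions.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write rows to a UTF-8 encoded CSV file, preceded by a header row."""
    # newline="" lets the csv module handle line endings consistently
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Read a UTF-8 CSV file and return (header, rows); values stay as strings."""
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        return header, list(reader)


if __name__ == "__main__":
    # Hypothetical simulation output: time and two state variables
    rows = [(0.0, 1.0, 0.0), (0.1, 0.995, 0.0998)]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "orbit.csv"
        write_csv(path, ["t", "x", "y"], rows)
        print(read_csv(path))
```

Reading values back as strings keeps the reader format-neutral; numeric conversion is left to the analysis code.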

Every published file will be accompanied by a documentation record and change log covering author contact information, status, version, date and reason for each change, a description of the contents, and the origin of the data (including a brief description of the measurement and/or experimental setup). This information will be provided either as a preamble within the file or in a separate README file.
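A minimal sketch of how such a documentation record could be kept in one place and rendered either as a README file or as a commented preamble. The `DocRecord` and `ChangeEntry` classes, the example values, and the layout of the output are hypothetical; only the list of fields comes from this plan.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangeEntry:
    """One change-log entry: version, date, and reason for the change."""
    version: str
    when: date
    reason: str


@dataclass
class DocRecord:
    """Documentation record with the fields listed in this plan."""
    author: str
    contact: str
    status: str
    version: str
    description: str
    origin: str  # origin of the data, incl. measurement/experimental setup
    changes: list = field(default_factory=list)

    def to_readme(self) -> str:
        """Render the record as plain README text."""
        lines = [
            f"Author: {self.author} <{self.contact}>",
            f"Status: {self.status}",
            f"Version: {self.version}",
            "",
            "Description:",
            self.description,
            "",
            "Origin of the data:",
            self.origin,
            "",
            "Change log:",
        ]
        lines += [f"  {c.version} ({c.when.isoformat()}): {c.reason}" for c in self.changes]
        return "\n".join(lines) + "\n"

    def to_preamble(self, comment: str = "#") -> str:
        """Render the same text as comment lines for the top of a data file."""
        return "".join(
            f"{comment} {line}".rstrip() + "\n" for line in self.to_readme().splitlines()
        )


if __name__ == "__main__":
    # Hypothetical example values
    record = DocRecord(
        author="A. Researcher", contact="a.researcher@example.org",
        status="final", version="1.1.0",
        description="Periodic orbits of a hypothetical model.",
        origin="Numerical integration of the model equations.",
        changes=[ChangeEntry("1.0.0", date(2024, 1, 15), "initial release"),
                 ChangeEntry("1.1.0", date(2024, 3, 1), "fixed units")],
    )
    print(record.to_preamble())
```

Generating both outputs from one record avoids the README and the in-file preamble drifting apart.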

The resulting data may be useful to scientists working on different applications of dynamical systems or interested in applying the models.

2. FAIR data

2.1 Findable data (including metadata)

The repository assigns a Handle or DOI to each dataset for persistent identification and citation. Files will be organised by project, lead partner, publication ID, and figure number, with descriptive file names. All open project results deposited in a repository will include search keywords in their metadata. Keywords for open data will be selected from controlled vocabularies appropriate to the specific type of data.
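The file organisation described above can be made systematic with a small helper that builds repository paths from the same components. The directory order, the `figNN` naming, the allowed character set, and the example identifiers are assumptions for illustration only.

```python
import re
from pathlib import PurePosixPath

# Assumed rule: letters, digits, dot, underscore and hyphen only
_SAFE = re.compile(r"[A-Za-z0-9._-]+")


def dataset_path(project, partner, publication_id, figure, filename):
    """Build project/partner/publication/figNN/filename as a POSIX path."""
    parts = [project, partner, publication_id, f"fig{figure:02d}", filename]
    for part in parts:
        # Reject separators, spaces and relative components such as ".."
        if not _SAFE.fullmatch(part) or part in {".", ".."}:
            raise ValueError(f"invalid path component: {part!r}")
    return PurePosixPath(*parts)


if __name__ == "__main__":
    # Hypothetical identifiers
    print(dataset_path("ProjectX", "UPC", "pub-2024-01", 3, "orbit.csv"))
```

Zero-padding the figure number keeps directory listings in figure order.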

Open-source software will follow the Semantic Versioning scheme recommended by GitHub, and the same approach may also be applied to datasets. In addition, all open data, publications, and open-source software deposited in the group’s Zenodo community (https://zenodo.org/communities/upcdynamicalsystems/records) will use Zenodo’s DOI versioning system. Dataset metadata will follow the Dublin Core schema, which is compatible with the European OpenAIRE infrastructure.
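The two conventions in this paragraph can be sketched in Python as follows: a version bump that follows the MAJOR.MINOR.PATCH rules of Semantic Versioning (pre-release and build suffixes are deliberately not handled), and a simple Dublin Core record serialised in the `oai_dc` XML container used by OAI-PMH. The helper names and example values are hypothetical; the element names and namespaces are those of the Dublin Core Metadata Element Set 1.1.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def bump_version(version: str, part: str) -> str:
    """Increment a plain MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def dublin_core_xml(fields: dict) -> str:
    """Serialise {element: value or list of values} as an oai_dc record."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        for value in ([values] if isinstance(values, str) else values):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(bump_version("1.4.2", "minor"))  # 1.5.0
    print(dublin_core_xml({
        "title": "Hypothetical dataset title",
        "creator": ["Researcher, A."],
        "subject": ["dynamical systems", "periodic orbits"],
        "rights": "CC BY 4.0",
    }))
```

Rejecting unknown element names catches typos such as `author` (Dublin Core uses `creator`) before deposit.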

2.2 Accessible data

All data associated with scientific publications will be made openly available by default, unless there is a specific reason not to publish them. In addition to repositories, a list of all datasets will be provided through the group’s website:

https://dynamicalsystems.upc.edu/en/computing/software-data

Once processing, quality control, organisation, analysis, and publication are complete, the data will be deposited in open-access repositories (e.g. Zenodo). These repositories provide access via web browsers and/or application programming interfaces (APIs), complemented by customised tools developed by users in specific domains. Where data are accessed through an API, appropriate documentation will be provided.
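As an example of API access, the sketch below builds a search query for the group’s Zenodo community and extracts DOIs and titles from the JSON response. The endpoint (`https://zenodo.org/api/records`), the query parameters (`communities`, `q`, `size`, `page`) and the response layout (`hits.hits[].doi`, `metadata.title`) are assumptions based on Zenodo’s public REST API and should be checked against its current documentation before use.

```python
import json
import urllib.parse
import urllib.request

ZENODO_API = "https://zenodo.org/api/records"  # assumed search endpoint


def build_search_url(community, query="", size=25, page=1):
    """Build a records-search URL restricted to one Zenodo community."""
    params = {"communities": community, "size": size, "page": page}
    if query:
        params["q"] = query
    return f"{ZENODO_API}?{urllib.parse.urlencode(params)}"


def summarise(response):
    """Extract (DOI, title) pairs from a search response (assumed layout)."""
    return [
        (hit.get("doi"), hit.get("metadata", {}).get("title"))
        for hit in response.get("hits", {}).get("hits", [])
    ]


def fetch(url, timeout=30):
    """Download and decode a JSON response (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the URL; call fetch(url) to actually query the service
    print(build_search_url("upcdynamicalsystems", query="periodic orbits"))
```

Separating URL construction, fetching and parsing keeps the parsing step testable offline.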

The scientific community will have access to the computer code produced by the group through existing platforms such as GitHub, listed on the group website:
https://dynamicalsystems.upc.edu/en/computing/software-data

To facilitate reuse, algorithms will be implemented in C/C++, Python, Julia, R, or other open-source languages, and, where appropriate, code for proprietary software such as MATLAB will also be released.

We have verified that the repository requirements are satisfied. Apart from attribution, there are no restrictions on the use of the published data: users will be required to acknowledge the consortium and the source of the data in any resulting publications. Creative Commons licenses supported by GBIF will be used, including CC0, CC BY, and CC BY-NC. Zenodo supports a broad range of common and domain-specific licenses in machine-readable form. The data owner will decide which license applies when data are deposited in a repository; however, the project recommends CC0 for data and CC BY for media, and discourages CC BY-NC.

User identity will not be directly recorded. However, users are expected to follow standard scientific citation practices, and data use will be tracked through citations.

2.3 Interoperable data

The data produced in the project will be interoperable, as datasets will adhere to standard formats such as plain text (ASCII/TXT), CSV, XML, JSON, and TIFF. If a file cannot be opened with MS Office tools, PDF viewers, or image viewers, the dataset will include a plain-text (ASCII) file explaining where a free reader can be obtained. Any other data follow internal encodings that are clearly documented within the files.

2.4 Reusable data

Wherever possible, data will be shared immediately after production under the Creative Commons Attribution 4.0 International License (CC BY 4.0). The data will remain reusable after the end of the project by any interested party, with no access or time restrictions.

Each archived dataset will have its own persistent repository identifier and will be easily accessible. We expect most of the data generated to be released without restrictions; only datasets subject to intellectual property rights (IPR) or confidentiality constraints will be restricted. In such cases, access agreements will be established on a per-dataset basis, and requests from external users will be approved by the project authors.

Data quality is ensured through the repository’s own procedures and through internal validation by the group. In line with Zenodo’s preservation policy, the data will remain available for reuse for at least 20 years, unless the service is discontinued.
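One possible form of the internal validation mentioned above is a checksum manifest written before deposit and re-checked after download. The sketch below uses SHA-256 and the two-space `digest  path` line format produced by the GNU `sha256sum` tool, so the manifest should also be checkable with `sha256sum -c`; the manifest name `SHA256SUMS` and the function names are hypothetical choices.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(directory, manifest_name="SHA256SUMS"):
    """Write 'digest  relative/path' lines for every file under directory."""
    directory = Path(directory)
    manifest = directory / manifest_name
    lines = [
        f"{sha256_of(p)}  {p.relative_to(directory).as_posix()}"
        for p in sorted(directory.rglob("*"))
        if p.is_file() and p != manifest
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def verify_manifest(directory, manifest_name="SHA256SUMS"):
    """Return the relative paths that are missing or whose digest changed."""
    directory = Path(directory)
    bad = []
    for line in (directory / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line:
            continue
        digest, rel = line.split("  ", 1)
        target = directory / rel
        if not target.is_file() or sha256_of(target) != digest:
            bad.append(rel)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "data.csv").write_text("t,x\n0,1\n", encoding="utf-8")
        write_manifest(tmp)
        print(verify_manifest(tmp))  # []
```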

3. Resource allocation

The mechanisms described above for ensuring FAIRness and long-term preservation of the data incur no costs.

4. Data security

Data recovery, secure storage, and transfer of confidential data are addressed as follows:

Data confidentiality and integrity are ensured at multiple levels. Data in transit are protected using secure transfer protocols such as Transport Layer Security (TLS) 1.2.
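For transfers scripted in Python, a minimum of TLS 1.2 can be enforced explicitly with the standard `ssl` module, as in the sketch below. Recent Python releases may already use TLS 1.2 as their default minimum; setting it explicitly documents the policy in code. The `make_tls_context` and `download` helpers are hypothetical and shown only as a usage pattern.

```python
import shutil
import ssl
import urllib.request


def make_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses TLS below 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname checks enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def download(url: str, dest: str, timeout: float = 30.0) -> None:
    """Fetch url over HTTPS with the restricted context and save it to dest."""
    with urllib.request.urlopen(url, timeout=timeout, context=make_tls_context()) as resp:
        with open(dest, "wb") as f:
            shutil.copyfileobj(resp, f)
```

A server that only offers TLS 1.0 or 1.1 will cause the handshake to fail with an `ssl.SSLError` rather than silently downgrading.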

Consortium partners will enforce strict policies for all employees, collaborators, and subcontractors with access to data. These policies include, but are not limited to:

Allowing local copies only during data processing, with mandatory deletion after use
Extending access control policies to local copies
Contractual confidentiality clauses
Acceptance of terms and conditions prior to access

Personal data will be pseudonymised to the extent possible without compromising research quality. In addition, the group will continuously promote awareness of data privacy and security.
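One common way to pseudonymise identifiers while keeping records linkable is keyed hashing (HMAC-SHA256), sketched below. This is a swapped-in illustration, not a method prescribed by this plan: the function names, the 16-character pseudonym length, the column name, and the key handling are assumptions. The secret key must be stored separately from the data, since anyone holding it can recompute the mapping; pseudonymised data therefore remain personal data under the GDPR.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes, length: int = 16) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    stay linkable; without the key the mapping cannot be recomputed.
    Truncating to `length` hex characters (16 = 64 bits) trades brevity
    for a higher, though still small, collision risk.
    """
    mac = hmac.new(key, identifier.strip().encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:length]


def pseudonymise_column(rows, column, key):
    """Return copies of dict rows with `column` replaced by its pseudonym."""
    return [{**row, column: pseudonymise(str(row[column]), key)} for row in rows]


if __name__ == "__main__":
    key = b"replace-with-a-secret-key-stored-separately"  # hypothetical key
    rows = [{"patient_id": "P-001", "value": 3.2}, {"patient_id": "P-002", "value": 2.9}]
    for row in pseudonymise_column(rows, "patient_id", key):
        print(row)
```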

5. Ethical aspects

All activities carried out by the group comply with ethical principles and the relevant Spanish, EU, and international legislation. Studies that use personal data will be conducted in accordance with the Declaration of Helsinki (Fortaleza revision, October 2013). In studies involving patients, informed consent will be obtained, and protocols will be submitted to the ethics committees of the lead medical institutions.