Research data repositories are locations where digital objects are stored and made available to the public. They can be regarded as the core of research data sharing, as they provide platforms to store, curate, publish, archive, preserve, and access data. Pre-print servers can also be considered repositories, although they are tailored toward scientific articles rather than research data.
Repositories can be classified mainly according to:
- the type of objects to be stored (e.g. scientific articles or research data) and
- the domain of the data contained (field-specific or generic repositories).
Repositories can be hosted on institutional servers or provided by broader organisations or consortia such as NFDI4Chem. Using a repository is essential for depositing data in line with the FAIR Data Principles.
How do repositories work?
A repository consists of repository software and a database. Researchers typically transfer their data to the repository via a browser-based user interface. Alternatively or additionally, the repository operators harvest data from other platforms via appropriate protocols and interfaces.
Some, but not all, repositories curate and review the content and quality of the data before ingestion, and sometimes also legal aspects (copyright, data protection, licenses).
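To make the harvesting route more concrete, the following minimal sketch builds a request URL for OAI-PMH, one widely used harvesting protocol. The text above does not name a specific protocol, so the choice of OAI-PMH is an assumption, and the base URL `https://repository.example.org/oai` is a hypothetical placeholder. The sketch only constructs the URL and does not contact any server.

```python
from typing import Optional
from urllib.parse import urlencode


def build_harvest_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    # "verb" and "metadataPrefix" are standard OAI-PMH request parameters.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used for illustration only.
harvest_url = build_harvest_url(
    "https://repository.example.org/oai", from_date="2024-01-01"
)
print(harvest_url)
```

A harvester would send this request and parse the XML response. Whether a given repository offers such an endpoint has to be checked in its documentation.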
To allow data reuse by other researchers, metadata, including provenance information, are required alongside the actual data. Metadata describe the research data and provide information about their creation, the methods or software used, and legal aspects. Metadata can either be added manually via a metadata editor or be provided through other applications. Adding metadata manually via a metadata editor is comparable to submitting a manuscript to a publisher via the publisher's submission system.
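As an illustration of what such a metadata record might contain, the sketch below models the kinds of information mentioned above (creation, methods, software, legal aspects, provenance) and checks which required fields are still empty. All field names, the choice of required fields, and the example values are hypothetical. Real repositories define their own metadata schemas.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str = ""
    creators: list = field(default_factory=list)
    created: str = ""      # date the data were created
    methods: str = ""      # how the data were obtained
    software: str = ""     # software used to produce or process the data
    license: str = ""      # legal terms for reuse
    provenance: str = ""   # origin and processing history


# Assumption: these fields are treated as mandatory in this sketch.
REQUIRED_FIELDS = ("title", "creators", "created", "license")


def missing_fields(record: DatasetMetadata) -> list:
    """Return the names of required fields that are still empty."""
    values = asdict(record)
    return [name for name in REQUIRED_FIELDS if not values[name]]


record = DatasetMetadata(
    title="Example NMR dataset",
    creators=["A. Researcher"],
    created="2024-03-15",
    methods="1H NMR",
)
print(missing_fields(record))  # ['license']
```

A metadata editor performs a similar check interactively, prompting the depositor for any missing information before the dataset is accepted.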
A main function of repositories is to provide a search facility with which users and machines can find, view, and download data. To ensure that data are permanently referenced and can be linked and cited, repositories assign unique persistent identifiers (PIDs). This also enhances the findability and accessibility of research data.
Repositories can also be certified (e.g. CoreTrustSeal). Such certification ensures that the data are citable and preserved in the long run, and may also cover aspects of data curation and data quality.
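The idea of a PID can be sketched with the DOI, one common type of persistent identifier. The text above does not prescribe a particular PID system, so using DOIs here is an assumption. The example normalises a DOI given in different notations and turns it into a resolvable link via the `https://doi.org/` resolver. The DOI `10.1234/example-dataset` is a made-up placeholder, and the pattern is a simplified check, not a full validation.

```python
import re

# Simplified pattern: "10.", a registrant code of 4 to 9 digits, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(identifier: str) -> str:
    """Normalise a DOI string and return its resolver URL."""
    doi = identifier.strip()
    # Strip common prefixes so that all notations map to the bare DOI.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Not a valid DOI: {identifier!r}")
    return f"https://doi.org/{doi}"


# Placeholder DOI, for illustration only.
print(doi_to_url("doi:10.1234/example-dataset"))
```

Because the identifier stays the same even if the dataset's landing page moves, citations that use the resolver link remain valid over time.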
Finding the right repository
Because so many data repositories exist, users can orient themselves with the help of a research data repository registry such as Re3data, OpenDOAR, ROAR, or FAIRsharing. Repository registry services are an essential component of the FAIR Data Principles, as they are meant to help researchers navigate through thousands of data repository services to find the most appropriate repository for their data.
However, due to the complex landscape of existing repositories, these registries tend to provide confusingly long lists that do not necessarily facilitate the selection of a suitable repository. Moreover, researchers should take a repository's reuse policy, metadata sharing and availability, long-term availability of data, and public accessibility into account, and these properties are often not obvious or easy to determine.
To ease the selection of a suitable research data repository for chemistry research data, NFDI4Chem provides a list of trusted chemistry-friendly repositories in the guide on how to choose the right repository.
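The selection criteria named in this section (reuse policy, metadata sharing, long-term availability, public accessibility) can be treated as a simple checklist. The sketch below filters a list of candidate repositories accordingly. The class, its field names, and the three example entries are hypothetical and do not describe real repositories or any registry's actual data format.

```python
from dataclasses import dataclass


@dataclass
class RepositoryProfile:
    """Hypothetical summary of a repository's properties."""
    name: str
    domain: str                   # e.g. "chemistry" or "generic"
    has_reuse_policy: bool
    shares_metadata: bool
    long_term_preservation: bool
    publicly_accessible: bool


def shortlist(candidates: list, domain: str) -> list:
    """Return names of repositories that fit the domain and meet all criteria."""
    return [
        repo.name
        for repo in candidates
        if repo.domain in (domain, "generic")
        and repo.has_reuse_policy
        and repo.shares_metadata
        and repo.long_term_preservation
        and repo.publicly_accessible
    ]


# Made-up entries, for illustration only.
candidates = [
    RepositoryProfile("Repo A", "chemistry", True, True, True, True),
    RepositoryProfile("Repo B", "generic", True, True, False, True),
    RepositoryProfile("Repo C", "biology", True, True, True, True),
]
print(shortlist(candidates, "chemistry"))  # ['Repo A']
```

In practice, gathering this information is the hard part, which is why a curated list of trusted repositories saves researchers considerable effort.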
Sources and further information
- Repository Platforms for Research Data IG
- FAIRsFAIR Repository Support Series: Using registries to improve the visibility of your repository service
- The Repository Chemotion: infrastructure for sustainable research in chemistry
- Chemotion ELN: an open source electronic lab notebook for chemists in academia
- German: Was ist ein Repositorium? Forschungsdaten.info