All scientists generate research data, whether they work experimentally or theoretically, for example reaction protocols, analytical data, or calculated data. These data can be domain-specific, but they are always accompanied by metadata. Requirements and responsibilities may also differ depending on career level.
All scientists are responsible for documenting and organising their data so that the data can be archived in a FAIR (Findable, Accessible, Interoperable, Re-usable) way. Most research institutions and funding institutions, such as the DFG, provide their own guidelines for research data management (RDM). In addition, many funding institutions encourage or even require the publication and storage of FAIR data. For good data handling practice, the use of data management plans is recommended. In general, the handling of research data should follow the data life cycle.
In recent years, many new digital tools have been developed to support researchers in their RDM needs. The use of electronic lab notebooks (ELNs) is essential to support scientists in their daily work. Using an ELN from the experiment planning stage onwards ensures that all data are directly available in digital form. It is best to use an ELN that covers the entire data life cycle, from experiment planning through data collection, processing, and analysis to publication and re-use, including data documentation and data organisation. In this case, the daily work can all be performed in one digital environment. For example, after peaks have been assigned in the obtained spectroscopic data, the ELN can automatically generate peak lists for theses or publications. In addition, if the ELN is directly connected to a repository, data can easily be transferred into the repository for data publication. The use of repositories is essential for data deposition according to the FAIR principles. During deposition, a persistent identifier is assigned to the data, which can be cited in the related publication.
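To make the peak-list step more concrete, the following minimal Python sketch shows how assigned peaks could be turned into a publication-style string. It is only an illustration and not the implementation of any particular ELN: the `Peak` class, the `format_peak_list` function, the default spectrometer frequency and solvent, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peak:
    """One assigned 1H NMR signal (hypothetical data structure)."""
    shift_ppm: float                     # chemical shift in ppm
    multiplicity: str                    # e.g. "s", "d", "t", "q", "m"
    protons: int                         # number of protons from integration
    coupling_hz: Optional[float] = None  # coupling constant J in Hz, if any


def format_peak_list(peaks: List[Peak], nucleus: str = "1H",
                     frequency_mhz: int = 400, solvent: str = "CDCl3") -> str:
    """Return a peak list string, ordered from high to low chemical shift."""
    ordered = sorted(peaks, key=lambda p: p.shift_ppm, reverse=True)
    entries = []
    for p in ordered:
        details = [p.multiplicity]
        if p.coupling_hz is not None:
            details.append(f"J = {p.coupling_hz:.1f} Hz")
        details.append(f"{p.protons}H")
        entries.append(f"{p.shift_ppm:.2f} ({', '.join(details)})")
    return (f"{nucleus} NMR ({frequency_mhz} MHz, {solvent}): "
            f"δ = {', '.join(entries)} ppm.")


# Illustrative values only, not taken from a real measurement.
example_peaks = [
    Peak(1.25, "t", 3, 7.1),
    Peak(4.12, "q", 2, 7.1),
    Peak(2.04, "s", 3),
]
print(format_peak_list(example_peaks))
```

Keeping the assigned peaks as structured records, rather than as free text, is what allows the same data to be reused for a thesis, a publication, or a repository entry without retyping.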
Generally, the choice of an ELN and a repository, whether generic or sub-domain specific, depends on the chemical sub-domain. Combining the two enables effective data storage and archiving, both as internal databases and as data publications, supports collaboration and data re-use, and fulfils the data handling practices of the data life cycle. Moreover, storing research data according to the FAIR principles is crucial for upcoming machine learning approaches and big data analysis.
Best practice examples
The combination of the electronic lab notebook Chemotion ELN and the repository Chemotion-Repository enables efficient data handling for synthetic chemists. The Chemotion package allows users to collect, analyse, process, and store different types of analytical data attached to the reaction procedure in one digital environment. During the seamless export from the ELN into the repository, a persistent identifier (DOI) is assigned to the deposited data, creating a single entry per molecule. This DOI is given in the related publication alongside the analytical details of the compound (as depicted in Figure 1) to link the journal publication to the data publication. This ensures that the data files can be found for re-use. Furthermore, the Chemotion repository is connected to other databases (e.g., PubChem) to ensure high visibility and a user-friendly search of original research data.
Figure 1: Section of the experimental part in journal publication DOI: 10.1039/d1dt00832c, citation of data publication marked in orange.
Several working groups of NFDI4Chem and beyond already deposit research data in the Chemotion-Repository. The following examples, together with their open-access data, show how data publications are linked to the related journal publications:
- Modular Synthesis of trans‐A2B2‐Porphyrins with Terminal Esters: Systematically Extending the Scope of Linear Linkers for Porphyrin‐Based MOFs, DOI: 10.1002/chem.202003885.
- Next Generation of Zinc Bisguanidine Polymerization Catalysts towards Highly Crystalline, Biodegradable Polyesters, DOI: 10.1002/anie.202008473.
- Synthesis of new pyrazolo[1,2,3]triazines by cyclative cleavage of pyrazolyltriazenes, DOI: 10.3762/bjoc.17.187.
- Exceptional Substrate Diversity in Oxygenation Reactions Catalyzed by a Bis(μ-oxo) Copper Complex, DOI: 10.1002/chem.202000664.
- Insertion of [1.1.1]propellane into aromatic disulfides, DOI: 10.3762/bjoc.15.114.
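Each DOI in the list above can be opened through the public resolver at https://doi.org/. The short Python sketch below builds such links; the function name `doi_to_url` and the simple plausibility check are illustrative assumptions, not part of any Chemotion or NFDI4Chem tool.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver link for a DOI (hypothetical helper)."""
    doi = doi.strip()
    # Rough plausibility check only: DOIs start with "10." and
    # contain a "/" between prefix and suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"Not a plausible DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


# DOIs of the journal publications listed above.
example_dois = [
    "10.1002/chem.202003885",
    "10.1002/anie.202008473",
    "10.3762/bjoc.17.187",
    "10.1002/chem.202000664",
    "10.3762/bjoc.15.114",
]

for doi in example_dois:
    print(doi_to_url(doi))
```

The same pattern applies to the DOIs assigned to data publications in a repository, which is what makes a cited data set directly reachable from the journal article.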
If researchers need help with their own data sets, NFDI4Chem can provide support for consistent data handling. In this case, please contact us via our data pledge!