Data formats
Choosing the right data format makes your research data easier to share, understand, and reuse. To follow the FAIR principles (Findable, Accessible, Interoperable, Reusable), use formats that are open, well documented, and accepted by your community.
Why this matters
In chemistry, data can include spectra, structures, reactions, and metadata. The chosen format influences whether:
- others can open your files,
- repositories accept your submission,
- your data remain usable in the long term.
Quick guidance for everyday work
- Prefer open formats whenever possible.
- Keep both raw data and processed data.
- Add clear metadata, including units and context.
- Check repository requirements early.
- If instruments produce proprietary files, export an open exchange format as well.
- For structure data, store both an exchange file (e.g. SDF) and identifiers (SMILES, InChI).
- For spectroscopy, use JCAMP-DX for exchange when available.
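The advice to keep an exchange file and identifiers together can be illustrated with a minimal, standard-library-only sketch. This is an assumption for illustration, not a prescribed workflow: it appends SMILES and InChI as SD data items to a V2000 molfile and reads them back. The ethanol molfile, its coordinates, and the helper names `to_sd_record` and `read_sd_fields` are hypothetical. The identifiers are typed in by hand here; in practice they should be generated by a cheminformatics toolkit such as RDKit or Open Babel.

```python
import re

# Hypothetical example: V2000 molfile for ethanol (heavy atoms only, arbitrary 2D coordinates).
ETHANOL_MOLBLOCK = "\n".join([
    "ethanol",
    "",
    "illustrative coordinates",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.5981    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])


def to_sd_record(molblock: str, fields: dict[str, str]) -> str:
    """Append SD data items (e.g. SMILES, InChI) to a molfile and close the record."""
    lines = [molblock.rstrip("\n")]
    for name, value in fields.items():
        lines.append(f">  <{name}>")  # data header line with the field name
        lines.append(value)
        lines.append("")  # a blank line terminates each data item
    lines.append("$$$$")  # record delimiter in an SD file
    return "\n".join(lines) + "\n"


def read_sd_fields(record: str) -> dict[str, str]:
    """Return the data items of one SD record as {field name: value}."""
    lines = record.splitlines()
    i = lines.index("M  END") + 1  # data items follow the connection table
    fields = {}
    while i < len(lines):
        match = re.match(r">.*?<([^>]+)>", lines[i])
        i += 1
        if match:
            value_lines = []
            while i < len(lines) and lines[i].strip() not in ("", "$$$$"):
                value_lines.append(lines[i])
                i += 1
            fields[match.group(1)] = "\n".join(value_lines)
    return fields


record = to_sd_record(ETHANOL_MOLBLOCK, {
    "SMILES": "CCO",
    "InChI": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
})
print(read_sd_fields(record))
```

Keeping the identifiers inside the SD record means structure and identifiers travel in one file and cannot drift apart when files are copied or renamed.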
Recommendations by use case
- General table-like data exchange: CSV (with clear headers and units).
- NMR data exchange: JCAMP-DX, nmrML, NMReDATA.
- Mass spectrometry exchange and archiving: mzML.
- Crystallography deposition: CIF.
- Structure exchange: SDF, SMILES, InChI.
- Chemical table files (Molfile, rxnfile, SDF): use V2000 for broad interoperability; use V3000 if you need advanced features and tool support is confirmed.
- Spectral exchange: JCAMP-DX is widely supported across techniques; for very large or complex datasets, check whether a technique-specific format (for example mzML for mass spectrometry) is more suitable.
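To show what a JCAMP-DX file looks like in practice, here is a minimal reader sketch. It assumes the simplest case: labelled data records of the form `##LABEL=value` and an uncompressed `##XYPOINTS=(XY..XY)` table with `x, y` pairs. The sample spectrum and its values are invented for illustration, and the function name `parse_jcamp_xypoints` is hypothetical. Real instrument exports often use compressed `##XYDATA=(X++(Y..Y))` blocks or NMR-specific extensions, which need a full JCAMP-DX reader.

```python
# Invented example values; not a real measurement.
SAMPLE_JCAMP = """##TITLE=example IR spectrum (illustrative values)
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##NPOINTS=3
##XYPOINTS=(XY..XY)
4000.0, 0.98
3000.0, 0.75
2000.0, 0.91
##END=
"""


def parse_jcamp_xypoints(text: str) -> tuple[dict[str, str], list[tuple[float, float]]]:
    """Simplified reader: header records plus an uncompressed (XY..XY) table only."""
    header: dict[str, str] = {}
    points: list[tuple[float, float]] = []
    in_table = False
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # "$$" starts a comment in JCAMP-DX
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            if label == "END":
                break
            in_table = label == "XYPOINTS"  # data lines follow this record
            header[label] = value.strip()
        elif in_table:
            for pair in line.split(";"):  # pairs may also be separated by ";"
                if pair.strip():
                    x, y = (float(v) for v in pair.split(","))
                    points.append((x, y))
    return header, points


header, points = parse_jcamp_xypoints(SAMPLE_JCAMP)
print(header["XUNITS"], header["YUNITS"], points)
```

Because units and data type are stored as named records next to the data, a reader can check them before plotting or comparing spectra.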
When possible, select formats with broad software support and active community maintenance.
Common pitfalls
- Keeping only proprietary instrument files without an open export.
- Comparing SMILES strings without canonicalization.
- Omitting stereochemistry in SMILES when isomers matter.
- Sharing table files without clear units or column meaning.
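The last pitfall can be caught automatically. The sketch below writes a CSV table with Python's standard `csv` module and flags column headers that carry no unit. The header convention (the unit in square brackets, with `[1]` for dimensionless values) and the helper names `write_csv` and `headers_missing_units` are assumptions made for this example, not a community standard; follow whatever convention your repository or community specifies.

```python
import csv
import io
import re

# Assumed convention for this sketch: "quantity [unit]", e.g. "time [s]".
HEADER_WITH_UNIT = re.compile(r"^\s*\S.*\[[^\[\]]+\]\s*$")


def write_csv(header: list[str], rows: list[list[object]]) -> str:
    """Serialize a table to CSV text (csv.writer uses CRLF line endings by default, as in RFC 4180)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def headers_missing_units(csv_text: str) -> list[str]:
    """Return the column headers that do not follow the 'quantity [unit]' convention."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if not HEADER_WITH_UNIT.match(name)]


good = write_csv(["time [s]", "absorbance [1]"], [[0, 0.12], [10, 0.34]])
bad = write_csv(["time", "absorbance [1]"], [[0, 0.12]])
print(headers_missing_units(good))  # []
print(headers_missing_units(bad))   # ['time']
```

A check like this can run before deposition, so tables without units are caught before they reach a repository.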
Common chemistry formats
| Format | Data type | Maintainer | Parent format | Specification |
|---|---|---|---|---|
| JCAMP-DX | multiple | IUPAC | ASCII, Text | open |
| AnIML | multiple | ASTM International | XML | open |
| netCDF | multiple | UCAR | CDF | open |
| CSV | multiple | IETF (RFC 4180) | ASCII, Text | open |
| ASCII | multiple | (no single maintainer) | (none, plain text) | open |
| ISA | multiple | ISA Commons Community | TSV or JSON | open |
| UDM | multiple | Pistoia Alliance | XML | open |
| ADF | multiple | Allotrope | HDF5+RDF | for members |
| mzML | mass spectrometry | HUPO-PSI | XML | open |
| ANDI-MS | mass spectrometry | ASTM International | netCDF | open |
| nmrML | NMR | COSMOS | XML | open |
| NMReDATA | NMR | NMReDATA Initiative | SDF | open |
| Bruker FID | NMR | Bruker | (binary) | proprietary |
| mnova | NMR | Mestrelab | (binary) | proprietary |
| Bruker OPUS | spectroscopy | Bruker | (binary) | proprietary |
| PerkinElmer | spectroscopy | PerkinElmer | ASCII, Text | proprietary |
| Thermo Fisher GRAMS | spectroscopy | Thermo Fisher Scientific | (binary) | proprietary |