Physical chemistry is an interdisciplinary science at the boundary between chemistry and physics, and its topics go beyond the classical areas of either field. While preparative chemistry focuses on the methodology of synthesising known and new substances, physical chemistry describes the properties of substances and their transformations by applying concepts from physics to chemical systems, using both theoretical and experimental methods. Along with organic and inorganic chemistry, physical chemistry is one of the three key disciplines of "classical" chemistry, and it provides the theoretical basis for technical chemistry and process engineering. Its findings are also an integral part of many other disciplines and are used, for example, to describe and understand phenomena in biology, medicine, meteorology and the earth sciences. Because of this strong interdisciplinarity and the use of physicochemical methods in almost all areas of chemistry, a complete description of physical chemistry in a single profile is hardly possible, and this article makes no claim to provide one.
Methods Profiles
EPR spectroscopy
What is it?
- Electron Paramagnetic Resonance (EPR) spectroscopy belongs, together with Nuclear Magnetic Resonance (NMR) spectroscopy, to the group of magnetic resonance methods
- measures the resonant microwave absorption of a paramagnetic sample in an external magnetic field (i.e. the measurement requires unpaired electrons)
For what?
- provides information about the electronic/atomic structure and the chemical environment (e.g. local environmental polarity) of the sample
- for the characterisation of molecular dynamics on time scales from approx. 10 ps to 1 μs (allows, e.g., conclusions to be drawn about local nanoviscosity)
- for distance measurements in the range of about 1-8 nm
What kind of data is generated?
- almost exclusively proprietary file formats (e.g. .spe or .DTA/.DSC) from Bruker Corporation
- conversion into open file formats (e.g. .txt or .csv) either via Bruker software on the measuring device itself or via tools like SpinToolbox
- analysis of data using Bruker software or e.g. EasySpin as open-source toolbox for MATLAB
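The conversion step can be illustrated with a minimal sketch in Python. It assumes the commonly described layout of a .DTA/.DSC pair: the .DSC file is a plain-text descriptor with keywords such as XPTS, XMIN, XWID, BSEQ, IRFMT and IKKF, and the .DTA file holds the raw values in binary form. Only one-dimensional, real-valued, 64-bit floating-point data are handled here; the function names and the CSV column labels are illustrative choices, not part of any vendor tool.

```python
import csv
import io
import struct


def parse_dsc(text):
    """Collect 'KEY value' pairs from a descriptor (.DSC) file given as text."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, section markers (#...), comments (*) and device lines (.)
        if not line or line[0] in "#*.":
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def dta_to_rows(dsc_text, dta_bytes):
    """Return (x, y) pairs for a 1D, real-valued, float64 data set."""
    p = parse_dsc(dsc_text)
    if p.get("IKKF", "REAL") != "REAL" or p.get("IRFMT", "D") != "D":
        raise ValueError("this sketch handles real float64 data only")
    n = int(p["XPTS"])
    x_min, x_wid = float(p["XMIN"]), float(p["XWID"])
    # Assumption: BSEQ gives the byte order, BIG meaning big-endian.
    order = ">" if p.get("BSEQ", "BIG") == "BIG" else "<"
    y = struct.unpack(f"{order}{n}d", dta_bytes[: 8 * n])
    # Assumption: the axis runs linearly from XMIN to XMIN + XWID.
    step = x_wid / (n - 1) if n > 1 else 0.0
    return [(x_min + i * step, y[i]) for i in range(n)]


def rows_to_csv(rows, x_label="field", y_label="intensity"):
    """Serialise the (x, y) pairs as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([x_label, y_label])
    writer.writerows(rows)
    return buf.getvalue()
```

In practice the two inputs would come from reading the .DSC file as text and the .DTA file in binary mode. Complex-valued or multi-dimensional data sets need additional handling that this sketch leaves out.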
How to do it FAIR?
- documentation of all research data and metadata is carried out digitally using a suitable ELN (possibly in addition to a manual laboratory notebook in paper form)
- experimental conditions (e.g. sample concentration, solvent etc.) and measurement parameters (e.g. frequency, temperature) are noted in the ELN
- observations, deviations from the planned measurement protocol or other peculiarities during the measurement that produce no digital output (i.e. no data files) are added manually to the ELN entry of the experiment
- unprocessed raw files from measurements are uploaded to the ELN in open file formats and attached directly to the respective ELN experiment entry, including metadata on the instrument (e.g. manufacturer, type) and on the measurement conditions and parameters
- metadata related to the obtained data, such as temperature or solvent of measurement, follow common metadata standards
- research data are processed, analysed and compared with open non-proprietary software tools
- simultaneously with publication as a research article in a scientific journal, the underlying research data is published in an open data repository and linked to the article (incl. semantically richly annotated raw and processed data in open data formats for reuse)
- a unique persistent identifier (e.g. DOI) is generated for each dataset as well as for the journal publication
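One way to keep instrument and measurement metadata attached to an exported data file is a small machine-readable sidecar record. The following Python sketch writes such a record as JSON and refuses to do so when a field is missing. The field names and the list of required fields are purely hypothetical; a real project would take them from the metadata standard it has adopted.

```python
import json

# Hypothetical minimal field set, for illustration only.
REQUIRED_FIELDS = (
    "instrument_manufacturer",
    "instrument_type",
    "microwave_frequency_GHz",
    "temperature_K",
    "solvent",
    "sample_concentration_mM",
)


def build_sidecar(data_file, **fields):
    """Return a JSON string describing one exported data file."""
    missing = [key for key in REQUIRED_FIELDS if key not in fields]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    record = {"data_file": data_file, "metadata": dict(sorted(fields.items()))}
    return json.dumps(record, indent=2)


# Usage with placeholder values (not taken from a real measurement).
print(build_sidecar(
    "sample01.txt",
    instrument_manufacturer="Bruker",
    instrument_type="placeholder",
    microwave_frequency_GHz=9.4,
    temperature_K=298.0,
    solvent="toluene",
    sample_concentration_mM=0.1,
))
```

Putting units into the key names is one simple way to avoid ambiguity when no schema with explicit unit fields is available.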
Quantum Mechanical (QM) calculations
What is it?
- Quantum Mechanical calculations are one of the major computational tools to elucidate molecular properties on a first-principles basis
- solving the Schrödinger equation provides the electronic energy of a molecule/molecular system, from which properties can be derived as first- and higher-order derivatives. Descriptors can also be computed from the orbital and density data, which are available from the same calculation
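As an illustration of properties expressed as energy derivatives, the standard textbook relations for a few common cases are collected below. The notation is an assumption made for this example: E is the electronic energy, R_A are nuclear coordinates and \mathcal{E}_i are components of a static external electric field.

```latex
% E: electronic energy; R_A: nuclear coordinates; \mathcal{E}_i: static electric field components
\begin{align}
  g_A &= \frac{\partial E}{\partial R_A}
      && \text{(gradient; zero at minima and transition states)} \\
  H_{AB} &= \frac{\partial^2 E}{\partial R_A \, \partial R_B}
      && \text{(Hessian; harmonic vibrational frequencies)} \\
  \mu_i &= -\left.\frac{\partial E}{\partial \mathcal{E}_i}\right|_{\mathcal{E}=0}
      && \text{(dipole moment)} \\
  \alpha_{ij} &= -\left.\frac{\partial^2 E}{\partial \mathcal{E}_i \, \partial \mathcal{E}_j}\right|_{\mathcal{E}=0}
      && \text{(polarizability)}
\end{align}
```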
For what?
- calculated molecular properties include e.g. molecular structures (usually local minima and transition states), energies, spectroscopic parameters/properties, dipole moments, polarizabilities and non-observables such as atomic charges and topological analysis
- properties can be calculated prior to conducting experimental measurements to guide synthesis (computational screening) or a posteriori to help interpret experimental results atomistically
- the application range depends on the level of theory used. Correlated wave function methods are commonly applied to systems with fewer than 100 atoms and density functional theory (DFT) to systems with up to 500 atoms, while semiempirical methods can be routinely applied to systems with thousands of atoms
What kind of data is generated?
- data formats depend strongly on the program that is used for the QM calculations, e.g. Gaussian, ORCA, Molpro, TURBOMOLE or Jaguar, but generally formatted text files are used as input and log files. Compressed data formats are used to store wavefunction, density information and operators. Molecular structures are provided in human-readable format
- data analysis is carried out using custom scripts. A few programs provide their own scripts for common tasks (such as plotting of molecular orbitals) and dedicated GUIs
How to do it FAIR?
- documentation of all research data and metadata is carried out digitally using a suitable repository (e.g. NOMAD, ioChem-BD or a general-purpose repository) to store the input files, main log and structure files (if the structures are not included in the log)
- reproducibility of calculations to within numerical accuracy can be ensured by storing the input files and adding the program and its version (ideally even the compiler version and any compiler flags) as metadata. Numerical thresholds are well defined, but reproducibility of calculations across different programs and versions is not guaranteed. Version-specific source files should therefore be kept for the same time period as the stored data
- data analysis scripts should be uploaded to the repository in open file formats, attached directly to the corresponding data entry and accompanied by appropriate documentation
- if possible, analysis and evaluation of calculations should be conducted with open, non-proprietary software tools
- simultaneously with publication as a research article in a scientific journal, the data in the repository is linked to the article (incl. semantically richly annotated raw and processed data, if possible in open data formats for reuse)
- a unique persistent identifier (e.g. DOI) is generated for the dataset as well as for the journal publication
- XML and CML (Chemical Markup Language) are used by a few software packages, but this is not common practice
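The reproducibility metadata described above (program, version, compiler details) can be stored next to checksums of the deposited files, so that a later user can verify that the inputs are unchanged. The Python sketch below builds such a record; the record layout and all names in it are assumptions made for this example, not a repository requirement.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hex text."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(program, version, files, compiler=None, compiler_flags=None):
    """Build a provenance record.

    files maps a file name to its content as bytes, e.g. the result of
    reading an input file in binary mode.
    """
    return {
        "program": program,
        "version": version,
        "compiler": compiler,
        "compiler_flags": compiler_flags,
        "files": {name: sha256_hex(content) for name, content in sorted(files.items())},
    }


# Usage with placeholder values (hypothetical file name and version string).
record = provenance_record("ORCA", "x.y.z", {"job.inp": b"! placeholder input\n"})
print(json.dumps(record, indent=2))
```

Checksums only show that the stored files are unchanged. As noted above, they do not guarantee identical results across different program versions.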
Challenges to make data FAIR
- no standardised transfer into open file formats. All repositories of quantum chemical calculations to date make use of in-house parsers to extract the calculation data from uploaded logs. This trend hinders the improvement of FAIR practices since new developers are not provided with a template for log files. Any new software can only be featured in repositories after a unique parser is developed
- lack of open meta-input and output file formats that are necessary to enable full interoperability of different programs and tools used for QM calculations. Particularly concerning is the lack of standards for: z-matrix and xyz file formats, trajectory files in molecular dynamics or structure optimisations, definition of isotopes, potential energy surfaces as well as equations used in the derivation of properties including thermodynamic quantities
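To make the xyz issue concrete, the following Python sketch reads the layout that most programs follow by convention: an atom count on the first line, a free-text comment on the second, then one line per atom with an element symbol and three Cartesian coordinates. Everything beyond that (units, extra columns, atomic numbers instead of symbols, multiple frames) varies between programs and is deliberately not handled. The function name is an illustrative choice.

```python
def parse_xyz(text):
    """Parse a single-frame xyz structure given as text.

    Returns (comment, atoms), where atoms is a list of (symbol, x, y, z).
    Assumption: coordinates are plain decimal numbers; the unit is not
    recorded in the file and is conventionally taken to be angstrom.
    """
    lines = text.splitlines()
    n_atoms = int(lines[0].split()[0])
    comment = lines[1].strip() if len(lines) > 1 else ""
    atoms = []
    for line in lines[2:2 + n_atoms]:
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"malformed atom line: {line!r}")
        x, y, z = (float(value) for value in fields[1:4])
        atoms.append((fields[0], x, y, z))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


# Usage with an approximate water geometry (illustrative values).
water = """3
water, illustrative geometry
O  0.000  0.000  0.117
H  0.000  0.757 -0.469
H  0.000 -0.757 -0.469
"""
comment, atoms = parse_xyz(water)
print(comment, len(atoms))
```

The comment line is where programs tend to diverge most, since some use it for energies, charges or cell parameters in their own private syntax.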
Molecular Mechanical (MM) simulations
What is it?
- Molecular Mechanical simulations approximate intra- and intermolecular interactions using simple Newtonian mechanics and neglect quantum effects
- the system is parametrised with a suitable force field and propagated in time by solving the system’s Newtonian equations of motion. Potentials or modifications of the force field parameters can be applied to extract thermodynamic/kinetic data
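The propagation step can be sketched for the simplest possible case: one particle in one dimension, bound by a harmonic potential of the kind used for bond-stretching terms in many force fields. The integrator below is velocity Verlet, chosen here as a common example; actual simulation packages use their own, more elaborate implementations, and all names and parameter values are illustrative.

```python
def harmonic_force(k, x0=0.0):
    """Return the force function of a harmonic potential 0.5*k*(x - x0)**2."""
    return lambda x: -k * (x - x0)


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position x and velocity v with the velocity Verlet scheme."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass, k, x0=0.0):
    """Kinetic plus potential energy of the harmonic oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * (x - x0) ** 2


# Usage in reduced units: mass 1, force constant 1, start at x = 1 at rest.
traj = velocity_verlet(1.0, 0.0, harmonic_force(1.0), mass=1.0, dt=0.01, n_steps=1000)
x_end, v_end = traj[-1]
print(total_energy(x_end, v_end, 1.0, 1.0))
```

With a sufficiently small time step the total energy stays close to its initial value, which is a useful first check for any integrator setup.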
For what?
- systems can be as large as millions of atoms, allowing for the investigation of protein dynamics and protein-ligand interactions on a microsecond timescale. More complex systems such as protein-protein interactions or proteins embedded in a biomembrane can also be simulated
- pure liquids, mixtures and interfaces between liquids and solids or gases can also be simulated
- explaining and interpreting the behaviour of macroscopic systems by investigating them at a microscopic level
What kind of data is generated?
- data formats depend strongly on the program that is used for the MM calculations, e.g. AMBER, CHARMM, GROMACS, LAMMPS or NAMD, but in general specifically formatted text files are used as input and log files, and binary formats are used for checkpoint and trajectory files
- analysis of data using tools provided by the software’s developers or custom scripts
How to do it FAIR?
- documentation of all research data and metadata is carried out digitally using a suitable repository to store the data
- reproducibility of calculations can be ensured by storing the input file and adding the program and its version (ideally including the compiler and any compiler flags) as metadata
- if possible, analysis and evaluation of calculations should be conducted with open non-proprietary software tools
- simultaneously with publication as a research article in a scientific journal, the data in the repository is linked to the article (incl. semantically richly annotated raw and processed data, if possible in open data formats for reuse)
- a unique persistent identifier (e.g. DOI) is generated for each dataset as well as for the journal publication
Challenges to make data FAIR
- no standardised transfer into open file formats for different simulation packages
- development of open meta-input and output file formats is required to handle the multitude of different programs and tools used in MM calculations in accordance with the FAIR principles. Tools such as PLUMED can help users with this problem
- trajectory files are typically too large to store in commonly used repository environments, even when using compressed file formats. To make this data FAIR, standards for handling large amounts of data must be developed or solutions from other fields applied
- exact reproducibility of long time-scale molecular dynamics trajectories is unattainable (numerical noise will eventually affect the resulting trajectories, especially in a multicore environment). Depending on the numerical accuracy and the specific implementation, deviations can be observed as early as the picosecond range. However, thermodynamic averages and other statistical quantities should be reproducible within a suitable margin of error. This margin has to be estimated and provided by the authors of a publication
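One common way to estimate such a margin of error for a time-averaged quantity is block averaging: the time series is split into consecutive blocks, and the scatter of the block means gives a standard error that accounts for correlation between neighbouring frames, provided the blocks are longer than the correlation time. The Python sketch below shows the idea; it is one possible approach, and the function name is illustrative.

```python
import math


def block_average(samples, n_blocks):
    """Return (mean, standard error) estimated from n_blocks equal blocks.

    Samples that do not fit into n_blocks equal blocks are dropped from
    the end of the series.
    """
    block_len = len(samples) // n_blocks
    if n_blocks < 2 or block_len < 1:
        raise ValueError("need at least 2 blocks with at least 1 sample each")
    means = [
        sum(samples[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]
    mean = sum(means) / n_blocks
    # Sample variance of the block means, then standard error of their mean.
    variance = sum((m - mean) ** 2 for m in means) / (n_blocks - 1)
    return mean, math.sqrt(variance / n_blocks)


# Usage with a toy series: two blocks with means 0 and 2.
print(block_average([0.0, 0.0, 2.0, 2.0], n_blocks=2))  # (1.0, 1.0)
```

Two independent runs of the same system can then be regarded as consistent when their averages agree within the combined error estimates.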
Methods Data Format Overview
Analytical method | Exemplary proprietary file extensions | Typical size of proprietary file | Converter to open file format | Recommendation for open file extension* | File format | File size of open format | Monomer characterization | Polymer characterization | |
---|---|---|---|---|---|---|---|---|---|
NMR spectroscopy | set of files, no typical extension | <1-50 MB | nmrium.org | .jdx .zip | JCAMP-DX (raw) NMReDATA (assignments) | <1-50 MB | ✔ | ✔ | |
Mass spectrometry | .raw .d .baf | ~250 MB | Proteowizard | .mzML | mzML | ~250 MB | |||
IR spectroscopy | .ispd .icIR | <1 MB | | .dx | JCAMP-DX | <1 MB | ✔ | ✔ | |
Raman spectroscopy | .dpt .spc .icRaman .sps .acs | <1 MB | proprietary software | .dx | JCAMP-DX | <1 MB | ✔ | ✔ | |
UV/vis spectroscopy | .dsw .str .bsk .bkn .ksd .jws .jwb .str8 .spc .sre | <1 MB | proprietary software | .csv | comma-separated values | <1 MB | ✔ | ✔ | |
Fluorescence spectroscopy | .fds .fs2f .jws .opj | <1 MB | proprietary software | .dx | JCAMP-DX | <1 MB | ✔ | ✔ | |
Single crystal XRD | .raw | ~1 GB | proprietary software | .cif | crystallographic information file | <1 MB | |||
Powder XRD | .raw | <1 MB | proprietary software | .xyd | text file | <1 MB | |||
Gas chromatography | .gcd .d | ~2 MB | proprietary software | .txt | text file | <1 MB | ✔ | ||
HPLC | .xls | <1 MB | proprietary software | .csv | comma-separated values | <1 MB | ✔ | ||
Cyclic voltammetry | .nox .pssession | ~8 MB | proprietary software | .txt | text file | <1 MB | ✔ | ✔ | |
EPR spectroscopy | .spe | <1 MB | proprietary software | .txt | text file | <1 MB | ✔ | ✔ | |
Differential scanning calorimetry | .ngb-dsu .ngb-taa | <1 MB | proprietary software | .csv | comma-separated values | <1 MB | ✔ | ✔ | |
Physisorption | .smp | <1 MB | proprietary software | .csv | comma-separated values | <1 MB | ✔ | ✔ | |
Isothermal titration calorimetry (ITC) | |||||||||
Dynamic light scattering (DLS) | .apkw .xlsx | <1 MB | proprietary software | .csv | comma-separated values | <1 MB | ✔ | ||
Atomic force microscopy (AFM) | ✔ | ||||||||
Transmission electron microscopy (TEM) | | | proprietary software | .jpg .tif | Image | <10 MB | ✔ | | |