TY - JOUR
T1 - Predicting Structural Motifs of Glycosaminoglycans using Cryogenic Infrared Spectroscopy and Random Forest
AU - Riedel, Jerome
AU - Lettow, Maike
AU - Grabarics, Márkó
AU - Götze, Michael
AU - Miller, Rebecca L.
AU - Boons, Geert Jan
AU - Meijer, Gerard
AU - von Helden, Gert
AU - Szekeres, Gergo Peter
AU - Pagel, Kevin
N1 - Funding Information:
Financial support for this research was provided by the European Union’s Horizon 2020 Research and Innovation Programme grant number 899687-HS-SEQ. The authors would like to thank the HPC Service of ZEDAT, Freie Universität Berlin, for computing time.
Funding Information:
Open access funded by Max Planck Society.
Publisher Copyright:
© 2023 The Authors. Published by American Chemical Society.
PY - 2023/4/12
Y1 - 2023/4/12
N2 - In recent years, glycosaminoglycans (GAGs) have become a focus of biochemical and biomedical research due to their importance in a variety of physiological processes. These molecules show great diversity, which makes their analysis highly challenging. A promising tool for identifying the structural motifs and conformation of shorter GAG chains is cryogenic gas-phase infrared (IR) spectroscopy. In this work, the cryogenic gas-phase IR spectra of mass-selected heparan sulfate (HS) di-, tetra-, and hexasaccharide ions were recorded to extract vibrational features that are characteristic of structural motifs. The data were augmented with chondroitin sulfate (CS) disaccharide spectra to assemble a training library for random forest (RF) classifiers. These were used to discriminate between GAG classes (CS or HS) and different sulfate positions (2-O-, 4-O-, 6-O-, and N-sulfation). With optimized data preprocessing and RF modeling, a prediction accuracy of >97% was achieved for HS tetra- and hexasaccharides based on a training set of only 21 spectra. These results exemplify the importance of combining gas-phase cryogenic IR ion spectroscopy with machine learning to improve the future analytical workflow for GAG sequencing and that of other biomolecules, such as metabolites.
UR - http://www.scopus.com/inward/record.url?scp=85151828266&partnerID=8YFLogxK
U2 - 10.1021/jacs.2c12762
DO - 10.1021/jacs.2c12762
M3 - Article
C2 - 37000483
AN - SCOPUS:85151828266
SN - 0002-7863
VL - 145
SP - 7859
EP - 7868
JO - Journal of the American Chemical Society
JF - Journal of the American Chemical Society
IS - 14
ER -