TY - JOUR
T1 - Comparative genomic analysis of thermophilic fungi reveals convergent evolutionary adaptations and gene losses
AU - Steindorff, Andrei S
AU - Aguilar-Pontes, Maria Victoria
AU - Robinson, Aaron J
AU - Andreopoulos, Bill
AU - LaButti, Kurt
AU - Kuo, Alan
AU - Mondo, Stephen
AU - Riley, Robert
AU - Otillar, Robert
AU - Haridas, Sajeet
AU - Lipzen, Anna
AU - Grimwood, Jane
AU - Schmutz, Jeremy
AU - Clum, Alicia
AU - Reid, Ian D
AU - Moisan, Marie-Claude
AU - Butler, Gregory
AU - Nguyen, Thi Truc Minh
AU - Dewar, Ken
AU - Conant, Gavin
AU - Drula, Elodie
AU - Henrissat, Bernard
AU - Hansel, Colleen
AU - Singer, Steven
AU - Hutchinson, Miriam I
AU - de Vries, Ronald P
AU - Natvig, Donald O
AU - Powell, Amy J
AU - Tsang, Adrian
AU - Grigoriev, Igor V
N1 - Publisher Copyright: © Lawrence Berkeley National Laboratory and the Authors 2024.
PY - 2024/9/12
Y1 - 2024/9/12
AB - Thermophily is a trait scattered across the fungal tree of life, with its highest prevalence within three fungal families (Chaetomiaceae, Thermoascaceae, and Trichocomaceae), as well as some members of the phylum Mucoromycota. We examined 37 thermophilic and thermotolerant species and 42 mesophilic species for this study and identified thermophily as the ancestral state of all three prominent families of thermophilic fungi. Thermophilic fungal genomes were found to encode various thermostable enzymes, including carbohydrate-active enzymes such as endoxylanases, which are useful for many industrial applications. At the same time, the overall gene counts, especially in gene families responsible for microbial defense such as secondary metabolism, are reduced in thermophiles compared to mesophiles. We also found a reduction in the core genome size of thermophiles in both the Chaetomiaceae family and the Eurotiomycetes class. The Gene Ontology terms lost in thermophilic fungi include primary metabolism, transporters, UV response, and O-methyltransferases. Comparative genomics analysis also revealed higher GC content in the third base of codons (GC3) and a lower effective number of codons in fungal thermophiles than in both thermotolerant and mesophilic fungi. Furthermore, using the Support Vector Machine classifier, we identified several Pfam domains capable of discriminating between genomes of thermophiles and mesophiles with 94% accuracy. Using AlphaFold2 to predict protein structures of endoxylanases (GH10), we built a similarity network based on the structures. We found that the number of disulfide bonds appears important for protein structure, and the network clusters based on protein structures correlate with the optimal activity temperature. Thus, comparative genomics offers new insights into the biology, adaptation, and evolutionary history of thermophilic fungi while providing a parts list for bioengineering applications.
KW - Adaptation, Physiological/genetics
KW - Evolution, Molecular
KW - Fungal Proteins/genetics
KW - Fungi/genetics
KW - Genome, Fungal
KW - Genomics/methods
KW - Phylogeny
UR - http://www.scopus.com/inward/record.url?scp=85203692063&partnerID=8YFLogxK
U2 - 10.1038/s42003-024-06681-w
DO - 10.1038/s42003-024-06681-w
M3 - Article
C2 - 39266695
SN - 2399-3642
VL - 7
JO - Communications Biology
JF - Communications Biology
IS - 1
M1 - 1124
ER -