MetaScore: A Novel Machine-Learning-Based Approach to Improve Traditional Scoring Functions for Scoring Protein-Protein Docking Conformations

Research output: Contribution to conference, Paper, Academic

Abstract

Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining the 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking (the so-called scoring problem) still has considerable room for improvement. We present MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore uses a random forest (RF) classifier trained to distinguish near-native from non-native conformations using their protein-protein interfacial features. The features include physicochemical properties, energy terms, interaction-propensity-based features, geometric properties, interface topology features, evolutionary conservation, and scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of the nine traditional SFs included in this work in terms of success rate and hit rate evaluated over conformations ranked among the top 10; and (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by using machine learning to judiciously leverage protein-protein interfacial features and by using ensemble methods to combine multiple scoring functions.
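The averaging step and the top-10 evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the min-max rescaling of the traditional SF score, the "lower is better" sign convention, the hit-rate definition (near-native models in the top k divided by all near-native models for the case), and the function names `min_max`, `metascore`, and `top_k_metrics` are all assumptions made for illustration.

```python
import numpy as np


def min_max(scores):
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def metascore(rf_prob, sf_score, lower_is_better=True):
    """Average an RF near-native probability with a rescaled traditional SF score.

    Assumption: the SF score is min-max rescaled per docking case so that
    both terms live on a comparable [0, 1] scale before averaging.
    """
    sf = min_max(sf_score)
    if lower_is_better:
        sf = 1.0 - sf  # flip so that higher always means "more native-like"
    return 0.5 * (np.asarray(rf_prob, dtype=float) + sf)


def top_k_metrics(scores, is_near_native, k=10):
    """Return (success, hit_rate) for one docking case.

    Assumed definitions: success means at least one near-native model is
    ranked in the top k; hit rate is the fraction of all near-native models
    of this case that appear in the top k.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_near_native, dtype=bool)
    top = np.argsort(-scores, kind="stable")[:k]  # highest score first
    hits_in_top = int(labels[top].sum())
    total_hits = int(labels.sum())
    success = hits_in_top > 0
    hit_rate = hits_in_top / total_hits if total_hits else 0.0
    return success, hit_rate


# Toy case with four docked models (made-up numbers, not data from the paper).
rf_prob = [0.9, 0.2, 0.6, 0.1]           # hypothetical RF probabilities
sf_score = [-50.0, -10.0, -40.0, -45.0]  # hypothetical SF energies, lower is better
labels = [True, False, True, False]      # which models are near-native

combined = metascore(rf_prob, sf_score)
sf_only = 1.0 - min_max(sf_score)
print(top_k_metrics(combined, labels, k=2))
print(top_k_metrics(sf_only, labels, k=2))
```

In this toy case the traditional score alone places one non-native model in the top two, while the averaged score ranks both near-native models first. The numbers are chosen only to show the mechanics and say nothing about real performance.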
Original language: English
Number of pages: 20
Publication status: Published - Jan 2023

Bibliographical note

Publisher Copyright:
© 2023 by the authors.

Funding

Y.J. was supported in part by a research assistantship funded by National Science Foundation grant 1640834 to V.H. The work of V.H. was supported in part by the National Center for Advancing Translational Sciences, National Institutes of Health, through grants UL1 TR000127 and TR002014; the National Science Foundation, through grant 1640834; the Pennsylvania State University’s Institute for Computational and Data Sciences and the Center for Artificial Intelligence Foundations and Scientific Applications; the Edward Frymoyer Endowed Professorship in Information Sciences and Technology; the Dorothy Foehr Huck and J. Lloyd Huck Chair in Biomedical Data Sciences and Artificial Intelligence at Pennsylvania State University; and the Sudha Murty Distinguished Visiting Chair in Neurocomputing and Data Science funded by the Pratiksha Trust at the Indian Institute of Science. This work was also supported in part by the European H2020 e-Infrastructure grant BioExcel (grant nos. 675728 and 823830) (A.M.J.J.B.). Financial support from the Netherlands Organisation for Scientific Research through an Accelerating Scientific Discovery (ASDI) grant from the Netherlands eScience Center (grant no. 027016G04) (L.C.X. and A.M.J.J.B.) and a Veni grant (grant no. 722.014.005) (L.C.X.) is acknowledged.

Funders | Funder number
Center for Artificial Intelligence Foundations and Scientific Applications |
National Science Foundation | 1640834
National Institutes of Health | UL1 TR000127, TR002014
National Center for Advancing Translational Sciences |
Pennsylvania State University |
Horizon 2020 Framework Programme | 823830, 675728
Nederlandse Organisatie voor Wetenschappelijk Onderzoek | 027016G04, 722.014.005

    Keywords

    • machine learning
    • method combination
    • protein–protein docking
    • scoring functions
