IHMCIF: An Extension of the PDBx/mmCIF Data Standard for Integrative Structure Determination Methods: IHMCIF Data Standard for Integrative Structures

Brinda Vallat*, Benjamin M. Webb, John D. Westbrook, Thomas D. Goddard, Christian A. Hanke, Andrea Graziadei, Ezra Peisach, Arthur Zalevsky, Jared Sagendorf, Hongsuda Tangmunarunkit, Serban Voinea, Monica Sekharan, Jian Yu, Alexander A.M.J.J. Bonvin, Frank DiMaio, Gerhard Hummer, Jens Meiler, Emad Tajkhorshid, Thomas E. Ferrin, Catherine L. Lawson, Alexander Leitner, Juri Rappsilber, Claus A.M. Seidel, Cy M. Jeffries, Stephen K. Burley, Jeffrey C. Hoch, Genji Kurisu, Kyle Morris, Ardan Patwardhan, Sameer Velankar, Torsten Schwede, Jill Trewhella, Carl Kesselman, Helen M. Berman, Andrej Sali

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

IHMCIF (github.com/ihmwg/IHMCIF) is a data information framework that supports archiving and disseminating macromolecular structures determined by integrative or hybrid modeling (IHM), and making them Findable, Accessible, Interoperable, and Reusable (FAIR). IHMCIF is an extension of the Protein Data Bank Exchange/macromolecular Crystallographic Information Framework (PDBx/mmCIF) that serves as the framework for the Protein Data Bank (PDB) to archive experimentally determined atomic structures of biological macromolecules and their complexes with one another and small molecule ligands (e.g., enzyme cofactors and drugs). IHMCIF serves as the foundational data standard for the PDB-Dev prototype system, developed for archiving and disseminating integrative structures. It utilizes a flexible data representation to describe integrative structures that span multiple spatiotemporal scales and structural states with definitions for restraints from a variety of experimental methods contributing to integrative structural biology. The IHMCIF extension was created with the benefit of considerable community input and recommendations gathered by the Worldwide Protein Data Bank (wwPDB) Task Force for Integrative or Hybrid Methods (wwpdb.org/task/hybrid). Herein, we describe the development of IHMCIF to support evolving methodologies and ongoing advancements in integrative structural biology. Ultimately, IHMCIF will facilitate the unification of PDB-Dev data and tools with the PDB archive so that integrative structures can be archived and disseminated through PDB.

Original language: English
Article number: 168546
Journal: Journal of Molecular Biology
Volume: 436
Issue number: 17
Early online date: 18 Mar 2024
Publication status: Published - 1 Sept 2024

Bibliographical note

Publisher Copyright:
© 2024 The Author(s)

Funding

B. Vallat acknowledges funding from the United States National Science Foundation (NSF) awards DBI-2112966 (PI: B. Vallat) and DBI-1756248 (PI: B. Vallat). H. Berman acknowledges funding from NSF (DBI-1519158). A. Sali acknowledges funding from NSF and the United States National Institutes of Health (NIH) (NSF DBI-2112967, PI: A. Sali; NSF DBI-1756250, PI: A. Sali; NIH R01GM083960, PI: A. Sali; NIH P41GM109824, PI: M.P. Rout). C. Kesselman acknowledges funding from NSF (DBI-2112968). RCSB PDB core operations are jointly funded by NSF (DBI-1832184, PI: S.K. Burley), the US Department of Energy (DE-SC0019749, PI: S.K. Burley), and the National Cancer Institute, the National Institute of Allergy and Infectious Diseases, and the National Institute of General Medical Sciences of the NIH (R01GM133198, PI: S.K. Burley). Other funding awards to RCSB PDB by NSF and to PDBe by the UK Biotechnology and Biological Sciences Research Council are jointly supporting development of a Next Generation PDB archive (DBI-2019297, PI: S.K. Burley; BB/V004247/1, PI: Sameer Velankar) and new Mol* features (DBI-2129634, PI: S.K. Burley; BB/W017970/1, PI: Sameer Velankar). G. Hummer acknowledges support by the Max Planck Society. E. Tajkhorshid acknowledges funding from NIH (P41-GM104601, PI: Tajkhorshid; R24-GM145965, PI: Tajkhorshid). PDBj is supported by grants from the Database Integration Coordination Program from the department of NBDC program, Japan Science and Technology Agency (JPMJND2205, PI: G. Kurisu), and partially supported by Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from AMED under Grant Number 22ama121001. PDBe is supported by European Molecular Biology Laboratory-European Bioinformatics Institute. T. Schwede acknowledges funding from SIB Swiss Institute of Bioinformatics. J. Hoch acknowledges funding from NIH (R24GM150793) for BMRB. J. Meiler is supported by a Humboldt Professorship of the Alexander von Humboldt Foundation. J. Meiler acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG) through SFB1423 (421152132), SFB 1052 (209933838), and SPP 2363 (460865652). J. Meiler is supported by BMBF (Federal Ministry of Education and Research) through the Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) and through DAAD project 57616814 (SECAI, School of Embedded Composite AI). Work in the Meiler laboratory is further supported through the NIH (R01 HL122010, R01 DA046138, U01 AI150739, R01CA227833, S10 OD016216, S10 OD020154, S10 OD032234). The Wellcome Centre for Cell Biology is supported by core funding from the Wellcome Trust (203149). T. Ferrin and T. Goddard acknowledge support from the NIH (R01GM129325, PI: T.E. Ferrin). A.M.J.J. Bonvin acknowledges support from the European Union Horizon 2020, projects BioExcel (675728, 823830 and 101093290) and EGI-ACE (101017567), from the Netherlands e-Science Center (027.020.G13) and from the Dutch Foundation for Scientific Research (NWO) (TOP-PUNT grant 718.015.001). C.A.M. Seidel acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) CRC 1208 (ID 267205415, project A08) and SE 1195/17-1 as well as the European Research Council through the Advanced Grant 2014 hybridFRET (671208). EMDB is supported by funding from the Wellcome Trust [212977/Z/18/Z]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The authors thank all members of the wwPDB IHM Task Force and Working Groups for their continued support and recommendations. We thank all the researchers worldwide who have deposited structures to PDB and PDB-Dev. We also gratefully acknowledge contributions to the PDBx/mmCIF data standard made by past members of the Worldwide Protein Data Bank partner organizations (Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), Protein Data Bank in Europe (PDBe), Protein Data Bank Japan (PDBj), EMDB, and BMRB) and members of the structural biology community. We thank the developers of Molstar for providing support for visualizing integrative structures.

Funders (funder number)
Bundesministerium für Bildung und Forschung
EMDB
National Institute of Allergy and Infectious Diseases
National Cancer Institute
BMRB
Center for Scalable Data Analytics and Artificial Intelligence
department of NBDC
European Molecular Biology Laboratory-European Bioinformatics Institute
Alexander von Humboldt-Stiftung
Foundation for Dietary Scientific Research
U.S. Department of Energy: DE-SC0019749
EGI-ACE: 101017567
Netherlands e-Science Center: 027.020.G13
Wellcome Trust: 203149, 212977/Z/18/Z
National Institute of General Medical Sciences: R01GM133198
Horizon 2020: 675728, 823830, 101093290
Japan Agency for Medical Research and Development: 22ama121001
Nederlandse Organisatie voor Wetenschappelijk Onderzoek: 718.015.001
European Research Council: 671208
Max-Planck-Gesellschaft
Deutsche Forschungsgemeinschaft: SFB1423 (421152132), SFB 1052 (209933838), SPP 2363 (460865652), CRC 1208 (267205415, project A08), SE 1195/17-1
German Academic Exchange Service: 57616814
Biotechnology and Biological Sciences Research Council: BB/V004247/1, BB/W017970/1
National Institutes of Health: R01GM083960, P41GM109824, P41-GM104601, R24-GM145965, R24GM150793, R01GM129325, R01 HL122010, R01 DA046138, U01 AI150739, R01CA227833, S10 OD016216, S10 OD020154, S10 OD032234
National Science Foundation: DBI-2112966, DBI-1756248, DBI-1519158, DBI-2112967, DBI-1756250, DBI-2112968, DBI-1832184, DBI-2019297, DBI-2129634
Japan Science and Technology Agency: JPMJND2205
Swiss Institute of Bioinformatics

Keywords

• Data Standard
• IHMCIF
• PDB-Dev
• PDBx/mmCIF
• Worldwide Protein Data Bank
