Abstract
IHMCIF (github.com/ihmwg/IHMCIF) is a data information framework that supports archiving and disseminating macromolecular structures determined by integrative or hybrid modeling (IHM), and making them Findable, Accessible, Interoperable, and Reusable (FAIR). IHMCIF is an extension of the Protein Data Bank Exchange/macromolecular Crystallographic Information Framework (PDBx/mmCIF) that serves as the framework for the Protein Data Bank (PDB) to archive experimentally determined atomic structures of biological macromolecules and their complexes with one another and small molecule ligands (e.g., enzyme cofactors and drugs). IHMCIF serves as the foundational data standard for the PDB-Dev prototype system, developed for archiving and disseminating integrative structures. It utilizes a flexible data representation to describe integrative structures that span multiple spatiotemporal scales and structural states with definitions for restraints from a variety of experimental methods contributing to integrative structural biology. The IHMCIF extension was created with the benefit of considerable community input and recommendations gathered by the Worldwide Protein Data Bank (wwPDB) Task Force for Integrative or Hybrid Methods (wwpdb.org/task/hybrid). Herein, we describe the development of IHMCIF to support evolving methodologies and ongoing advancements in integrative structural biology. Ultimately, IHMCIF will facilitate the unification of PDB-Dev data and tools with the PDB archive so that integrative structures can be archived and disseminated through PDB.
Original language | English |
---|---|
Article number | 168546 |
Journal | Journal of Molecular Biology |
Volume | 436 |
Issue number | 17 |
Early online date | 18 Mar 2024 |
DOIs | |
Publication status | Published - 1 Sept 2024 |
Bibliographical note
Publisher Copyright: © 2024 The Author(s)
Funding
B. Vallat acknowledges funding from the United States National Science Foundation (NSF) awards DBI-2112966 (PI: B. Vallat) and DBI-1756248 (PI: B. Vallat). H. Berman acknowledges funding from NSF (DBI-1519158). A. Sali acknowledges funding from NSF and the United States National Institutes of Health (NIH) (NSF DBI-2112967, PI: A. Sali; NSF DBI-1756250, PI: A. Sali; NIH R01GM083960, PI: A. Sali; NIH P41GM109824, PI: M.P. Rout). C. Kesselman acknowledges funding from NSF (DBI-2112968). RCSB PDB core operations are jointly funded by NSF (DBI-1832184, PI: S.K. Burley), the US Department of Energy (DE-SC0019749, PI: S.K. Burley), and the National Cancer Institute, the National Institute of Allergy and Infectious Diseases, and the National Institute of General Medical Sciences of the NIH (R01GM133198, PI: S.K. Burley). Other funding awards to RCSB PDB by NSF and to PDBe by the UK Biotechnology and Biological Sciences Research Council are jointly supporting development of a Next Generation PDB archive (DBI-2019297, PI: S.K. Burley; BB/V004247/1, PI: Sameer Velankar) and new Mol* features (DBI-2129634, PI: S.K. Burley; BB/W017970/1, PI: Sameer Velankar). G. Hummer acknowledges support by the Max Planck Society. E. Tajkhorshid acknowledges funding from NIH (P41-GM104601, PI: Tajkhorshid; R24-GM145965, PI: Tajkhorshid). PDBj is supported by grants from the Database Integration Coordination Program from the department of NBDC program, Japan Science and Technology Agency (JPMJND2205, PI: G. Kurisu), and partially supported by the Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from AMED under Grant Number 22ama121001. PDBe is supported by European Molecular Biology Laboratory-European Bioinformatics Institute. T. Schwede acknowledges funding from SIB Swiss Institute of Bioinformatics. J. Hoch acknowledges funding from NIH (R24GM150793) for BMRB. J. Meiler is supported by a Humboldt Professorship of the Alexander von Humboldt Foundation. J. Meiler acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG) through SFB1423 (421152132), SFB 1052 (209933838), and SPP 2363 (460865652). J. Meiler is supported by BMBF (Federal Ministry of Education and Research, Germany) through the Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) and through DAAD project 57616814 (SECAI, School of Embedded Composite AI). Work in the Meiler laboratory is further supported through the NIH (R01 HL122010, R01 DA046138, U01 AI150739, R01CA227833, S10 OD016216, S10 OD020154, S10 OD032234). The Wellcome Centre for Cell Biology is supported by core funding from the Wellcome Trust (203149). T. Ferrin and T. Goddard acknowledge support from the NIH (R01GM129325, PI: T.E. Ferrin). A.M.J.J. Bonvin acknowledges support from the European Union Horizon 2020, projects BioExcel (675728, 823830 and 101093290) and EGI-ACE (101017567), from the Netherlands e-Science Center (027.020.G13) and from the Dutch Foundation for Scientific Research (NWO) (TOP-PUNT grant 718.015.001). C.A.M. Seidel acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) CRC 1208 (ID 267205415, project A08) and SE 1195/17-1 as well as the European Research Council through the Advanced Grant 2014 hybridFRET (671208). EMDB is supported by funding from the Wellcome Trust [212977/Z/18/Z]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The authors thank all members of the wwPDB IHM Task Force and Working Groups for their continued support and recommendations. We thank all the researchers worldwide who have deposited structures to PDB and PDB-Dev. We also gratefully acknowledge contributions to the PDBx/mmCIF data standard made by past members of the Worldwide Protein Data Bank partner organizations (Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), Protein Data Bank in Europe (PDBe), Protein Data Bank Japan (PDBj), EMDB, and BMRB) and members of the structural biology community. We thank the developers of Molstar for providing support for visualizing integrative structures.
Funders | Funder number |
---|---|
National Science Foundation | DBI-2112966, DBI-1756248, DBI-1519158, DBI-2112967, DBI-1756250, DBI-2112968, DBI-1832184, DBI-2019297, DBI-2129634 |
National Institutes of Health | R01GM083960, P41GM109824, P41-GM104601, R24-GM145965, R24GM150793, R01GM129325, R01 HL122010, R01 DA046138, U01 AI150739, R01CA227833, S10 OD016216, S10 OD020154, S10 OD032234 |
National Cancer Institute | R01GM133198 |
National Institute of Allergy and Infectious Diseases | R01GM133198 |
National Institute of General Medical Sciences | R01GM133198 |
U.S. Department of Energy | DE-SC0019749 |
Biotechnology and Biological Sciences Research Council | BB/V004247/1, BB/W017970/1 |
Max-Planck-Gesellschaft | |
Japan Science and Technology Agency | JPMJND2205 |
Japan Agency for Medical Research and Development | 22ama121001 |
European Molecular Biology Laboratory-European Bioinformatics Institute | |
Swiss Institute of Bioinformatics | |
Alexander von Humboldt-Stiftung | |
Deutsche Forschungsgemeinschaft | SFB1423 (421152132), SFB 1052 (209933838), SPP 2363 (460865652), CRC 1208 (267205415, project A08), SE 1195/17-1 |
Bundesministerium für Bildung und Forschung | |
Center for Scalable Data Analytics and Artificial Intelligence | |
German Academic Exchange Service | 57616814 |
Wellcome Trust | 203149, 212977/Z/18/Z |
Horizon 2020 | 675728, 823830, 101093290, 101017567 |
Netherlands e-Science Center | 027.020.G13 |
Nederlandse Organisatie voor Wetenschappelijk Onderzoek | 718.015.001 |
European Research Council | 671208 |
Keywords
- Data Standard
- IHMCIF
- PDB-Dev
- PDBx/mmCIF
- Worldwide Protein Data Bank