Exploiting protein structure data to explore the evolution of protein function and biological complexity

Russell L Marsden, Juan A G Ranea, Antonio Sillero, Oliver Redfern, Corin Yeats, Michael Maibaum, David Lee, Sarah Addou, Gabrielle A Reeves, Timothy J Dallman, Christine A Orengo

Research output: Contribution to journal › Review article › peer-review

Abstract

New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. 
Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.
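
As a rough illustration of the cross-genome comparison described in the abstract, the sketch below treats each genome as a set of assigned domain-family identifiers and takes the families present in every genome as the "universal" set. The genome names, family labels, and the function name `universal_families` are hypothetical placeholders, not data or code from the article. The authors' actual protocol for assigning sequences to CATH families is not reproduced here.

```python
from typing import Dict, Set


def universal_families(assignments: Dict[str, Set[str]]) -> Set[str]:
    """Return the domain families that occur in every genome.

    `assignments` maps a genome name to the set of family identifiers
    assigned to its sequences (hypothetical input format).
    """
    family_sets = list(assignments.values())
    if not family_sets:
        return set()
    # A family is "universal" here if it appears in all genomes.
    return set.intersection(*family_sets)


# Toy, made-up assignments used only to exercise the function.
toy_assignments = {
    "genome_1": {"fam_A", "fam_B", "fam_C"},
    "genome_2": {"fam_A", "fam_B", "fam_D"},
    "genome_3": {"fam_A", "fam_B"},
}

shared = universal_families(toy_assignments)
print(sorted(shared))
```

With real data, the same intersection could be computed over bacterial genomes alone and then over all kingdoms. By construction, adding more genomes can only keep the shared set the same size or shrink it.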

Original language: English
Pages (from-to): 425-40
Number of pages: 16
Journal: Philosophical transactions of the Royal Society of London. Series B, Biological sciences
Volume: 361
Issue number: 1467
Publication status: Published - 29 Mar 2006

Keywords

  • Algorithms
  • Computational Biology
  • Databases, Factual
  • Evolution, Molecular
  • Protein Conformation
  • Proteins/chemistry
