Abstract
Protein language models (PLMs) have revolutionised computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation into actionable insights. We present an explainable adapter layer, PLM-eXplain (PLM-X), that bridges this gap by factoring PLM embeddings into two components: an interpretable subspace based on established biochemical features, and a residual subspace that preserves the model's predictive power. Using embeddings from ESM2, our adapter incorporates well-established properties, including secondary structure and hydropathy, while maintaining high performance. We demonstrate the effectiveness of our approach across three protein-level classification tasks: prediction of extracellular vesicle association, identification of transmembrane helices, and prediction of aggregation propensity. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalisable solution for enhancing PLM interpretability across diverse downstream applications. This work addresses a critical need in computational biology by providing a bridge between powerful deep learning models and actionable biological insights.
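The abstract does not spell out the adapter's architecture, but the core idea, splitting an embedding into a part explained by known biochemical features and a residual that keeps the remaining signal, can be illustrated with a minimal least-squares sketch. Everything below is an assumption for illustration: the random matrices stand in for ESM2 embeddings and per-protein biochemical features (e.g. hydropathy, secondary-structure fractions), and the paper's actual adapter is learned, not a plain linear regression.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(100, 32))   # toy stand-in for ESM2 embeddings (100 proteins, dim 32)
F = rng.normal(size=(100, 4))    # toy biochemical features (hypothetical: hydropathy, SS fractions, ...)

# Fit a linear map W so that F @ W approximates E (ordinary least squares)
W, *_ = np.linalg.lstsq(F, E, rcond=None)

interpretable = F @ W            # component explained by the known features
residual = E - interpretable     # component preserving the model's remaining signal

# The two subspaces reconstruct the original embedding exactly,
# and the least-squares residual is orthogonal to the feature columns.
assert np.allclose(interpretable + residual, E)
assert np.allclose(F.T @ residual, 0.0, atol=1e-8)
```

Downstream classifiers can then be trained on the concatenation of the two parts, so accuracy is retained while the interpretable coordinates can be inspected directly.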
| Original language | English |
|---|---|
| Publisher | arXiv |
| Pages | 1-13 |
| Number of pages | 13 |
| DOIs | |
| Publication status | Published - 9 Apr 2025 |
Keywords
- q-bio.BM
- cs.AI
- cs.LG
- Protein language model
- Explainable AI
- Embeddings
- Protein Property Prediction
- Protein Aggregation