
PLM-eXplain: Divide and Conquer the Protein Embedding Space

Research output: Working paper › Preprint › Academic

Abstract

Protein language models (PLMs) have revolutionised computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. We present an explainable adapter layer, PLM-eXplain (PLM-X), that bridges this gap by factoring PLM embeddings into two components: an interpretable subspace based on established biochemical features, and a residual subspace that preserves the model's predictive power. Using embeddings from ESM2, our adapter incorporates well-established properties, including secondary structure and hydropathy, while maintaining high performance. We demonstrate the effectiveness of our approach across three protein-level classification tasks: prediction of extracellular vesicle association, identification of transmembrane helices, and prediction of aggregation propensity. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalisable solution for enhancing PLM interpretability across various downstream applications. This work addresses a critical need in computational biology by providing a bridge between powerful deep learning models and actionable biological insights.
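The factoring described in the abstract — splitting an embedding into a part explained by known biochemical features and an orthogonal residual — can be illustrated with a minimal least-squares sketch. This is not the authors' implementation (PLM-X uses a trained adapter layer); the feature matrix, dimensions, and random data below are placeholders, not real ESM2 outputs or biochemical annotations.

```python
import numpy as np

def factor_embeddings(E, F):
    """Split embeddings E (n x d) into a component linearly explained by
    interpretable features F (n x k) and an orthogonal residual.

    Illustrative sketch only; the actual PLM-X adapter is learned.
    """
    # Least-squares map from features to embeddings: F @ W ~= E
    W, *_ = np.linalg.lstsq(F, E, rcond=None)
    interpretable = F @ W          # part of E predictable from the features
    residual = E - interpretable   # remaining signal, orthogonal to F
    return interpretable, residual

# Toy data: 100 residue embeddings (d=32) and 2 hypothetical features
# (e.g. hydropathy, helix propensity) -- synthetic, for demonstration.
rng = np.random.default_rng(0)
F = rng.normal(size=(100, 2))
E = F @ rng.normal(size=(2, 32)) + 0.1 * rng.normal(size=(100, 32))
interpretable, residual = factor_embeddings(E, F)
```

In this linear sketch, the two parts sum back to the original embedding, and the residual carries whatever predictive signal the chosen features do not capture — which is the property that lets a downstream classifier keep its accuracy while its feature-aligned component stays interpretable.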
Original language: English
Publisher: arXiv
Pages: 1-13
Number of pages: 13
DOIs
Publication status: Published - 9 Apr 2025

Keywords

  • q-bio.BM
  • cs.AI
  • cs.LG
  • Protein language model
  • Explainable AI
  • Embeddings
  • Protein Property Prediction
  • Protein Aggregation

