On Support Samples of Next Word Prediction

  • Yuqian Li
  • , Yupei Du
  • , Yufang Liu*
  • , Feifei Feng
  • , Mou Xiao Feng
  • , Yuanbin Wu*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

Abstract

Language models excel in various tasks by making complex decisions, yet understanding the rationale behind these decisions remains a challenge. This paper investigates data-centric interpretability in language models, focusing on the next-word prediction task. Using representer theorem, we identify two types of support samples-those that either promote or deter specific predictions. Our findings reveal that being a support sample is an intrinsic property, predictable even before training begins. Additionally, while non-support samples are less influential in direct predictions, they play a critical role in preventing overfitting and shaping generalization and representation learning. Notably, the importance of non-support samples increases in deeper layers, suggesting their significant role in intermediate representation formation. These insights shed light on the interplay between data and model decisions, offering a new dimension to understanding language model behavior and interpretability.

Original languageEnglish
Title of host publicationProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication(Volume 1: Long Papers)
EditorsWanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
PublisherAssociation for Computational Linguistics (ACL)
Pages10277-10289
Number of pages13
ISBN (Electronic)9798891762510
DOIs
Publication statusPublished - 2025
Event63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 20251 Aug 2025

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
Volume1
ISSN (Print)0736-587X

Conference

Conference63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/TerritoryAustria
CityVienna
Period27/07/251/08/25

Bibliographical note

Publisher Copyright:
© 2025 Association for Computational Linguistics.

Fingerprint

Dive into the research topics of 'On Support Samples of Next Word Prediction'. Together they form a unique fingerprint.

Cite this