A robust unsupervised method for outlier set detection

  • Amal Sarfraz*
  • Abigail Birnbaum
  • Flannery Dolan
  • Jonathan Lamontagne
  • Lyudmila Mihaylova
  • Charles Rougé

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

This paper proposes a robust method that identifies sets of points that collectively deviate from typical patterns in a dataset, which it calls “outlier sets”, while excluding individual points from detection. This new methodology, Outlier Set Two-step Identification (OSTI), employs a two-step approach to detect and label these outlier sets. First, it uses Gaussian Mixture Models for probabilistic clustering, identifying candidate outlier sets as clusters whose weights fall below a hyperparameter threshold. Second, OSTI measures the inter-cluster Mahalanobis distance between each candidate outlier set's centroid and the overall dataset mean. OSTI then tests the null hypothesis that this distance does not significantly differ from its theoretical chi-square distribution, enabling the formal detection of outlier sets. We test OSTI systematically on 8000 synthetic 2D datasets across various inlier configurations and thousands of possible outlier set characteristics. Results show that OSTI robustly and consistently detects outlier sets, with an average F1 score of 0.92 and an average purity (the degree to which the outlier sets identified correspond to those generated synthetically, i.e., our ground truth) of 98.58%. We also compare OSTI with state-of-the-art outlier detection methods to illuminate how OSTI fills a gap as a tool for the exclusive detection of outlier sets.
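The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `detect_outlier_sets`, the parameters `weight_threshold` and `alpha`, and the synthetic data are hypothetical, and the test shown here (squared Mahalanobis distance under the overall sample covariance, compared against a chi-square critical value with as many degrees of freedom as there are dimensions) is one assumed reading of the hypothesis test. It relies on scikit-learn's `GaussianMixture` and SciPy's `chi2`.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.mixture import GaussianMixture


def detect_outlier_sets(X, n_components=2, weight_threshold=0.1, alpha=0.01, seed=0):
    """Return a boolean mask of points belonging to flagged low-weight clusters.

    Illustrative only: names and defaults are assumptions, not taken from the paper.
    """
    # Step 1: probabilistic clustering with a Gaussian mixture model.
    gmm = GaussianMixture(n_components=n_components, n_init=5, random_state=seed).fit(X)
    labels = gmm.predict(X)

    # Overall dataset mean and inverse covariance for the Mahalanobis distance.
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

    # Assumed test: compare the squared distance with a chi-square critical value.
    critical = chi2.ppf(1.0 - alpha, df=X.shape[1])

    mask = np.zeros(len(X), dtype=bool)
    for k in range(n_components):
        # Only clusters with a small mixture weight are candidate outlier sets.
        if gmm.weights_[k] >= weight_threshold:
            continue
        # Step 2: squared Mahalanobis distance from the cluster centroid
        # to the overall dataset mean.
        diff = gmm.means_[k] - mean
        d2 = diff @ cov_inv @ diff
        if d2 > critical:
            mask |= labels == k
    return mask


# Hypothetical 2D data: 500 inliers around the origin, 25 points forming
# a compact set far from them.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
outlier_set = rng.normal(loc=8.0, scale=0.3, size=(25, 2))
X = np.vstack([inliers, outlier_set])

mask = detect_outlier_sets(X)
```

Because the overall mean and covariance are estimated from all points, a large or very distant outlier set inflates the covariance and shrinks its own distance; a robust covariance estimate would be a reasonable alternative in practice.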

Original language: English
Article number: 114274
Number of pages: 18
Journal: Knowledge-Based Systems
Volume: 329
Early online date: 15 Aug 2025
Publication status: Published - 4 Nov 2025

Bibliographical note

Publisher Copyright:
© 2025 Elsevier B.V.

Keywords

  • Gaussian mixture models
  • Inter-cluster Mahalanobis distance
  • Outlier set two-step identification (OSTI)
  • Outlier sets
