The effect of measurement error on clustering

Paulina Pankowska*, Daniel Oberski, Mauricio Garnier-Villarreal, Dimitris Pavlopoulos

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Clustering is a set of statistical techniques widely applied in the social sciences. Although clustering is an important and useful tool, traditional clustering techniques tend to assume that the data are free from measurement error, which is often unrealistic. In this paper, we perform a Monte Carlo study to investigate the sensitivity of different clustering techniques to measurement error. We focus on three commonly used approaches: latent profile analysis (LPA), hierarchical clustering using Ward’s method, and k-means. We examine how the error affects the interpretability of the clusters and the classification of observations into clusters. Our results indicate that LPA fares better than the other two methods in the presence of error: clustering results from LPA can still be trusted when random error affects one variable. K-means and Ward’s method, on the other hand, appear to ‘break down’ already when random error affects one variable, leading to inaccurate classifications. When the error is systematic and/or affects more variables, all clustering methods produce severely biased results.
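
As a rough illustration of the kind of experiment the abstract describes, the following minimal sketch simulates two clusters on two variables, adds random error to one variable, and checks how well a basic k-means recovers the true classification. It is not the authors' simulation design: the cluster means, sample size, error standard deviation, seed, and all function names (`simulate`, `kmeans2`, `accuracy`) are assumptions chosen for illustration, and only k-means is shown, not LPA or Ward’s method.

```python
import random

def simulate(n_per_cluster, error_sd, rng):
    """Two clusters on two variables; random error is added to the first variable only.
    Means (0 and 3) and unit within-cluster SD are illustrative assumptions."""
    points, labels = [], []
    for label, mean in enumerate((0.0, 3.0)):
        for _ in range(n_per_cluster):
            x = rng.gauss(mean, 1.0) + rng.gauss(0.0, error_sd)  # error-prone variable
            y = rng.gauss(mean, 1.0)                             # error-free variable
            points.append((x, y))
            labels.append(label)
    return points, labels

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def assign_points(points, centers):
    """Assign each point to the nearer of the two centers."""
    return [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1 for p in points]

def kmeans2(points, rng, n_starts=5, n_iter=50):
    """Plain Lloyd's algorithm for k = 2, keeping the restart with the lowest inertia."""
    best_assign, best_inertia = None, float("inf")
    for _ in range(n_starts):
        centers = rng.sample(points, 2)  # random initial centers
        for _ in range(n_iter):
            assign = assign_points(points, centers)
            new_centers = []
            for k in (0, 1):
                members = [p for p, a in zip(points, assign) if a == k]
                if members:
                    new_centers.append((sum(m[0] for m in members) / len(members),
                                        sum(m[1] for m in members) / len(members)))
                else:
                    new_centers.append(centers[k])  # keep the old center if a cluster empties
            if new_centers == centers:
                break
            centers = new_centers
        assign = assign_points(points, centers)
        inertia = sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))
        if inertia < best_inertia:
            best_assign, best_inertia = assign, inertia
    return best_assign

def accuracy(assign, labels):
    """Share of correctly classified observations, allowing for label switching."""
    agree = sum(a == l for a, l in zip(assign, labels))
    return max(agree, len(labels) - agree) / len(labels)

rng = random.Random(2025)  # arbitrary seed
clean_points, clean_labels = simulate(500, 0.0, rng)  # no measurement error
noisy_points, noisy_labels = simulate(500, 3.0, rng)  # random error on one variable

acc_clean = accuracy(kmeans2(clean_points, rng), clean_labels)
acc_noisy = accuracy(kmeans2(noisy_points, rng), noisy_labels)
print(f"accuracy without error: {acc_clean:.3f}")
print(f"accuracy with random error on one variable: {acc_noisy:.3f}")
```

A full Monte Carlo study would repeat this over many replications and conditions (for example systematic error, or error on several variables) and would compare the three methods; this single run only illustrates how classification accuracy can be tracked against a known true clustering.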

Original language: English
Journal: Quality and Quantity
Publication status: E-pub ahead of print - 29 May 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025.

Keywords

  • Clustering
  • k-means
  • Latent profile analysis (LPA)
  • Measurement error
  • Ward’s method
