TY - JOUR
T1 - The effect of measurement error on clustering
AU - Pankowska, Paulina
AU - Oberski, Daniel
AU - Garnier-Villarreal, Mauricio
AU - Pavlopoulos, Dimitris
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/5/29
Y1 - 2025/5/29
N2 - Clustering is a set of statistical techniques widely applied in the social sciences. While an important and useful tool, traditional clustering techniques tend to assume that the data are free from measurement error, which is often an unrealistic assumption. In this paper, we perform a Monte Carlo study to investigate the sensitivity of different clustering techniques to measurement error. We focus on three commonly used approaches: latent profile analysis (LPA), hierarchical clustering using Ward’s method, and k-means. We examine how the error affects the interpretability of the clusters and the classification of observations into clusters. Our results indicate that LPA fares better in the presence of error. In fact, clustering results from LPA can still be trusted when there is random error affecting one variable. K-means and Ward’s method, on the other hand, appear to already ‘break down’ when random error affects one variable and lead to inaccurate classifications. When the error is systematic and/or it affects more variables, all clustering methods produce severely biased results.
AB - Clustering is a set of statistical techniques widely applied in the social sciences. While an important and useful tool, traditional clustering techniques tend to assume that the data are free from measurement error, which is often an unrealistic assumption. In this paper, we perform a Monte Carlo study to investigate the sensitivity of different clustering techniques to measurement error. We focus on three commonly used approaches: latent profile analysis (LPA), hierarchical clustering using Ward’s method, and k-means. We examine how the error affects the interpretability of the clusters and the classification of observations into clusters. Our results indicate that LPA fares better in the presence of error. In fact, clustering results from LPA can still be trusted when there is random error affecting one variable. K-means and Ward’s method, on the other hand, appear to already ‘break down’ when random error affects one variable and lead to inaccurate classifications. When the error is systematic and/or it affects more variables, all clustering methods produce severely biased results.
KW - Clustering
KW - k-means
KW - Latent profile analysis (LPA)
KW - Measurement error
KW - Ward’s method
UR - http://www.scopus.com/inward/record.url?scp=105006826572&partnerID=8YFLogxK
U2 - 10.1007/s11135-025-02177-9
DO - 10.1007/s11135-025-02177-9
M3 - Article
AN - SCOPUS:105006826572
SN - 0033-5177
JO - Quality and Quantity
JF - Quality and Quantity
ER -