Beyond the number of classes: separating substantive from non-substantive dependence in latent class analysis

D. L. Oberski*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Latent class analysis (LCA) for categorical data is a model-based clustering and classification technique applied in a wide range of fields including the social sciences, machine learning, psychiatry, public health, and epidemiology. Its central assumption is conditional independence of the indicators given the latent class, i.e. “local independence”; violations can appear as model misfit, often leading LCA practitioners to increase the number of classes. However, when not all of the local dependence is of substantive scientific interest, this leaves two options, both of which are problematic: modeling uninterpretable classes, or retaining a lower number of substantive classes but incurring bias in the final results and classifications of interest due to the remaining assumption violations. This paper suggests an alternative procedure, applicable when the number of substantive classes is known in advance, or when substantive interest is otherwise well-defined. In such cases, I suggest modeling substantive local dependencies as additional discrete latent variables, while absorbing nuisance dependencies in additional parameters. An example application to the estimation of misclassification and turnover rates in the decision to vote in elections among 9510 Dutch residents demonstrates the advantages of this procedure relative to increasing the number of classes.
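The local independence assumption described in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure or data: the two classes, three binary indicators, and all parameter values below are hypothetical, and the function names are chosen here for illustration only. The sketch computes model-implied pattern probabilities as a class-weighted product of per-indicator probabilities, and a Pearson-type discrepancy for one indicator pair, which is the kind of quantity a bivariate residual is built on.

```python
from itertools import product

# Hypothetical parameters (assumed for illustration, not taken from the paper):
# a 2-class model with 3 binary indicators.
class_weights = [0.6, 0.4]      # P(X = c)
p_yes = [                       # p_yes[c][j] = P(Y_j = 1 | X = c)
    [0.9, 0.8, 0.7],
    [0.2, 0.3, 0.1],
]

def pattern_prob(pattern):
    """P(Y = pattern) under local independence: sum_c P(c) * prod_j P(y_j | c)."""
    total = 0.0
    for weight, probs in zip(class_weights, p_yes):
        term = weight
        for y, p in zip(pattern, probs):
            term *= p if y == 1 else 1.0 - p
        total += term
    return total

def pair_marginal(j, k, yj, yk):
    """Model-implied P(Y_j = yj, Y_k = yk), summing over the remaining indicator."""
    return sum(
        pattern_prob(pat)
        for pat in product([0, 1], repeat=3)
        if pat[j] == yj and pat[k] == yk
    )

def pearson_pair_statistic(observed, j, k, n):
    """Pearson-type discrepancy between observed and model-implied counts
    in the 2x2 table of indicators j and k (n = sample size)."""
    stat = 0.0
    for yj, yk in product([0, 1], repeat=2):
        expected = n * pair_marginal(j, k, yj, yk)
        stat += (observed[(yj, yk)] - expected) ** 2 / expected
    return stat
```

Within each class the indicators are independent by construction, yet marginally they are associated because the classes differ. A large pair statistic for observed data would suggest dependence that the classes do not account for, which is the situation where the abstract's choice between adding classes and adding parameters arises.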

Original language: English
Pages (from-to): 171-182
Number of pages: 12
Journal: Advances in Data Analysis and Classification
Issue number: 2
Publication status: Published - 2016
Externally published: Yes


  • Bivariate residual
  • Information criteria
  • Latent class analysis
  • Local dependence
  • Score test
  • Vote misclassification

