Abstract
Large-scale surveys such as the Programme for International Student Assessment (PISA),
the Teaching and Learning International Survey (TALIS), and the Programme for the
International Assessment of Adult Competencies (PIAAC) use advanced statistical models
to estimate scores of latent traits from multiple observed responses. The comparison of
such estimated scores across different groups of respondents is valid to the extent that the
same set of estimated parameters holds in each group surveyed. This issue of invariance of parameter estimates is addressed by model fit indices, which gauge how plausible it is that one set of parameters can be used across all groups. Therefore, the problem of scale invariance
across groups of respondents can typically be framed as the question of how well a single
model fits the responses of all groups. However, the procedures used to evaluate the fit of
these models pose a series of theoretical and practical problems. The most commonly
applied procedures to establish invariance of cognitive and non-cognitive scales across
countries in large-scale surveys are developed within the framework of confirmatory factor
analysis and item response theory. The criteria that are commonly applied to evaluate the
fit of such models, such as the decrement of the Comparative Fit Index in confirmatory
factor analysis, normally work well in the comparison of a small number of countries or
groups, but can perform poorly in large-scale surveys featuring a large number of countries.
More specifically, the common criteria often result in the non-rejection of metric invariance (i.e. identical factor loadings across countries); the step to scalar invariance (i.e. identical intercepts in addition to identical factor loadings), however, appears to set overly restrictive standards. This report sets out to identify and apply novel procedures to evaluate model
fit across a large number of groups, or novel scaling models that are more likely to pass
common model fit criteria.
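To make the hierarchy concrete, the nested invariance levels can be written in standard single-factor CFA notation; this is an illustrative sketch rather than the report's own notation, with y_ig the response to item i in group g, eta_g the latent trait, nu the intercepts and lambda the loadings:

```latex
\begin{align*}
\text{configural: } & y_{ig} = \nu_{ig} + \lambda_{ig}\,\eta_g + \varepsilon_{ig}
  && \text{(loadings and intercepts free in every group)}\\
\text{metric: }     & \lambda_{ig} = \lambda_i \text{ for all } g
  && \text{(equal loadings; intercepts still free)}\\
\text{scalar: }     & \lambda_{ig} = \lambda_i \text{ and } \nu_{ig} = \nu_i \text{ for all } g
  && \text{(equal loadings and intercepts)}
\end{align*}
```

Scalar invariance is what licenses direct comparisons of latent means across groups, which is why the metric-to-scalar step carries so much weight in these surveys.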
Using both real and simulated data, the following procedures are described and applied:
multigroup confirmatory factor analysis, followed by alignment analysis of the same data
set; Bayesian approximate measurement invariance; Bayesian measurement invariance
testing in item response theory (IRT) models; and multigroup and multilevel latent class
analysis. These approaches have the potential to resolve recurrent fit problems in invariance
testing. Though promising, more work with these new approaches is needed to establish
their suitability in large-scale surveys. The last chapter reports the conclusions from a
conference in which these approaches were discussed, along with traditional approaches,
in order to provide
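Among the procedures just listed, alignment is perhaps the most algorithmic: it retains the configural model and searches for group factor means and variances that minimize a total "simplicity" loss over the group-specific loadings and intercepts (Asparouhov and Muthén's multiple-group alignment). A minimal NumPy sketch of that loss follows; the component loss f(x) = sqrt(sqrt(x^2 + eps)), the pairwise sample-size weights, and the eps value are stated here as illustrative assumptions, not as the report's implementation.

```python
import numpy as np

def component_loss(x, eps=1e-4):
    # Simplicity component f(x) = sqrt(sqrt(x^2 + eps)): behaves roughly
    # like |x|^0.5, so the total loss favors a few large parameter
    # differences (clear non-invariance) over many small ones.
    return np.sqrt(np.sqrt(x ** 2 + eps))

def alignment_loss(loadings, intercepts, group_sizes):
    # loadings, intercepts: (n_groups, n_items) arrays of group-specific
    # estimates from a configural fit; group_sizes: per-group sample sizes.
    G = len(group_sizes)
    total = 0.0
    for g1 in range(G):
        for g2 in range(g1 + 1, G):
            w = np.sqrt(group_sizes[g1] * group_sizes[g2])  # pairwise weight
            total += w * component_loss(loadings[g1] - loadings[g2]).sum()
            total += w * component_loss(intercepts[g1] - intercepts[g2]).sum()
    return total
```

In the full method, the group loadings and intercepts are re-expressed through candidate factor means and variances before this loss is evaluated, and the minimization runs over those means and variances; the sketch only illustrates the objective being minimized.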
| Original language | English |
| --- | --- |
| Publisher | OECD |
| Number of pages | 111 |
| DOIs | |
| Publication status | Published - 2019 |

Publication series

| Name | OECD Education Working Papers |
| --- | --- |
| Volume | 201 |
| ISSN (Electronic) | 1993-9019 |