Abstract
Obtaining accurate measures of changing abilities is essential for online adaptive learning
systems (ALS) to provide learners with instructional material and practice items at the
appropriate level. Despite the large data volume in ALS, focusing on a single individual
at a specific moment yields limited data, reducing measurement precision. To improve the
quality of measurement in ALS, it has been proposed to incorporate response time (RT)
data into measurement (Klinkenberg et al., 2011). Using data from Math Garden
(Straatemeier, 2014), an ALS for primary school mathematics, we compare several models
that incorporate RTs and can be used for ability tracking, both with each other and with
the benchmark Rasch model, in which ability is measured based only on response accuracy.
We also contrast these empirical findings with simulation results that match the
empirical setup but in which the generating model matches the way RTs are incorporated
into measurement, to study what can be gained under ideal circumstances. Our results show
that while the theoretical gains are large, in the studied empirical setting RTs provide
at best a modest improvement, which was not fully consistent across domains and depended
on the reward system implemented in the ALS. Implications of the results for the choice
of the measurement model in ALS are discussed.
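To make the contrast between accuracy-only and RT-based measurement concrete, the sketch below compares the two under ideal circumstances. It is an illustration under stated assumptions, not one of the models compared in the paper. It uses the signed residual time score S = (2x − 1)·a·(d − t), where x is correctness, t the response time, d the item's time limit and a a discrimination parameter. This is our understanding of the "high speed, high stakes" scoring rule associated with Klinkenberg et al. (2011). We assume a generating model in which the joint density of (x, t) is proportional to exp(S·(θ − β)). Under that assumption, accuracy alone follows a Rasch model with slope a·d, E[S] = a·d·coth(a·d·(θ − β)) − 1/(θ − β), and the Fisher information about θ carried by S equals Var(S). All function names, the Elo-type step size `k`, and the demo numbers are hypothetical.

```python
import math


def rasch_prob(theta: float, beta: float) -> float:
    """Rasch model: P(correct) depends only on theta - beta (accuracy only)."""
    z = theta - beta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def srt_score(correct: bool, rt: float, deadline: float, a: float = 1.0) -> float:
    """Signed residual time score S = (2x - 1) * a * (d - t), with 0 < t < d."""
    sign = 1.0 if correct else -1.0
    return sign * a * (deadline - rt)


def srt_expected(theta: float, beta: float, deadline: float, a: float = 1.0) -> float:
    """E[S] = c*coth(c*eta) - 1/eta, c = a*d, eta = theta - beta (assumed model)."""
    c = a * deadline
    x = c * (theta - beta)
    if abs(x) < 1e-6:
        return c * x / 3.0  # series limit; equals 0 when theta == beta
    return c * (1.0 / math.tanh(x) - 1.0 / x)


def elo_update(theta: float, observed: float, expected: float, k: float) -> float:
    """Elo-type tracking step (hypothetical k): move theta by k * prediction error."""
    return theta + k * (observed - expected)


def info_accuracy(theta: float, beta: float, deadline: float, a: float = 1.0) -> float:
    """Fisher information about theta from correctness alone (Rasch, slope a*d)."""
    c = a * deadline
    p = rasch_prob(c * (theta - beta), 0.0)
    return c * c * p * (1.0 - p)


def info_srt(theta: float, beta: float, deadline: float, a: float = 1.0) -> float:
    """Fisher information about theta from S, i.e. Var(S) under the assumed model."""
    c = a * deadline
    x = c * (theta - beta)
    if abs(x) < 1e-3:
        core = 1.0 / 3.0 - x * x / 15.0  # series expansion avoids cancellation
    elif abs(x) > 50.0:
        core = 1.0 / (x * x)  # 1/sinh(x)^2 is negligible; avoids overflow
    else:
        core = 1.0 / (x * x) - 1.0 / math.sinh(x) ** 2
    return c * c * core


if __name__ == "__main__":
    # Hypothetical item: 20-second time limit, discrimination a = 0.1 (so a*d = 2).
    d, a = 20.0, 0.1
    for eta in (0.0, 0.5, 1.0, 2.0):
        ratio = info_srt(eta, 0.0, d, a) / info_accuracy(eta, 0.0, d, a)
        print(f"theta - beta = {eta:.1f}: information ratio S/accuracy = {ratio:.2f}")

    # One Elo-type tracking step after a fast correct answer (t = 6 s).
    theta = 0.0
    s = srt_score(correct=True, rt=6.0, deadline=d, a=a)
    theta = elo_update(theta, s, srt_expected(theta, 0.0, d, a), k=0.1)
    print(f"theta after one step: {theta:.2f}")
```

Because correctness is simply the sign of S, the information in S can never fall below the information in accuracy alone, and under this assumed model the ratio grows as |θ − β| increases. An accuracy-only tracker would instead call `elo_update(theta, x, rasch_prob(theta, beta), k)`. Real response times need not follow this generating model, which is exactly where empirical and ideal-case gains can diverge.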
systems (ALS) to provide learners with instructional material and practice items at the
appropriate level. Despite the large data volume in ALS, focusing on a single individual
at a specific moment yields limited data, reducing measurement precision. To improve
quality of measurement in ALS, it has been proposed to incorporate response time (RT)
data into measurement (Klinkenberg et al., 2011). Using data from an ALS for primary
school mathematics Math Garden (Straatemeier, 2014), we compare different models that
incorporate RTs and can be used for ability tracking with each other and with the
benchmark Rasch model, in which ability is measured based only on the accuracy of the
responses. We also contrast these empirical findings with simulation results that match
the empirical setup but where the generating model matches the way in which RTs are
incorporated into measurement, to study what can be gained under ideal circumstances.
Our results show that while theoretical gains are large, in the studied empirical setting
RTs at best provide a modest improvement that was not fully consistent across domains
and depended on the reward system implemented in the ALS. Implications of the results
for the choice of the measurement model in ALS are discussed.
| Original language | English |
|---|---|
| Publisher | PsyArXiv |
| Number of pages | 34 |
| DOIs | |
| Publication status | Published - 3 Nov 2025 |