Impute-then-exclude versus exclude-then-impute: Lessons when imputing a variable used both in cohort creation and as an independent variable in the analysis model

Peter C. Austin*, Daniele Giardiello, Stef van Buuren

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

We examined the setting in which a variable that is subject to missingness is used
both as an inclusion/exclusion criterion for creating the analytic sample and
subsequently as the primary exposure in the analysis model that is of scientific
interest. An example is cancer stage, where patients with stage IV cancer are
often excluded from the analytic sample, and cancer stage (I to III) is an exposure
variable in the analysis model. We considered two analytic strategies. The first
strategy, referred to as “exclude-then-impute,” excludes subjects for whom the
observed value of the target variable is equal to the specified value and then
uses multiple imputation to complete the data in the resultant sample. The second
strategy, referred to as “impute-then-exclude,” first uses multiple imputation to
complete the data and then excludes subjects based on the observed or filled-in
values in the completed samples. Monte Carlo simulations were used to compare
five methods (one based on “exclude-then-impute” and four based on
“impute-then-exclude”) along with the use of a complete case analysis. We
considered both missing completely at random (MCAR) and missing at random (MAR)
missing data mechanisms. We found that an impute-then-exclude strategy using
substantive model compatible fully conditional specification tended to have
superior performance across 72 different scenarios. We illustrated the
application of these methods using empirical data on patients hospitalized
with heart failure, in which heart failure subtype was used for cohort creation
(excluding subjects with heart failure with preserved ejection fraction) and was
also an exposure in the analysis model.
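
The difference between the two strategies is only the order of the exclusion and imputation steps, which the following minimal Python sketch tries to make concrete. It is an illustration under loud assumptions, not the authors' implementation: the function names, the toy data, and in particular `impute_stage` are hypothetical. `impute_stage` fills a missing stage by sampling from the observed stages, a crude stand-in for a real imputation model (it ignores covariates and the outcome, and it is not the substantive model compatible fully conditional specification method studied in the article). A real multiple imputation analysis would also repeat the imputation several times and pool the resulting estimates.

```python
import numpy as np


def impute_stage(stage, rng):
    """Fill missing stages (NaN) by sampling from the observed stages.

    Hypothetical stand-in for a real imputation model; used here only
    to show where the imputation step sits in each strategy.
    """
    stage = stage.copy()
    missing = np.isnan(stage)
    observed = stage[~missing]
    stage[missing] = rng.choice(observed, size=int(missing.sum()))
    return stage


def exclude_then_impute(stage, excluded_value, rng):
    """Drop subjects whose OBSERVED stage equals excluded_value, then impute."""
    # NaN == excluded_value is False, so subjects with missing stage are kept.
    keep = ~(stage == excluded_value)
    return impute_stage(stage[keep], rng)


def impute_then_exclude(stage, excluded_value, rng):
    """Impute first, then drop subjects whose observed OR imputed stage
    equals excluded_value."""
    completed = impute_stage(stage, rng)
    return completed[completed != excluded_value]


# Toy data (assumed): stages 1 to 4, with NaN marking a missing stage.
rng = np.random.default_rng(0)
stage = np.array([1, 2, 3, 4, np.nan, 2, np.nan, 4, 1, 3], dtype=float)

eti = exclude_then_impute(stage, 4, rng)
ite = impute_then_exclude(stage, 4, rng)

print("exclude-then-impute sample size:", len(eti))
print("impute-then-exclude sample size:", len(ite))
```

In this sketch, exclude-then-impute always retains every subject with a missing stage, because the excluded category is no longer present in the data used for imputation. Under impute-then-exclude, a subject with a missing stage can be filled in as stage 4 and then dropped, so the size of the analytic sample can vary from one completed data set to the next.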

Original language: English
Pages (from-to): 1525-1541
Journal: Statistics in Medicine
Volume: 42
Issue number: 10
Publication status: Published - 10 May 2023

Keywords

  • missing data
  • Monte Carlo simulations
  • multiple imputation
