Abstract
Missing data is a common occurrence in clinical research. Missing data occurs when the values of the variables of interest are not measured or recorded for all subjects in the sample. Common approaches to addressing the presence of missing data include complete-case analyses, where subjects with missing data are excluded, and mean-value imputation, where missing values are replaced with the mean value of that variable in those subjects for whom it is not missing. However, in many settings, these approaches can lead to biased estimates of statistics (eg, of regression coefficients) and/or confidence intervals that are artificially narrow. Multiple imputation (MI) is a popular approach for addressing the presence of missing data. With MI, multiple plausible values of a given variable are imputed or filled in for each subject who has missing data for that variable. This results in the creation of multiple completed data sets. Identical statistical analyses are conducted in each of these completed data sets and the results are pooled across the completed data sets. We provide an introduction to MI and discuss issues in its implementation, including developing the imputation model, deciding how many imputed data sets to create, and addressing derived variables. We illustrate the application of MI through an analysis of data on patients hospitalised with heart failure. We focus on developing a model to estimate the probability of 1-year mortality in the presence of missing data. Statistical software code for conducting MI in R, SAS, and Stata is provided.
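The pooling step described above can be illustrated with a minimal sketch. It assumes that results are combined with Rubin's rules, the standard pooling method for MI; the abstract itself does not name the method, and the function name `pool_rubin` and all numeric values below are hypothetical, chosen only for illustration. The article's own code is in R, SAS, and Stata; Python is used here purely as a compact notation.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool results from m completed data sets using Rubin's rules.

    estimates: one point estimate (eg, a regression coefficient) per data set
    variances: the squared standard error of that estimate in each data set
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)       # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficient estimates and squared standard errors
# from five completed data sets (not taken from the article).
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
variances = [0.010, 0.012, 0.011, 0.009, 0.010]
pooled, total = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.4f}, total variance = {total:.4f}")
```

The total variance exceeds the average within-imputation variance because it also carries the between-imputation term. Single-value approaches such as mean-value imputation have no such term, which is one reason their confidence intervals can be artificially narrow.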
Original language | English |
---|---|
Pages (from-to) | 1322-1331 |
Number of pages | 10 |
Journal | Canadian Journal of Cardiology |
Volume | 37 |
Issue number | 9 |
Publication status | Published - Sept 2021 |
Bibliographical note
Funding Information: This study was supported by the Institute for Clinical Evaluative Sciences (ICES), which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). The opinions, results, and conclusions reported in this paper are those of the authors and are independent from the funding sources. No endorsement by ICES or the Ontario MOHLTC is intended or should be inferred. The data sets used for this study were held securely in a linked deidentified form and analysed at ICES. Although data-sharing agreements prohibit ICES from making the data set publicly available, access may be granted to those who meet prespecified criteria for confidential access, as described at https://www.ices.on.ca/DAS. This research was supported by an operating grant from the Canadian Institutes of Health Research (CIHR). The EFFECT data used in the study were funded by a CIHR Team Grant in Cardiovascular Outcomes Research (grant numbers CTP79847 and CRT43823). PCE and DSL are supported in part by Mid-Career Investigator awards from the Heart and Stroke Foundation. DSL is supported by the Ted Rogers Chair in Heart Function Outcomes. IRW was supported by the Medical Research Council Programme MC_UU_12023/21.
Publisher Copyright:
© 2020 The Authors
Keywords
- Clinical Trials as Topic
- Data Interpretation, Statistical
- Humans
- Research Design