TY - GEN
T1 - Generalizing Univariate Predictive Mean Matching to Impute Multiple Variables Simultaneously
AU - Cai, Mingyang
AU - van Buuren, Stef
AU - Vink, Gerko
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022/7/7
Y1 - 2022/7/7
AB - Predictive mean matching (PMM) is an easy-to-use and versatile univariate imputation approach. It is robust against transformations of the incomplete variable and violation of the normal model. However, univariate imputation methods cannot directly preserve multivariate relations in the imputed data. We wish to extend PMM to a multivariate method to produce imputations that are consistent with the knowledge of derived data (e.g., data transformations, interactions, sum restrictions, range restrictions, and polynomials). This paper proposes multivariate predictive mean matching (MPMM), which can impute incomplete variables simultaneously. Instead of the normal linear model, we apply canonical regression analysis to calculate the predicted value used for donor selection. To evaluate the performance of MPMM, we compared it with other imputation approaches under four scenarios: 1) multivariate normally distributed data; 2) linear regression with quadratic terms; 3) linear regression with interaction terms; 4) incomplete data with inequality restrictions. The simulation study shows that with moderate missingness patterns, MPMM provides plausible imputations at the univariate level and preserves relations in the data.
KW - Block imputation
KW - Canonical regression analysis
KW - Missing data
KW - Multiple imputation
KW - Multivariate analysis
KW - Predictive mean matching
UR - http://www.scopus.com/inward/record.url?scp=85135097397&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-10461-9_5
DO - 10.1007/978-3-031-10461-9_5
M3 - Conference contribution
AN - SCOPUS:85135097397
SN - 978-3-031-10460-2
T3 - Lecture Notes in Networks and Systems
SP - 75
EP - 91
BT - Intelligent Computing
A2 - Arai, Kohei
PB - Springer
CY - Cham
T2 - Computing Conference, 2022
Y2 - 14 July 2022 through 15 July 2022
ER -