Assessment and Improvement of a Sequential Regression Multivariate Imputation Algorithm.

Zhu, Jian

2016

jianzhu_1.pdf (824.2KB PDF)

Abstract

Sequential regression multivariate imputation (SRMI, also known as chained equations or fully conditional specifications) is a popular approach for handling missing values in highly complex data structures with many types of variables, structural dependencies among the variables, and bounds on plausible imputation values. It is a Gibbs-style algorithm that iteratively draws from the posterior predictive distribution of the missing values in each variable, conditional on all observed and imputed values of all other variables. However, a theoretical weakness of this approach is that the specified set of fully conditional regression models may not be compatible with any joint distribution of the variables being imputed. Hence, the convergence properties of the iterative algorithm are not well understood. This dissertation focuses on assessing and improving the SRMI algorithm. Chapter 2 develops conditions for convergence and assesses the properties of inferences from both compatible and incompatible sequences of generalized linear regression models. The results are established for the missing data pattern in which each subject may be missing a value on at most one variable, and they are used to develop criteria for the choice of regression models. Chapter 3 proposes a modified block sequential regression multivariate imputation (BSRMI) approach that divides the data into blocks for each variable based on missing data patterns and tunes the regression models through compatibility restrictions. This helps avoid divergence when the data are missing in general patterns and when it is difficult to obtain well-fitting models across all missing data patterns. Conditions for the convergence of the algorithm are established, and the repeated sampling properties of the resulting inferences are studied using several simulated data sets.
Chapter 4 extends the imputation model selection to quasi-likelihood regression models in both SRMI and BSRMI to better capture structure in the prediction model for the missing values. The performance of the modified approach is examined through simulation studies. The results show that the extension to quasi-likelihood regression models makes it easier to choose better-fitting model sequences that yield desirable repeated sampling properties of the multiple imputation estimates.
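
The Gibbs-style iteration described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation. It assumes every variable is continuous and is imputed from a normal linear regression on all other variables with a noninformative prior, whereas the dissertation considers generalized linear and quasi-likelihood models. The function names `draw_linear_imputations` and `srmi_sketch` and the parameters `n_iter` and `seed` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def draw_linear_imputations(X_obs, y_obs, X_mis, rng):
    """Draw once from the posterior predictive of a normal linear model.

    Assumes a flat prior on the coefficients and on log(sigma^2).
    """
    n, p = X_obs.shape
    XtX_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = XtX_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    # sigma^2 | data: residual sum of squares over a chi-square(n - p) draw
    sigma2 = (resid @ resid) / rng.chisquare(n - p)
    # beta | sigma^2, data ~ N(beta_hat, sigma^2 * (X'X)^{-1})
    chol = np.linalg.cholesky(XtX_inv)
    beta = beta_hat + np.sqrt(sigma2) * (chol @ rng.standard_normal(p))
    # Predictive draw: regression mean plus fresh residual noise
    noise = rng.normal(0.0, np.sqrt(sigma2), size=X_mis.shape[0])
    return X_mis @ beta + noise


def srmi_sketch(data, n_iter=10, seed=0):
    """Return one completed copy of `data`, where NaN marks a missing value."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()
    # Starting values: observed mean of each variable
    col_means = np.nanmean(data, axis=0)
    for j in range(data.shape[1]):
        filled[missing[:, j], j] = col_means[j]
    for _ in range(n_iter):
        # Cycle through the variables, re-imputing each one conditional on
        # the current observed and imputed values of all the others
        for j in range(data.shape[1]):
            mis_j = missing[:, j]
            if not mis_j.any():
                continue
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])
            filled[mis_j, j] = draw_linear_imputations(
                design[~mis_j], filled[~mis_j, j], design[mis_j], rng
            )
    return filled
```

Calling `srmi_sketch` several times with different `seed` values would give multiple completed data sets in the spirit of multiple imputation. The sketch does nothing to check whether the conditional models are compatible with a joint distribution, which is the issue the dissertation examines.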

Subjects

Missing Data Multiple Imputation

Sequential Regression Multivariate Imputation

Compatible Conditional Specifications

Block-specific Sequential Regression Multivariate Imputation

Sequential Regression Multivariate Imputation by Quasi-Likelihood Regression Models

Types

Thesis

Handle

https://hdl.handle.net/2027.42/133402

Collections

Dissertations and Theses (Ph.D. and Master's)
