On Lasso Estimation of Linear Mixed Model for High Dimensional Longitudinal Data
Abstract
With advances in data-collection technology, repeated measurements with high-dimensional covariates have become increasingly common. The classical statistical approach to modeling data of this kind is the linear mixed model with temporally correlated errors. However, most of the variable-selection research reported in the literature addresses independent response data. In this study, we propose an algorithm that combines the Expectation-Maximization (EM) algorithm with the Least Absolute Shrinkage and Selection Operator (LASSO) under a linear mixed model with a Gaussian assumption, an approach that accommodates dependent data. The algorithm has two steps: (1) estimation of the variance-covariance components by EM; and (2) variable selection by LASSO. The crucial challenge is that linear mixed models usually allow a structured variance-covariance matrix, which in turn complicates estimation: in general, the M-step of the EM algorithm has no closed-form maximizer. Our EM algorithm uses one iteration of the projected gradient descent method in the M-step, which turns out to be computationally efficient compared with the classical EM algorithm because it avoids fully maximizing over the variance-covariance components in the M-step. With the variance-covariance estimates from step 1, the LASSO estimate is computed from the full log-likelihood function with an L1 penalty. The LASSO shrinks the coefficients towards zero and sets some of them exactly to zero, which is how it performs variable selection. We apply a gradient descent algorithm to compute the LASSO estimates and pathwise coordinate descent to choose the tuning parameter of the penalized log-likelihood function. The simulation studies assume that the measurement errors of each subject follow a first-order autoregressive, AR(1), correlation structure.
The numerical results show that the variance-covariance parameter estimates from our method are comparable to those of the classical Newton-Raphson (NR) method in the simple case and outperform those of the NR method when the variance-covariance matrix has a complex structure. Moreover, our method successfully identifies all the relevant explanatory variables and most of the redundant explanatory variables. The proposed method is also applied to a life data set, and the results are reasonable.
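
For readers unfamiliar with the building blocks named in the abstract, the following is a minimal illustrative sketch, not the algorithm of this work. It shows the AR(1) covariance structure assumed in the simulations, the soft-thresholding operator that produces exact zeros under an L1 penalty, and a plain coordinate descent LASSO for an ordinary least-squares loss. The proposed method instead penalizes the full mixed-model log-likelihood with EM-estimated variance components, which this sketch does not attempt. All function names (`ar1_cov`, `soft_threshold`, `lasso_cd`) and the least-squares objective are assumptions made for illustration.

```python
def ar1_cov(n, sigma2, rho):
    """Return the n x n AR(1) covariance matrix with entries sigma2 * rho**|s - t|."""
    return [[sigma2 * rho ** abs(s - t) for t in range(n)] for s in range(n)]


def soft_threshold(z, lam):
    """Closed-form minimizer over b of 0.5 * (b - z)**2 + lam * |b|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p entries each; y is a list of n responses.
    This is the ordinary least-squares LASSO (an assumption of this sketch),
    not a penalized mixed-model likelihood.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, valid because beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # (1/n) * x_j' (partial residual), where the partial residual
            # adds back the current contribution of coordinate j
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(z, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


if __name__ == "__main__":
    # Orthogonal toy design: the large signal is shrunk, the small one is zeroed.
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [3.0, 0.2]
    print(lasso_cd(X, y, lam=0.5))
    print(ar1_cov(3, sigma2=2.0, rho=0.5))
```

In the toy design the second coefficient is set exactly to zero while the first is only shrunk, which is the variable-selection behavior the abstract refers to. Running the same routine over a decreasing grid of `lam` values, warm-starting each fit from the previous solution, would give a pathwise scheme of the kind commonly used to choose the tuning parameter.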