Thesis (Ph.D.)--University of Rochester. School of Medicine & Dentistry. Dept. of Biostatistics & Computational Biology, 2019.
Continuous, binary, and time-to-event data are commonly encountered in biomedical research, and the corresponding linear, logistic, and Cox proportional hazards regression models are widely used to analyze them. Although multiple regression has a long-established foothold in applied statistics, its misuse and misinterpretation remain prevalent in biomedical research. In the first part of this thesis, we discuss a variable selection strategy called Univariate Analysis Screening (UAS), which is frequently used in articles published in top biomedical journals. This method is also known as marginal regression in the statistics literature, where some journals have recommended it, under certain conditions and assumptions, as a way of finding a suite of candidate models. Using basic statistical theory and extensive simulation studies, we clarify some paradoxes surrounding the linear, logistic, and Cox regression models. Our results show that this widely used variable selection procedure is problematic. Formal procedures based on solid statistical theory should be used for variable selection. In the second part, we propose a method called Multi-splitting Backward Elimination (Must-Be). This procedure can handle variable selection problems in linear, logistic, and Cox regression models with moderately to highly correlated predictors and moderate sample sizes. In addition, the Must-Be procedure is a generic approach that can be easily extended to variable selection problems whenever asymptotically valid p-values are available. In the third part, we extend the one-step Must-Be procedure to a multiple-step variable selection procedure called Iterative Multi-splitting Backward Elimination (It-Must-Be). Compared with the one-step Must-Be procedure, the It-Must-Be procedure enjoys the benefit of "variable ranking", which takes place in each iteration.
We compare our Must-Be and It-Must-Be procedures with a few well-known variable selection methods using extensive simulation studies. In certain scenarios, our new procedures show similar or better performance in simultaneously minimizing the false negative rate and the false positive rate.