
Association between Gut Microbiome and Parkinson's Disease Revealed by Sparse Learning

Date

2021-05-25

ORCID

0000-0002-0598-6249

Type

Thesis

Degree Level

Masters

Abstract

\textbf{Background:} Many studies indicate that the human gut microbiota is likely connected with Parkinson's disease (PD). Motivated by these findings, this thesis explores the association between PD and the human gut microbiota from a statistical machine learning perspective. To identify this association, we assess the predictive power of microbial operational taxonomic units (OTUs) extracted from the participants' gut.

\textbf{Methods:} We use linear support vector machines (SVM) and logistic regression, each combined with an $L_1$ penalty or an elastic-net penalty, to identify OTUs that are informative for PD. The $L_1$ penalty shrinks coefficients and sets those of non-informative variables exactly to zero, which effectively performs feature selection. Conversely, OTUs whose coefficients have larger absolute values are more closely related to PD. The elastic-net penalty is additionally capable of grouping correlated variables. Under these two penalties, SVM and logistic regression can achieve good predictive performance while also performing feature selection. To make full use of the dataset and to avoid overfitting, we run the models with leave-one-out cross-validation (LOOCV). Each regularization has a tuning parameter $\lambda$. After running the models with LOOCV, we choose the optimal $\lambda$ for each model, using the test error rate as the criterion.

\textbf{Results and Conclusions:} We analyze the performance of each optimal model by computing and interpreting its evaluation metrics. For our dataset, logistic regression with the $L_1$ penalty performs best: its $R_{ER}^2$, $R_{AMLP}^2$, AUC and AUPR are 43.9\%, 25.7\%, 0.8259 and 0.8788, respectively. We then examine the OTUs selected by the models on the basis of their coefficients, and rank the OTUs by their level of relevance to PD. Some OTUs selected by logistic regression with the $L_1$ penalty have been identified in previous studies of micro-organisms, including Lactobacillus, Roseburia, Blautia, Akkermansia and Bifidobacterium. We also explore the predictive performance of logistic regression with the elastic-net penalty and of regularized SVM, and examine the OTUs selected by these models. These OTUs also overlap with those identified by previous researchers.
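
The workflow described in the abstract (an $L_1$-penalized logistic regression, with $\lambda$ chosen by LOOCV error rate) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the function names, the synthetic toy data standing in for OTU abundances, the $\lambda$ grid, the tie-breaking rule, and the solver (proximal gradient descent with soft-thresholding, minimizing the average logistic loss plus $\lambda \|\beta\|_1$ with an unpenalized intercept). The thesis may use a different parameterization and software.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=150):
    """Proximal gradient descent for L1-penalized logistic regression.

    Minimizes mean logistic loss + lam * ||beta||_1; the intercept b0 is
    not penalized. Returns (b0, beta).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * v for b, v in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                grad[j] += r * xi[j] / n
        b0 -= step * g0
        # Gradient step on the smooth loss, then soft-thresholding for the penalty.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta


def predict_label(b0, beta, x):
    """Predict class 1 when the fitted probability is at least 0.5."""
    return 1 if sigmoid(b0 + sum(b * v for b, v in zip(beta, x))) >= 0.5 else 0


def loocv_error(X, y, lam):
    """Leave-one-out error rate: refit without sample i, then test on sample i."""
    errors = 0
    for i in range(len(X)):
        b0, beta = fit_l1_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        errors += predict_label(b0, beta, X[i]) != y[i]
    return errors / len(X)


def select_lambda(X, y, grid):
    """Return (best lambda, {lambda: LOOCV error}).

    Ties go to the largest lambda, i.e. the sparsest model (an assumption).
    """
    errors = {lam: loocv_error(X, y, lam) for lam in grid}
    best = min(sorted(grid, reverse=True), key=lambda lam: errors[lam])
    return best, errors


def make_toy_data(n=24, p=5, seed=0):
    """Synthetic stand-in for OTU features: only feature 0 determines the label."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if row[0] > 0 else 0 for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best, errors = select_lambda(X, y, [0.01, 0.05, 0.2, 10.0])
    b0, beta = fit_l1_logistic(X, y, best)
    print("LOOCV error by lambda:", errors)
    print("chosen lambda:", best)
    print("selected features:", [j for j, b in enumerate(beta) if b != 0.0])
```

In this sketch, a feature counts as selected when its coefficient is nonzero after fitting at the chosen $\lambda$, and the absolute value of the coefficient gives a ranking, mirroring how the abstract describes ranking OTUs by relevance to PD.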

Description

Keywords

Statistical Machine Learning, Sparse Learning, Regularization, Microbiome, Parkinson.

Citation

Degree

Master of Science (M.Sc.)

Department

Mathematics and Statistics

Program

Mathematics
