
Novel Statistical Methods for Gene Set Enrichment Analysis with Empirical Memberships for Overlapping Genes

URL to cite or link to: http://hdl.handle.net/1802/35513

Zhang_rochester_0188E_11688.pdf (3.68 MB)
PDF of thesis
Thesis (Ph.D.)--University of Rochester. School of Medicine & Dentistry. Dept. of Biostatistics & Computational Biology, 2018.
Gene Set Enrichment Analysis (GSEA) is a powerful inferential tool that incorporates knowledge of a priori defined gene sets (e.g., molecular pathways) into high-throughput data analysis. Knowledge-based gene sets are available in bioinformatics resources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. In databases built for general purposes, multifunctional genes are assigned to several pathways simultaneously. In study-specific analyses (e.g., of a specific disease), these overlapping genes are counted once for every pathway they belong to, regardless of whether their signals are associated with the disease. Nevertheless, most existing GSEA methods ignore the effect of overlapping genes. In this thesis, we reveal substantial overlap among KEGG pathways and show that overlapping genes exhibit pathway-specific activation under study-specific conditions. We then computationally decompose the overlapping genes using study-specific data and develop appropriate similarity measures to assign their pathway memberships empirically. Unlike the traditional binary membership (i.e., either 0 or 1), the empirical membership is quantified by continuous weights. We design novel GSEA methods for two types of data: time-course data and data with limited time points (e.g., cross-sectional data). Time-course data contain rich temporal information within individual subjects and have the potential to support personalized inference for precision-medicine diagnosis. Data with limited time points have a simpler structure and are available from the vast majority of studies. Using functional data analysis and high-dimensional statistical learning tools, we build a functional model and a cross-sectional model for these two data types, respectively. Upon obtaining the weights (a.k.a. empirical memberships), we derive two generalized hypothesis tests (one parametric and one nonparametric) that accommodate both the weights and inter-gene correlation in the pathway-level test. Compared with the classical tests, these generalized tests are more flexible and substantially reduce the computational burden across various applications of high-throughput data. For each new method, we conduct simulation studies and demonstrate its use through real data analyses. Lastly, all methods are implemented with efficient algorithms in publicly available R packages.
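The contrast between binary and empirical memberships can be made concrete with a small illustration. The Python sketch below is hypothetical and is not the thesis's method or its R packages. It assumes each overlapping gene comes with nonnegative similarity scores to its pathways, which are simply normalized to weights summing to 1. It then uses a generic correlation-adjusted weighted z statistic, T = sum_g w_g z_g standardized by sqrt(w' R w), as a stand-in for the generalized parametric test described above, which may be defined differently. The function names `overlapping_genes`, `empirical_memberships`, `weighted_z` and `two_sided_p` are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def overlapping_genes(pathways: Dict[str, List[str]]) -> Dict[str, int]:
    """Return genes that appear in more than one pathway, with their pathway counts."""
    counts = Counter(g for genes in pathways.values() for g in set(genes))
    return {g: c for g, c in counts.items() if c > 1}


def empirical_memberships(similarity: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical rule: normalize one gene's nonnegative similarity scores
    to its pathways into continuous weights that sum to 1."""
    total = sum(similarity.values())
    if total <= 0:
        raise ValueError("similarity scores must have a positive sum")
    return {p: s / total for p, s in similarity.items()}


def weighted_z(z: Sequence[float], w: Sequence[float],
               corr: Sequence[Sequence[float]]) -> float:
    """Generic weighted statistic T = sum_g w_g * z_g, divided by sqrt(w' R w),
    where R is the assumed null correlation matrix of the gene-level z statistics."""
    n = len(z)
    if len(w) != n or len(corr) != n:
        raise ValueError("dimension mismatch")
    t = sum(wi * zi for wi, zi in zip(w, z))
    var = sum(w[i] * corr[i][j] * w[j] for i in range(n) for j in range(n))
    if var <= 0:
        raise ValueError("w' R w must be positive")
    return t / math.sqrt(var)


def two_sided_p(zstat: float) -> float:
    """Two-sided p-value under a standard normal null, via the complementary error function."""
    return math.erfc(abs(zstat) / math.sqrt(2.0))
```

As a usage example, suppose pathway A contains G1, G2 and G3, and G3 also belongs to pathway B with similarity scores 3 (to A) and 1 (to B). Under binary membership, G3 would enter A with weight 1. With the normalized scores it enters with weight 0.75, so `weighted_z([2.0, 1.5, 3.0], [1.0, 1.0, 0.75], R)` down-weights G3's contribution to pathway A. Passing a non-identity `R` widens the null variance when genes are positively correlated. This is why a pathway-level test that ignores inter-gene correlation tends to overstate significance.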
Contributor(s):
Yun Zhang - Author

Xing Qiu - Thesis Advisor

Juilee Thakar - Thesis Advisor

Primary Item Type:
Thesis
Language:
English
Subject Keywords:
Elastic-net; Functional data analysis; Gene set enrichment analysis with empirical memberships.
Sponsor - Description:
National Center for Advancing Translational Sciences (NCATS) - UL1TR002001
National Institute of Environmental Health Sciences (NIEHS) - T32ES007271
First presented to the public:
1/31/2020
Originally created:
2018
Original Publication Date:
2018
Previously Published By:
University of Rochester School of Medicine and Dentistry
Place Of Publication:
Rochester, N.Y.
Extents:
Illustrations - some color.
Number of Pages - xv, 138 pages.
License Grantor / Date Granted:
Jennifer McCarthy / 2020-01-31 14:04:41.944
Date Deposited:
2020-01-31 14:04:41.944
Submitter:
Jennifer McCarthy

Copyright © This item is protected by copyright, with all rights reserved.

All Versions:
Version 1 - Novel Statistical Methods for Gene Set Enrichment Analysis with Empirical Memberships for Overlapping Genes - created 2020-01-31 14:04:41.944