Comparison of Three Clustering Methods for Dissecting Trait Heterogeneity in Genotypic Data
Thornton-Wells, Tricia Ann
2005-07-23
Abstract
Trait heterogeneity, which exists when a trait has been defined with insufficient specificity such that it is actually two or more distinct traits, has been implicated as a confounding factor in traditional statistical genetics of complex human disease. In the absence of detailed phenotypic data collected consistently in combination with genetic data, unsupervised computational methodologies offer the potential for discovering underlying trait heterogeneity. The performance of three such methods appropriate for categorical data (Bayesian Classification, Hypergraph-Based Clustering, and Fuzzy k-Modes Clustering) was compared. Also tested was the ability of these methods to detect trait heterogeneity in the presence of locus heterogeneity and gene-gene interaction, two other complicating factors in discovering genetic models of complex human disease. Bayesian Classification performed well under the simplest of the simulated genetic models and outperformed the other two methods, except that Fuzzy k-Modes Clustering performed best on the most complex genetic model. Permutation testing showed that Bayesian Classification controlled Type I error very well but produced less desirable Type II error rates. Methodological limitations and future directions are discussed.