Modeling and Visualizing Uncertainty in Gene Expression Clusters using Dirichlet Process Mixtures

Rasmussen, CE; de la Cruz, BJ; Ghahramani, Z; Wild, DL

doi:10.1109/TCBB.2007.70269

Journal Article

MPG Authors

Rasmussen, CE
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society

External Resources

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4407680
(Publisher version)

Citation

Rasmussen, C., de la Cruz, B., Ghahramani, Z., & Wild, D. (2009). Modeling and Visualizing Uncertainty in Gene Expression Clusters using Dirichlet Process Mixtures. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(4), 615-628. doi:10.1109/TCBB.2007.70269.

Cite link: https://hdl.handle.net/11858/00-001M-0000-0013-C262-C

Abstract

Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture models provide a non-parametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper we present a case study of the application of non-parametric Bayesian clustering methods to the clustering of high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a Dirichlet process mixture model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles, and these results extend previously published cluster analyses of these data.
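
The step from co-clustering probability to a dissimilarity and then to a linkage algorithm can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that posterior samples of cluster labels are already available (for example, from an MCMC sampler for the Dirichlet process mixture, which is not shown), estimates the co-clustering probability as the fraction of samples in which two genes share a label, and takes one minus that probability as the dissimilarity. The function names, the toy label matrix, and the choice of average linkage are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def coclustering_probability(assignments):
    """Estimate P(gene i and gene j share a cluster) from posterior samples.

    assignments: integer array of shape (n_samples, n_genes); row s holds the
    cluster label of every gene in posterior sample s (hypothetical input).
    """
    assignments = np.asarray(assignments)
    # same[s, i, j] is True when genes i and j carry the same label in sample s.
    same = assignments[:, :, None] == assignments[:, None, :]
    # Averaging over samples gives a Monte Carlo estimate of the probability.
    return same.mean(axis=0)


def dissimilarity_linkage(prob, method="average"):
    """Hierarchical clustering on the dissimilarity 1 - P (assumed definition)."""
    dissim = 1.0 - prob
    np.fill_diagonal(dissim, 0.0)
    # SciPy's linkage expects the condensed (upper-triangular) distance vector.
    condensed = squareform(dissim, checks=False)
    return linkage(condensed, method=method)


# Toy stand-in for sampler output: 4 posterior samples, 4 genes.
# Label values are arbitrary; only equality within a row matters.
samples = np.array([
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 2],
])

prob = coclustering_probability(samples)
tree = dissimilarity_linkage(prob)
# Cut the tree into two flat clusters for inspection.
labels = fcluster(tree, t=2, criterion="maxclust")
```

Because the estimate depends only on whether two labels are equal within a sample, it is unaffected by label switching between samples. The resulting `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` for visualization.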