Mixture models, theory with application to species richness, clustering, classification and all that jazz

Kulagina, Yulia

doi:10.3929/ethz-b-000581661

Download

Full text (PDF, 4.448Mb)

Open access

Author

Kulagina, Yulia

Date

2022

Type

Doctoral Thesis

ETH Bibliography

yes

Rights / license

Creative Commons Attribution 4.0 International

Abstract

Mixture models occur in numerous settings, including random and fixed effects models, clustering, deconvolution, empirical Bayes problems and many others. They are often used to model data originating from a heterogeneous population consisting of several homogeneous subpopulations, so the problem of finding a good estimator for the number of components in the mixture arises naturally. Estimating the order of a finite mixture model is a hard statistical task, and multiple techniques have been suggested for solving it. In this thesis we concentrate on several such methods that have not gained much popularity but are nonetheless interesting from a theoretical viewpoint and deserve the attention of practitioners. These methods fall into three groups: tools built upon the determinant of the Hankel matrix of moments of the mixing distribution, minimum distance estimators, and likelihood ratio tests. A valuable feature of all of these approaches is that they come with theoretical guarantees of consistency. We address the theoretical pillars underlying each of the statistical techniques and present the results of a comparative numerical study conducted under various scenarios. In addition to the above-mentioned methods, we include the results of a neural-network-based approach. According to the results, none of the methods proves to be a "magic pill". The results uncover limitations of the techniques and provide practical hints for choosing the best-suited tool under specific conditions. After discussing the relevant theory and analysing some simulation results, we introduce software that allows for convenient and flexible implementation of the discussed methods for simulated data as well as for real datasets whenever the data at hand are univariate. We also demonstrate the performance of the studied techniques on real-world data.
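To illustrate the idea behind the Hankel-determinant tools mentioned above, the following minimal sketch relies on a standard fact: if the mixing distribution has exactly m support points, the (k+1)x(k+1) Hankel matrix of its moments, with entries m_{i+j}, has a positive determinant for k < m and a zero determinant for k >= m. The sketch uses exact population moments of a made-up three-point mixing distribution; it is not the estimator studied in the thesis, which has to work with moments estimated from data. All function names, atoms and weights are hypothetical.

```python
from fractions import Fraction


def moments(atoms, weights, max_order):
    """Raw moments m_0..m_max_order of a discrete mixing distribution."""
    return [sum(w * a**j for a, w in zip(atoms, weights))
            for j in range(max_order + 1)]


def det(matrix):
    """Exact determinant by Gaussian elimination on Fractions."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def hankel_det(m, k):
    """Determinant of the (k+1)x(k+1) Hankel matrix with entries m[i+j]."""
    return det([[m[i + j] for j in range(k + 1)] for i in range(k + 1)])


def order_from_moments(m):
    """Smallest k >= 1 with det H_k == 0, or None if the moments run out."""
    k = 1
    while 2 * k < len(m):
        if hankel_det(m, k) == 0:
            return k
        k += 1
    return None


# Hypothetical mixing distribution with three atoms (illustration only).
atoms = [Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)]
weights = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

m = moments(atoms, weights, 8)
dets = [hankel_det(m, k) for k in range(1, 5)]
order = order_from_moments(m)
print(order, [float(d) for d in dets])
```

Exact rational arithmetic is used so that the determinant is exactly zero once k reaches the number of atoms. With estimated moments the determinant is only approximately zero, which is why a practical procedure needs some form of thresholding or penalisation.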
We further discuss the feasibility of extending some of these methods to the multidimensional setting and present some simulation results. We then apply the multidimensional extensions of the studied approaches to the clustering problem when analysing two multivariate real-world datasets. Studying the data structure of one of these datasets using the multivariate mixture model, and experimenting with the solutions, led us to a final result we did not expect to achieve when starting our work. In the scope of this thesis we also consider the application of mixture models to the problem of species richness estimation under the assumption of complete monotonicity of the distribution of species abundances. Complete monotonicity, which can be shown to be linked to mixtures of geometric distributions, is the natural substitute for k-monotonicity when k is large. Since the k-monotone model has already been applied quite successfully to the species richness estimation problem, the complete monotonicity approach is a natural alternative to the k-monotone solution when k is large. An extended simulation study indicates that the completely monotone estimator is quite competitive with other available estimators, and this remains true even when complete monotonicity is not satisfied. Using four real datasets, we further illustrate how our method can be applied in practice.
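The link between complete monotonicity and mixtures of geometric distributions can be checked numerically. A probability mass function p on {0, 1, 2, ...} is completely monotone if (-1)^r Δ^r p(j) >= 0 for all r and j, where Δp(j) = p(j+1) - p(j). For a single geometric term, (-1)^r Δ^r θ^j = θ^j (1 - θ)^r >= 0, so any mixture of geometric distributions satisfies the condition. The sketch below verifies this on a finite window for a made-up two-component mixture. The support starting at 0, the parametrisation, and all names and values are assumptions made for illustration and are not taken from the thesis. A check on finitely many differences can refute complete monotonicity but cannot prove it.

```python
from fractions import Fraction


def geometric_mixture_pmf(thetas, weights, n):
    """p(j) = sum_i w_i * (1 - theta_i) * theta_i**j for j = 0..n-1."""
    return [sum(w * (1 - t) * t**j for t, w in zip(thetas, weights))
            for j in range(n)]


def alternating_differences(p, max_order):
    """Rows r = 0..max_order holding (-1)^r Delta^r p on the available window."""
    rows = [list(p)]
    for _ in range(max_order):
        prev = rows[-1]
        # (-1)^(r+1) Delta^(r+1) p(j) = q_r(j) - q_r(j+1), with q_r the previous row.
        rows.append([prev[j] - prev[j + 1] for j in range(len(prev) - 1)])
    return rows


def looks_completely_monotone(p, max_order):
    """True if all alternating differences up to max_order are non-negative."""
    return all(v >= 0
               for row in alternating_differences(p, max_order)
               for v in row)


# Hypothetical two-component mixture of geometric distributions.
thetas = [Fraction(1, 4), Fraction(4, 5)]
weights = [Fraction(3, 5), Fraction(2, 5)]
p_mix = geometric_mixture_pmf(thetas, weights, 12)

# A pmf with an interior mode, which violates the condition already at order 1.
p_bump = [Fraction(1, 10), Fraction(1, 2), Fraction(2, 5)]

print(looks_completely_monotone(p_mix, 10))
print(looks_completely_monotone(p_bump, 2))
```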

Permanent link

https://doi.org/10.3929/ethz-b-000581661

Publication status

published

Contributors

Examiner: Balabdaoui, Fadoua
Examiner: Bühlmann, Peter

Publisher

ETH Zurich

Organisational unit

08845 - Balabdaoui, Fadoua (Tit.-Prof.) / Balabdaoui, Fadoua (Tit.-Prof.)
