Title
An Ensemble Framework Coping with Instability in the Gene Selection Process
Author(s)
Keywords
Gene selection
Filter method
Ensemble method
Wrapper method
Machine learning
Data mining
Gene expression data
UNESCO classification
1203.17 Informatics
Publication date
2018-01-08
Citation
Castellanos-Garzón, J.A., Ramos, J., López-Sánchez, D. et al. An Ensemble Framework Coping with Instability in the Gene Selection Process. Interdiscip Sci Comput Life Sci 10, 12–23 (2018). https://doi.org/10.1007/s12539-017-0274-z
Abstract
[EN] This paper proposes an ensemble framework for gene selection that addresses the instability problems arising in the gene filtering task. Selecting genes from gene expression data is a complex process in which different filter methods yield different informative gene subsets, which makes it difficult for experts to identify the significant genes. The instability of results can come from filter methods, gene classifier methods, different datasets of the same disease and multiple valid groups of biomarkers. Although a large number of proposals exist, the complexity imposed by this problem remains a challenge today. This work proposes a framework involving five stages of gene filtering to discover biomarkers for diagnosis and classification tasks. The framework performs a process of stable feature selection that faces the problems above and thus provides a more suitable and reliable solution for clinical and research purposes. Our proposal involves a process of multistage gene filtering, in which several ensemble strategies for gene selection are combined so that different classifiers simultaneously assess gene subsets to face instability. First, we apply an ensemble of recent gene selection methods to obtain diversity in the genes found (stability according to filter methods). Next, we apply an ensemble of known classifiers to keep only the genes that are relevant to all classifiers at the same time (stability according to classification methods). The framework was evaluated on two different datasets of the same disease (pancreatic ductal adenocarcinoma), in search of stability according to the disease, and promising results were achieved.
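The two ensemble steps named in the abstract (an ensemble of filters, then an ensemble of classifiers) can be illustrated with a minimal sketch. It is not the authors' five-stage framework: the two scoring functions, the two toy classifiers, the per-gene leave-one-out check, the accuracy threshold and the toy data are all assumptions chosen only to keep the example short and dependency-free.

```python
from statistics import mean, pstdev

def split(values, labels):
    """Split one gene's expression values by binary class label."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return pos, neg

def mean_diff_score(values, labels):      # hypothetical filter 1
    pos, neg = split(values, labels)
    return abs(mean(pos) - mean(neg))

def t_like_score(values, labels):         # hypothetical filter 2
    pos, neg = split(values, labels)
    return abs(mean(pos) - mean(neg)) / (pstdev(pos) + pstdev(neg) + 1e-9)

def filter_ensemble(data, labels, k):
    """Union of the top-k genes of each filter (diversity across filters)."""
    selected = set()
    for score in (mean_diff_score, t_like_score):
        ranked = sorted(data, key=lambda g: score(data[g], labels), reverse=True)
        selected |= set(ranked[:k])
    return selected

def centroid_predict(train_x, train_y, x):  # toy classifier 1
    pos, neg = split(train_x, train_y)
    return 1 if abs(x - mean(pos)) < abs(x - mean(neg)) else 0

def nn_predict(train_x, train_y, x):        # toy classifier 2 (1-NN)
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def loo_accuracy(predict, values, labels):
    """Leave-one-out accuracy of a classifier on a single gene."""
    hits = 0
    for i in range(len(values)):
        train_x = values[:i] + values[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(train_x, train_y, values[i]) == labels[i]
    return hits / len(values)

def classifier_ensemble(data, labels, genes, threshold):
    """Keep genes that every classifier finds relevant at the same time."""
    classifiers = (centroid_predict, nn_predict)
    return {g for g in genes
            if all(loo_accuracy(c, data[g], labels) >= threshold
                   for c in classifiers)}

# Toy expression data (made up): gene name -> one value per sample.
labels = [1, 1, 1, 0, 0, 0]
data = {
    "g1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],  # separates the classes
    "g2": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],  # carries no signal
    "g3": [3.0, 1.0, 3.1, 1.1, 2.9, 1.0],  # noisy
}
candidates = filter_ensemble(data, labels, k=2)
stable = classifier_ensemble(data, labels, candidates, threshold=0.8)
print(sorted(candidates), sorted(stable))
```

In this sketch the filter stage takes a union so that no single filter decides the candidate set, while the classifier stage takes an intersection so that a gene survives only if all classifiers agree on it. A real pipeline would assess gene subsets rather than single genes and would use established filter and classifier implementations.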
URI
ISSN
1913-2751 (Print), 1867-1462 (Online)
DOI
10.1007/s12539-017-0274-z
Appears in collections
- BISITE. Articles [324]