Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/139567
Type: Journal article
Title: Computational prediction and interpretation of both general and specific types of promoters in Escherichia coli by exploiting a stacked ensemble-learning framework
Author: Li, F.
Chen, J.
Ge, Z.
Wen, Y.
Yue, Y.
Hayashida, M.
Baggag, A.
Bensmail, H.
Song, J.
Citation: Briefings in Bioinformatics, 2021; 22(2):2126-2140
Publisher: Oxford University Press (OUP)
Issue Date: 2021
ISSN: 1467-5463
1477-4054
Statement of Responsibility: Fuyi Li, Jinxiang Chen, Zongyuan Ge, Ya Wen, Yanwei Yue, Morihiro Hayashida, Abdelkader Baggag, Halima Bensmail and Jiangning Song
Abstract: Promoters are short consensus sequences of DNA, which are responsible for transcription activation or the repression of all genes. There are many types of promoters in bacteria with important roles in initiating gene transcription. Therefore, solving promoter-identification problems has important implications for improving the understanding of their functions. To this end, computational methods targeting promoter classification have been established; however, their performance remains unsatisfactory. In this study, we present a novel stacked-ensemble approach (termed SELECTOR) for both identifying promoters and classifying their types. SELECTOR combines four groups of features (the composition of k-spaced nucleic acid pairs, parallel correlation pseudo-dinucleotide composition, position-specific trinucleotide propensity based on single-strand, and DNA strand features) and uses five popular tree-based ensemble learning algorithms to build a stacked model. Both 5-fold cross-validation tests using benchmark datasets and independent tests using a newly collected independent test dataset showed that SELECTOR outperformed state-of-the-art methods in both general and specific types of promoter prediction in Escherichia coli. Furthermore, this novel framework provides essential interpretations that aid understanding of model success by leveraging the powerful Shapley Additive exPlanation algorithm, thereby highlighting the most important features relevant for predicting both general and specific types of promoters and overcoming the limitations of existing 'black-box' approaches that are unable to reveal causal relationships from large amounts of initially encoded features.
Keywords: promoters; bioinformatics; sequence analysis; machine learning; stacking strategy; model interpretability
Rights: © The Author(s) 2020. Published by Oxford University Press. All rights reserved.
DOI: 10.1093/bib/bbaa049
Grant ID: http://purl.org/au-research/grants/arc/LP110200333
http://purl.org/au-research/grants/arc/DP120104460
http://purl.org/au-research/grants/nhmrc/1092262
Published version: http://dx.doi.org/10.1093/bib/bbaa049
Appears in Collections: Medicine publications
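
Illustrative note: the abstract lists the composition of k-spaced nucleic acid pairs as one of SELECTOR's feature groups. The sketch below shows how such an encoding is commonly computed, not the authors' implementation. The function name `cksnap`, the parameter `max_gap`, and its default value are assumptions made for illustration; the abstract does not state which gap values the paper uses. For each gap k, the sketch counts the 16 possible nucleotide pairs separated by k positions and normalises by the number of pairs counted.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def cksnap(sequence, max_gap=2):
    """Return k-spaced nucleotide-pair frequencies for gaps 0..max_gap.

    Hypothetical helper: the name and the max_gap default are illustrative
    assumptions, not values taken from the paper.
    """
    seq = sequence.upper()
    # Fixed pair order: AA, AC, AG, AT, CA, ..., TT (16 pairs).
    pairs = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        # Pair the base at position i with the base gap + 1 positions later.
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:  # skip pairs containing ambiguous bases such as N
                counts[pair] += 1
                total += 1
        # Normalise counts to frequencies; all zeros if no pair was counted.
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


if __name__ == "__main__":
    vector = cksnap("TTGACAATTAATCATCGGCTCGTATAATGTG", max_gap=2)
    print(len(vector))  # 16 features per gap value
```

Under these assumptions the vector holds 16 values per gap, so `max_gap=2` yields 48 features. In a stacked model of the kind the abstract describes, a vector like this would be concatenated with the other feature groups before being passed to the tree-based base learners.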

Files in This Item:
There are no files associated with this item.

