Approximate pattern matching using search schemes and in-text verification
- Author
- Luca Renders (UGent) , Lore Depuydt (UGent) and Jan Fostier (UGent)
- Organization
- Project
- Abstract
- Approximate pattern matching entails finding approximate occurrences of a search pattern P in a search text T. In a typical bioinformatics scenario, P is a DNA fragment (a read) and T a (collection of) reference genomes(s). Search schemes are an efficient computational technique that guarantee to exhaustively identify all occurrences within a pre-specified number of allowed errors (substitutions and indels). Using a bidirectional FM-index, they describe how to traverse the search space in such a way that runtime is minimized. Despite their optimal time complexity, basic operations on the FM-index require expensive random memory access. We examine whether in-text verification, in which a candidate occurrence is validated in the search text via a bit-parallel pairwise alignment approach, can be used to supplement in-index matching using search schemes. In comparison to pure in-index matching, we find that hybrid in-index/in-text matching can reduce running time by more than a factor of two. We introduce Columba 1.1, an open-source C++ software program that efficiently implements these ideas. Columba 1.1 can locate, within a maximum edit distance of 4, all occurrences of 100,000 reads (150 bp) in the human reference genome in about 30 seconds. This outperforms existing, state-of-the-art lossless alignment tools such as Bwolo, Yara and GEM and is comparable to the lossy aligner BWA in mem mode.
Downloads
-
fears 2022 pitch.pdf
- full text (Author's original)
- |
- open access
- |
- |
- 75.86 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01GMVAZ64V3YYZZA12S990AGQC
- MLA
- Renders, Luca, et al. “Approximate Pattern Matching Using Search Schemes and In-Text Verification.” Faculty of Engineering and Architecture Research Symposium 2022 (FEARS 2022), Abstracts, 2022, doi:10.5281/zenodo.7401042.
- APA
- Renders, L., Depuydt, L., & Fostier, J. (2022). Approximate pattern matching using search schemes and in-text verification. Faculty of Engineering and Architecture Research Symposium 2022 (FEARS 2022), Abstracts. Presented at the Faculty of Engineering and Architecture Research Symposium 2022 (FEARS 2022), Ghent, Belgium. https://doi.org/10.5281/zenodo.7401042
- Chicago author-date
- Renders, Luca, Lore Depuydt, and Jan Fostier. 2022. “Approximate Pattern Matching Using Search Schemes and In-Text Verification.” In Faculty of Engineering and Architecture Research Symposium 2022 (FEARS 2022), Abstracts. https://doi.org/10.5281/zenodo.7401042.
- Chicago author-date (all authors)
- Renders, Luca, Lore Depuydt, and Jan Fostier. 2022. “Approximate Pattern Matching Using Search Schemes and In-Text Verification.” In Faculty of Engineering and Architecture Research Symposium 2022 (FEARS 2022), Abstracts. doi:10.5281/zenodo.7401042.
- Vancouver
- 1.Renders L, Depuydt L, Fostier J. Approximate pattern matching using search schemes and in-text verification. In: Faculty of Engineering and Architecture Research Symposium 2022 (FEARS 2022), Abstracts. 2022.
- IEEE
- [1]L. Renders, L. Depuydt, and J. Fostier, “Approximate pattern matching using search schemes and in-text verification,” in Faculty of Engineering and Architecture Research Symposium 2022 (FEARS 2022), Abstracts, Ghent, Belgium, 2022.
@inproceedings{01GMVAZ64V3YYZZA12S990AGQC, abstract = {{Approximate pattern matching entails finding approximate occurrences of a search pattern P in a search text T. In a typical bioinformatics scenario, P is a DNA fragment (a read) and T a (collection of) reference genomes(s). Search schemes are an efficient computational technique that guarantee to exhaustively identify all occurrences within a pre-specified number of allowed errors (substitutions and indels). Using a bidirectional FM-index, they describe how to traverse the search space in such a way that runtime is minimized. Despite their optimal time complexity, basic operations on the FM-index require expensive random memory access. We examine whether in-text verification, in which a candidate occurrence is validated in the search text via a bit-parallel pairwise alignment approach, can be used to supplement in-index matching using search schemes. In comparison to pure in-index matching, we find that hybrid in-index/in-text matching can reduce running time by more than a factor of two. We introduce Columba 1.1, an open-source C++ software program that efficiently implements these ideas. Columba 1.1 can locate, within a maximum edit distance of 4, all occurrences of 100,000 reads (150 bp) in the human reference genome in about 30 seconds. This outperforms existing, state-of-the-art lossless alignment tools such as Bwolo, Yara and GEM and is comparable to the lossy aligner BWA in mem mode.}}, author = {{Renders, Luca and Depuydt, Lore and Fostier, Jan}}, booktitle = {{Faculty of Engineering and Architecture Research Symposium 2022 (FEARS 2022), Abstracts}}, language = {{eng}}, location = {{Ghent, Belgium}}, pages = {{1}}, title = {{Approximate pattern matching using search schemes and in-text verification}}, url = {{http://doi.org/10.5281/zenodo.7401042}}, year = {{2022}}, }
- Altmetric
- View in Altmetric