Alignment Free Sequence Similarity with Bounded Hamming Distance

Pizzi, Cinzia
2014

Abstract

A growing number of measures of sequence similarity are based on some underlying notion of relative compressibility. Within this paradigm, similar sequences are expected to share a large number of common substrings, subsequences, or more complex patterns and motifs. The computational complexity of such measures varies, and it increases with the complexity of the patterns taken into account. At the low end of the spectrum, most measures based on bags of shared substrings can typically be computed in linear time. This performance is no longer achievable as soon as some degree of distortion is allowed. In this paper, measures of sequence similarity are introduced and studied in which two patterns, one from each sequence of the pair, are considered similar if they coincide up to a preset number of mismatches, that is, if their Hamming distance is bounded. It is shown that, for some such measures, time bounds slightly better than O(n^2) are achievable. Preliminary experiments demonstrate the potential applicability to phylogeny and classification of these similarity measures, which are rougher than previously adopted ones.
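The abstract does not define a specific measure. To make the notion of "coinciding up to a preset number of mismatches" concrete, the Python sketch below assumes one illustrative measure of this kind. For every position i of x, it computes the length of the longest substring starting at i that matches some substring of y with at most k mismatches. It then averages these lengths over x, giving an average-common-substring style score. The function names `longest_k_mismatch_matches` and `k_mismatch_acs` are hypothetical, and this is not the construction used in the paper. The sketch scans each diagonal of the |x| by |y| comparison matrix with a sliding window, so it runs in O(|x|·|y|) time. That is the plain quadratic baseline; it does not attempt the slightly sub-quadratic bounds the abstract refers to.

```python
from typing import List


def longest_k_mismatch_matches(x: str, y: str, k: int) -> List[int]:
    """For each start position i of x (hypothetical helper), return the length
    of the longest substring x[i:i+L] that equals some substring of y with at
    most k mismatches, i.e. within Hamming distance k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n, m = len(x), len(y)
    best = [0] * n
    # Diagonal d pairs x[i] with y[i + d]. Every (i, j) pair lies on exactly
    # one diagonal and each diagonal is scanned in linear time, so the total
    # running time is O(n * m).
    for d in range(-(n - 1), m):
        i0 = max(0, -d)          # first x position on this diagonal
        j0 = i0 + d              # corresponding first y position
        length = min(n - i0, m - j0)
        end = 0                  # current window is [start, end) on the diagonal
        mismatches = 0           # number of mismatches inside the window
        for start in range(length):
            # Grow the window while it still holds at most k mismatches.
            while end < length:
                cost = int(x[i0 + end] != y[j0 + end])
                if mismatches + cost > k:
                    break
                mismatches += cost
                end += 1
            best[i0 + start] = max(best[i0 + start], end - start)
            # Slide the window forward: drop position `start`, or, if the
            # window is empty, move `end` past it (mismatches is already 0).
            if end > start:
                mismatches -= int(x[i0 + start] != y[j0 + start])
            else:
                end = start + 1
    return best


def k_mismatch_acs(x: str, y: str, k: int) -> float:
    """Hypothetical ACS-style score: mean k-mismatch match length over x."""
    if not x:
        return 0.0
    return sum(longest_k_mismatch_matches(x, y, k)) / len(x)


if __name__ == "__main__":
    print(longest_k_mismatch_matches("ACGT", "AGGT", 0))  # [1, 0, 2, 1]
    print(k_mismatch_acs("ACGT", "AGGT", 1))              # 2.5
```

With k = 0 the score reduces to exact common substrings, and the lone C in "ACGT" contributes 0 because "AGGT" contains no C. Allowing a single mismatch (k = 1) lets the whole string match along the main diagonal, which raises the average from 1.0 to 2.5. This shows how a bounded Hamming distance makes the resulting similarity "rougher" and more tolerant of point substitutions.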
Proceedings of DCC 2014
ISBN: 9781479938827

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/2837465
Citations (Scopus): 12