TY - GEN
T1 - Generalised Implementation for Fixed-Length Approximate String Matching under Hamming Distance & Applications
AU - Pissis, Solon
AU - Retha, Ahmad
PY - 2015/9/29
Y1 - 2015/9/29
N2 - Approximate string matching is the problem of finding all factors of a text t of length n with a distance at most k from a pattern x of length m ≤ n. Fixed-length approximate string matching is the problem of finding all factors of a text t of length n with a distance at most k from any factor of length h of a pattern x of length m ≤ n, where h ≤ m. It is thus a generalisation of approximate string matching and has numerous direct applications in molecular biology - motif extraction and circular sequence alignment, to name a few. MaxShiftM is a bit-parallel algorithm for fixed-length approximate string matching under the Hamming distance model with time complexity O (m[h/w]n), where w is the size of the computer word (Crochemore et al., 2010). An implementation of this algorithm is straightforward as long as the maximal length of alignments is less than or equal to w. In this article, our contribution is twofold: first, we propose a generalised implementation of MaxShiftM, that is, for any given h ≤ m, under the Hamming distance model, and second, we show how our implementation can be used to improve the accuracy and efficiency of multiple circular sequence alignment in terms of the inferred likelihood-based phylogenies.
AB - Approximate string matching is the problem of finding all factors of a text t of length n with a distance at most k from a pattern x of length m ≤ n. Fixed-length approximate string matching is the problem of finding all factors of a text t of length n with a distance at most k from any factor of length h of a pattern x of length m ≤ n, where h ≤ m. It is thus a generalisation of approximate string matching and has numerous direct applications in molecular biology - motif extraction and circular sequence alignment, to name a few. MaxShiftM is a bit-parallel algorithm for fixed-length approximate string matching under the Hamming distance model with time complexity O (m[h/w]n), where w is the size of the computer word (Crochemore et al., 2010). An implementation of this algorithm is straightforward as long as the maximal length of alignments is less than or equal to w. In this article, our contribution is twofold: first, we propose a generalised implementation of MaxShiftM, that is, for any given h ≤ m, under the Hamming distance model, and second, we show how our implementation can be used to improve the accuracy and efficiency of multiple circular sequence alignment in terms of the inferred likelihood-based phylogenies.
KW - algorithms on strings
KW - approximate string matching
KW - dynamic programming
UR - https://www.scopus.com/pages/publications/84962262565
UR - https://www.scopus.com/inward/citedby.url?scp=84962262565&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2015.106
DO - 10.1109/IPDPSW.2015.106
M3 - Conference contribution
AN - SCOPUS:84962262565
T3 - Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015
SP - 367
EP - 374
BT - Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 29th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015
Y2 - 25 May 2015 through 29 May 2015
ER -