TY - GEN
T1 - Longest common prefixes with k-errors and applications
AU - Ayad, Lorraine A.K.
AU - Barton, Carl
AU - Charalampopoulos, Panagiotis
AU - Iliopoulos, Costas S.
AU - Pissis, Solon P.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Although real-world text datasets, such as DNA sequences, are far from being uniformly random, string searching average-case algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we study the problem of computing the longest prefix of each suffix of a given string of length n that occurs elsewhere in the string with k-errors. This problem has already been studied under the Hamming distance model. Our first result is an improvement upon the state-of-the-art average-case time complexity for non-constant k and using only linear space under the Hamming distance model. Notably, we show that our technique can be extended to the edit distance model with the same time and space complexities. Specifically, our algorithms run in (Formula presented) time on average, where c>1 is a constant, using O(n) space. Finally, we show that our technique is applicable to several algorithmic problems found in computational biology and elsewhere. The importance of our technique lies on the fact that it is the first one achieving this bound for non-constant k and using O(n) space.
AB - Although real-world text datasets, such as DNA sequences, are far from being uniformly random, string searching average-case algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we study the problem of computing the longest prefix of each suffix of a given string of length n that occurs elsewhere in the string with k-errors. This problem has already been studied under the Hamming distance model. Our first result is an improvement upon the state-of-the-art average-case time complexity for non-constant k and using only linear space under the Hamming distance model. Notably, we show that our technique can be extended to the edit distance model with the same time and space complexities. Specifically, our algorithms run in (Formula presented) time on average, where c>1 is a constant, using O(n) space. Finally, we show that our technique is applicable to several algorithmic problems found in computational biology and elsewhere. The importance of our technique lies on the fact that it is the first one achieving this bound for non-constant k and using O(n) space.
KW - k-errors
KW - k-mismatches
KW - Longest common factor
KW - Longest common prefix
KW - Longest common substring
UR - http://www.scopus.com/inward/record.url?scp=85054783468&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054783468&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-00479-8_3
DO - 10.1007/978-3-030-00479-8_3
M3 - Conference contribution
AN - SCOPUS:85054783468
SN - 9783030004781
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 27
EP - 41
BT - String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Proceedings
A2 - Gagie, Travis
A2 - Moffat, Alistair
A2 - Navarro, Gonzalo
A2 - Cuadros-Vargas, Ernesto
PB - Springer Verlag
T2 - 25th International Symposium on String Processing and Information Retrieval, SPIRE 2018
Y2 - 9 October 2018 through 11 October 2018
ER -