Longest common prefixes with k-errors and applications

Lorraine A.K. Ayad, Carl Barton, Panagiotis Charalampopoulos*, Costas S. Iliopoulos, Solon P. Pissis

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

Although real-world text datasets, such as DNA sequences, are far from being uniformly random, string searching average-case algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we study the problem of computing the longest prefix of each suffix of a given string of length n that occurs elsewhere in the string with k-errors. This problem has already been studied under the Hamming distance model. Our first result is an improvement upon the state-of-the-art average-case time complexity for non-constant k and using only linear space under the Hamming distance model. Notably, we show that our technique can be extended to the edit distance model with the same time and space complexities. Specifically, our algorithms run in (Formula presented) time on average, where c>1 is a constant, using O(n) space. Finally, we show that our technique is applicable to several algorithmic problems found in computational biology and elsewhere. The importance of our technique lies on the fact that it is the first one achieving this bound for non-constant k and using O(n) space.

Original languageEnglish
Title of host publicationString Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Proceedings
EditorsTravis Gagie, Alistair Moffat, Gonzalo Navarro, Ernesto Cuadros-Vargas
PublisherSpringer Verlag
Pages27-41
Number of pages15
ISBN (Print)9783030004781
DOIs
Publication statusPublished - 1 Jan 2018
Externally publishedYes
Event25th International Symposium on String Processing and Information Retrieval, SPIRE 2018 - Lima, Peru
Duration: 9 Oct 201811 Oct 2018

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume11147 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference25th International Symposium on String Processing and Information Retrieval, SPIRE 2018
Country/TerritoryPeru
CityLima
Period9/10/1811/10/18

Keywords

  • k-errors
  • k-mismatches
  • Longest common factor
  • Longest common prefix
  • Longest common substring

Fingerprint

Dive into the research topics of 'Longest common prefixes with k-errors and applications'. Together they form a unique fingerprint.

Cite this