Approximate Circular Pattern Matching

Panagiotis Charalampopoulos, Tomasz Kociumaka, Jakub Radoszewski, Solon P. Pissis, Wojciech Rytter, Tomasz Waleń, Wiktor Zuba

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

We investigate the complexity of approximate circular pattern matching (CPM, in short) under the Hamming and edit distance. Under each of these two basic metrics, we are given a length-n text T, a length-m pattern P, and a positive integer threshold k, and we are to report all starting positions (called occurrences) of fragments of T that are at distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check if there is any such occurrence. All previous results for approximate CPM were either average-case upper bounds or heuristics, with the exception of the work of Charalampopoulos et al. [CKP+, JCSS'21], who considered only the Hamming distance. For the reporting version of the approximate CPM problem, under the Hamming distance we improve upon the main algorithm of [CKP+, JCSS'21] from O(n+(n/m) k4) to O(n+(n/m) k3 log log k) time; for the edit distance, we give an O(nk2)-time algorithm. Notably, for the decision versions and wide parameter-ranges, we give algorithms whose complexities are almost identical to the state-of-the-art for standard (i.e., non-circular) approximate pattern matching: For the decision version of the approximate CPM problem under the Hamming distance, we obtain an O(n + (n/m) k2 log k/ log log k)-time algorithm, which works in O(n) time whenever k = O( p mlog log m/logm). In comparison, the fastest algorithm for the standard counterpart of the problem, by Chan et al. [CGKKP, STOC'20], runs in O(n) time only for k = O(√ m). We achieve this result via a reduction to a geometric problem by building on ideas from [CKP+, JCSS'21] and Charalampopoulos et al. [CKW, FOCS'20]. For the decision version of the approximate CPM problem under the edit distance, the O(nk log3 k) runtime of our algorithm near matches the O(nk) runtime of the Landau-Vishkin algorithm [LV, J. Algorithms'89] for approximate pattern matching under edit distance; the latter algorithm remains the fastest known for k = Ω(m2/5). As a stepping stone, we propose an O(nk log3 k)-time algorithm for solving the Longest Prefix k-Approximate Match problem, proposed by Landau et al. [LMS, SICOMP'98], for all k ∈ {1, , k}. Our algorithm is based on Tiskin's theory of seaweeds [Tiskin, Math. Comput. Sci.'08], with recent advancements (see Charalampopoulos et al. [CKW, FOCS'22]), and on exploiting the seaweeds' relation to Monge matrices. In contrast, we obtain a conditional lower bound that suggests a polynomial separation between approximate CPM under the Hamming distance over the binary alphabet and its non-circular counterpart. We also show that a strongly subquadratic-time algorithm for the decision version of approximate CPM under edit distance would refute the Strong Exponential Time Hypothesis.

Original languageEnglish
Title of host publication30th Annual European Symposium on Algorithms (ESA 2022)
EditorsShiri Chechik, Gonzalo Navarro, Eva Rotenberg, Grzegorz Herman
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Pages1-19
Number of pages19
Volume244
ISBN (Electronic)9783959772471
DOIs
Publication statusPublished - 2022
Event30th Annual European Symposium on Algorithms, ESA 2022 - Berlin/Potsdam, Germany
Duration: 5 Sept 20229 Sept 2022

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume244
ISSN (Print)1868-8969

Conference

Conference30th Annual European Symposium on Algorithms, ESA 2022
Country/TerritoryGermany
CityBerlin/Potsdam
Period5/09/229/09/22

Bibliographical note

Funding Information:
Funding Panagiotis Charalampopoulos: Supported by Israel Science Foundation (ISF) grant 810/21 when the work that led to this paper was conducted. Jakub Radoszewski: Supported by the Polish National Science Center, grant no. 2018/31/D/ ST6/03991. Solon P. Pissis: Supported by the PANGAIA and ALPACA projects that have received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements No 872539 and 956229, respectively. Tomasz Waleń: Supported by the Polish National Science Center, grant no. 2018/31/D/ST6/03991. Wiktor Zuba: Supported by the Netherlands Organisation for Scientific Research (NWO) through Gravitation-grant NETWORKS-024.002.003.

Publisher Copyright:
© 2022 Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. All rights reserved.

Keywords

  • approximate circular pattern matching
  • edit distance
  • Hamming distance

Fingerprint

Dive into the research topics of 'Approximate Circular Pattern Matching'. Together they form a unique fingerprint.

Cite this