Elastic-Degenerate String Matching with 1 Error

Giulia Bernardini, Esteban Gabory, Solon P. Pissis, Leen Stougie, Michelle Sweering, Wiktor Zuba*

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

58 Downloads (Pure)

Abstract

An elastic-degenerate (ED) string is a sequence of n finite sets of strings of total length N, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an O~ (nmω-1) + O(N) -time algorithm [Bernardini et al., SIAM J. Comput. 2022], where ω denotes the matrix multiplication exponent and the O~ (· ) notation suppresses polylog factors. In the k-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most k errors. k-EDSM can be solved in O(k2mG+ kN) time under edit distance, where G denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, G is only bounded by N, and so even for k= 1, the existing algorithm runs in Ω(mN) time in the worst case. Here we make progress in this direction. We show that 1-EDSM can be solved in O((nm2+ N) log m) or O(nm3+ N) time under edit distance. For the decision version of the problem, we present a faster O(nm2logm+Nloglogm) -time algorithm. Our algorithms rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or range emptiness), which we show how to solve efficiently.

Original languageEnglish
Title of host publicationLATIN 2022: Theoretical Informatics
Subtitle of host publication15th Latin American Symposium, Guanajuato, Mexico, November 7–11, 2022, Proceedings
EditorsArmando Castañeda, Francisco Rodríguez-Henríquez
PublisherSpringer Science and Business Media Deutschland GmbH
Pages20-37
Number of pages18
ISBN (Electronic)9783031206245
ISBN (Print)9783031206238
DOIs
Publication statusPublished - 2022
Event15th Latin American Symposium on Theoretical Informatics, LATIN 2022 - Guanajuato, Mexico
Duration: 7 Nov 202211 Nov 2022

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume13568 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference15th Latin American Symposium on Theoretical Informatics, LATIN 2022
Country/TerritoryMexico
CityGuanajuato
Period7/11/2211/11/22

Bibliographical note

Funding Information:
Keywords: String algorithms · Approximate string matching · Edit distance · Degenerate strings · Elastic-degenerate strings The work in this paper is supported in part by: the Netherlands Organisation for Scientific Research (NWO) through project OCENW.GROOT.2019.015 “Optimization for and with Machine Learning (OPTIMAL)” and Gravitation-grant NETWORKS-024.002.003; the PANGAIA and ALPACA projects that have received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sk lodowska-Curie grant agreements No 872539 and 956229, respectively; and the MUR - FSE REACT EU - PON R&I 2014–2020.

Publisher Copyright:
© 2022, Springer Nature Switzerland AG.

Funding

Keywords: String algorithms · Approximate string matching · Edit distance · Degenerate strings · Elastic-degenerate strings The work in this paper is supported in part by: the Netherlands Organisation for Scientific Research (NWO) through project OCENW.GROOT.2019.015 “Optimization for and with Machine Learning (OPTIMAL)” and Gravitation-grant NETWORKS-024.002.003; the PANGAIA and ALPACA projects that have received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sk lodowska-Curie grant agreements No 872539 and 956229, respectively; and the MUR - FSE REACT EU - PON R&I 2014–2020.

FundersFunder number
ALPACA
PANGAIA
Horizon 2020 Framework Programme872539, 956229
Horizon 2020 Framework Programme
Faculty of Science and Engineering, University of Manchester
Nederlandse Organisatie voor Wetenschappelijk OnderzoekNETWORKS-024.002.003
Nederlandse Organisatie voor Wetenschappelijk Onderzoek

    Keywords

    • Approximate string matching
    • Degenerate strings
    • Edit distance
    • Elastic-degenerate strings
    • String algorithms

    Fingerprint

    Dive into the research topics of 'Elastic-Degenerate String Matching with 1 Error'. Together they form a unique fingerprint.

    Cite this