Abstract
An elastic-degenerate (ED) string is a sequence of n finite sets of strings of total length N, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an O~ (nmω-1) + O(N) -time algorithm [Bernardini et al., SIAM J. Comput. 2022], where ω denotes the matrix multiplication exponent and the O~ (· ) notation suppresses polylog factors. In the k-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most k errors. k-EDSM can be solved in O(k2mG+ kN) time under edit distance, where G denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, G is only bounded by N, and so even for k= 1, the existing algorithm runs in Ω(mN) time in the worst case. Here we make progress in this direction. We show that 1-EDSM can be solved in O((nm2+ N) log m) or O(nm3+ N) time under edit distance. For the decision version of the problem, we present a faster O(nm2logm+Nloglogm) -time algorithm. Our algorithms rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or range emptiness), which we show how to solve efficiently.
Original language | English |
---|---|
Title of host publication | LATIN 2022: Theoretical Informatics |
Subtitle of host publication | 15th Latin American Symposium, Guanajuato, Mexico, November 7–11, 2022, Proceedings |
Editors | Armando Castañeda, Francisco Rodríguez-Henríquez |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 20-37 |
Number of pages | 18 |
ISBN (Electronic) | 9783031206245 |
ISBN (Print) | 9783031206238 |
DOIs | |
Publication status | Published - 2022 |
Event | 15th Latin American Symposium on Theoretical Informatics, LATIN 2022 - Guanajuato, Mexico Duration: 7 Nov 2022 → 11 Nov 2022 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 13568 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 15th Latin American Symposium on Theoretical Informatics, LATIN 2022 |
---|---|
Country/Territory | Mexico |
City | Guanajuato |
Period | 7/11/22 → 11/11/22 |
Bibliographical note
Funding Information:Keywords: String algorithms · Approximate string matching · Edit distance · Degenerate strings · Elastic-degenerate strings The work in this paper is supported in part by: the Netherlands Organisation for Scientific Research (NWO) through project OCENW.GROOT.2019.015 “Optimization for and with Machine Learning (OPTIMAL)” and Gravitation-grant NETWORKS-024.002.003; the PANGAIA and ALPACA projects that have received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sk lodowska-Curie grant agreements No 872539 and 956229, respectively; and the MUR - FSE REACT EU - PON R&I 2014–2020.
Publisher Copyright:
© 2022, Springer Nature Switzerland AG.
Funding
Keywords: String algorithms · Approximate string matching · Edit distance · Degenerate strings · Elastic-degenerate strings The work in this paper is supported in part by: the Netherlands Organisation for Scientific Research (NWO) through project OCENW.GROOT.2019.015 “Optimization for and with Machine Learning (OPTIMAL)” and Gravitation-grant NETWORKS-024.002.003; the PANGAIA and ALPACA projects that have received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sk lodowska-Curie grant agreements No 872539 and 956229, respectively; and the MUR - FSE REACT EU - PON R&I 2014–2020.
Funders | Funder number |
---|---|
ALPACA | |
PANGAIA | |
Horizon 2020 Framework Programme | 872539, 956229 |
Horizon 2020 Framework Programme | |
Faculty of Science and Engineering, University of Manchester | |
Nederlandse Organisatie voor Wetenschappelijk Onderzoek | NETWORKS-024.002.003 |
Nederlandse Organisatie voor Wetenschappelijk Onderzoek |
Keywords
- Approximate string matching
- Degenerate strings
- Edit distance
- Elastic-degenerate strings
- String algorithms