Abstract
In the all-pairs suffix-prefix (APSP) problem [Gusfield et al., Inf. Process. Lett. 1992], we are given a dictionary R of r strings, S1, . . ., Sr, of total length n, and we are asked to find the length SPLi,j of the longest string that is both a suffix of Si and a prefix of Sj, for all i, j ∈ [1 . . r]. APSP is a classic problem in string algorithms with applications in bioinformatics, especially in sequence assembly. Since r = |R| is typically very large in real-world applications, considering all r2 pairs of strings explicitly is prohibitive. This is when the data structure variant of APSP makes sense; in the same spirit as distance oracles computing shortest paths between any two vertices given online. We show how to quickly locate k-approximate matches (under the Hamming or the edit distance) in R using a version of the k-errata tree [Cole et al., STOC 2004] that we introduce. Let SPLki,j be the length of the longest suffix of Si that is at distance at most k from a prefix of Sj. In particular, for any k = O(1), we show an O(nlogk n)-sized data structure to support the following queries: One-to-Onek(i, j): output SPLki,j in O(logk nlog log n) time. Reportk(i, d): output all j ∈ [1 . . r], such that SPLki,j ≥ d, in O(logk n(log n/log log n + output)) time, where output denotes the size of the output. In fact, our algorithms work for any value of k not just for k = O(1), but the formulas bounding the complexities get much more complicated for larger values of k.
Original language | English |
---|---|
Title of host publication | 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024) |
Subtitle of host publication | [Proceedings] |
Editors | Rastislav Kralovic, Antonin Kucera |
Publisher | Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing |
Pages | 1-18 |
Number of pages | 18 |
ISBN (Electronic) | 9783959773355 |
DOIs | |
Publication status | Published - Aug 2024 |
Event | 49th International Symposium on Mathematical Foundations of Computer Science, MFCS 2024 - Bratislava, Slovakia Duration: 26 Aug 2024 → 30 Aug 2024 |
Publication series
Name | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|
Volume | 306 |
ISSN (Print) | 1868-8969 |
Conference
Conference | 49th International Symposium on Mathematical Foundations of Computer Science, MFCS 2024 |
---|---|
Country/Territory | Slovakia |
City | Bratislava |
Period | 26/08/24 → 30/08/24 |
Bibliographical note
Publisher Copyright:© Wiktor Zuba, Grigorios Loukides, Solon P. Pissis, and Sharma V. Thankachan.
Keywords
- all-pairs suffix-prefix
- k-errata tree
- suffix tree
- suffix-prefix queries