## Abstract

In the all-pairs suffix-prefix (APSP) problem, we are given a dictionary R of k strings, S1, . . ., S_{k}, of total length n, and we are asked to find the length SPLi,j of the longest string that is both a suffix of Si and a prefix of Sj, for all i, j ∈ [1, k]. APSP is a classic problem in string algorithms with many applications in bioinformatics. When all strings of the dictionary are over an integer alphabet of size σ ≤ n^{O}^{(1)}, APSP can be solved in the optimal O(n + k^{2}) time with the use of the generalized suffix tree of the dictionary [Gusfield et al., Inf. Process. Lett. 1992]. In many bioinformatics applications, such as in sequence assembly, the size k of dictionary R is very large. In particular, k^{2} usually dominates n, and thus the k^{2} factor is the bottleneck both in the time and in the space complexity of such applications. We thus initiate a holistic study on several data structure variants of APSP. In particular, we consider the following types of queries: One-to-One(i, j): output SPLi,j. One-to-All(i): output SPLi,j for every j ∈ [1, k]. Report(i, ℓ): output all distinct j ∈ [1, k] such that SPLi,j ≥ ℓ, where ℓ ≥ 0 is an integer. Count(i, ℓ): output the number of distinct j ∈ [1, k] such that SPLi,j ≥ ℓ, where ℓ ≥ 0 is an integer. Top(i, K): output K distinct j ∈ [1, k] with the highest values of SPLi,j breaking ties arbitrarily. We assume the standard word RAM model of computation with word size w = Ω(log n) and an integer alphabet of size σ ≤ n^{O}^{(1)}. We show the following upper bounds: Query Space (words) Query time Note One-to-One(i, j) O(n) O(log log k) Theorem 11 One-to-All(i) O(n) O(k) Theorem 14 Report(i, ℓ) O(n) O(log n/log log n + output) Theorem 19(i) Count(i, ℓ) O(n) O(log n/log log n) Theorem 19(ii) Top(i, K) O(n) O(log^{2} n/log log n + K) Theorem 22 We also present efficient algorithms for constructing these data structures.

Original language | English |
---|---|

Title of host publication | 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023) |

Subtitle of host publication | [Proceedings] |

Editors | Laurent Bulteau, Zsuzsanna Liptak |

Publisher | Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing |

Pages | 1-20 |

Number of pages | 20 |

ISBN (Print) | 9783959772761 |

DOIs | |

Publication status | Published - 2023 |

Event | 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023 - Marne-la-Vallee, France Duration: 26 Jun 2023 → 28 Jun 2023 |

### Publication series

Name | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|

Volume | 259 |

ISSN (Print) | 1868-8969 |

### Conference

Conference | 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023 |
---|---|

Country/Territory | France |

City | Marne-la-Vallee |

Period | 26/06/23 → 28/06/23 |

### Bibliographical note

Funding Information:Funding This work is supported in part by the Royal Society grant IES\R3\193209. Solon P. Pissis: Supported in part by the PANGAIA and ALPACA projects that have received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements No 872539 and 956229, respectively. Sharma V. Thankachan: Supported by the U.S. National Science Foundation (NSF) grant CCF-2146003. Wiktor Zuba: Supported by the Netherlands Organisation for Scientific Research (NWO) through Gravitation-grant NETWORKS-024.002.003.

Publisher Copyright:

© 2023 Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. All rights reserved.

## Keywords

- all-pairs suffix-prefix
- internal pattern matching
- suffix-prefix queries