Minimizing the Minimizers via Alphabet Reordering

Hilde Verbeek*, Lorraine A.K. Ayad*, Grigorios Loukides*, Solon P. Pissis*

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Minimizers sampling is one of the most widely used mechanisms for sampling strings [Roberts et al., Bioinformatics 2004]. Let $S = S[1] \dots S[n]$ be a string over a totally ordered alphabet $\Sigma$. Further, let $w \geq 2$ and $k \geq 1$ be two integers. The minimizer of $S[i..i+w+k-2]$ is the smallest position in $[i, i+w-1]$ where the lexicographically smallest length-$k$ substring of $S[i..i+w+k-2]$ starts. The set of minimizers over all $i \in [1, n-w-k+2]$ is the set $\mathcal{M}_{w,k}(S)$ of the minimizers of $S$. We consider the following basic problem: given $S$, $w$, and $k$, can we efficiently compute a total order on $\Sigma$ that minimizes $|\mathcal{M}_{w,k}(S)|$? We show that this is unlikely by proving that the problem is NP-hard for any $w \geq 3$ and $k \geq 1$. Our result provides theoretical justification for why there exist no exact algorithms for minimizing the minimizers samples, while there exists a plethora of heuristics for the same purpose.
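To make the definition above concrete, the following Python sketch computes $\mathcal{M}_{w,k}(S)$ for a given letter order, using 1-based positions as in the abstract, and then searches all $|\Sigma|!$ orders exhaustively for one that minimizes $|\mathcal{M}_{w,k}(S)|$. The function names `minimizers` and `best_order` are hypothetical, and the exhaustive search is only a naive illustration of the optimization problem, not the authors' method; its factorial running time makes it practical only for tiny alphabets.

```python
from itertools import permutations


def minimizers(s, w, k, order):
    """Return M_{w,k}(s) as a set of 1-based positions.

    order: the letters of the alphabet listed from smallest to largest
    (an illustrative encoding of the total order on Sigma).
    """
    rank = {c: r for r, c in enumerate(order)}
    n = len(s)
    result = set()
    # One window S[i..i+w+k-2] for each i in [1, n-w-k+2].
    for i in range(1, n - w - k + 3):
        best_pos, best_key = None, None
        # The w candidate length-k substrings start at positions i..i+w-1.
        for j in range(i, i + w):
            key = [rank[c] for c in s[j - 1 : j - 1 + k]]
            # Lists of equal length compare lexicographically; the strict '<'
            # keeps the smallest (leftmost) position on ties.
            if best_key is None or key < best_key:
                best_pos, best_key = j, key
        result.add(best_pos)
    return result


def best_order(s, w, k):
    """Naive exhaustive search: try every total order on the letters of s."""
    best = None
    for perm in permutations(sorted(set(s))):
        size = len(minimizers(s, w, k, perm))
        if best is None or size < best[0]:
            best = (size, perm)
    return best  # (minimum |M_{w,k}(s)|, an order attaining it)
```

For example, `best_order("abba", 2, 1)` returns `(2, ('b', 'a'))`: under a < b the minimizers are {1, 2, 4}, whereas under b < a they are {2, 3}. This toy case uses w = 2, below the w ≥ 3 threshold of the hardness result, purely to keep the example small.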

Original language: English
Title of host publication: 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)
Subtitle of host publication: [Proceedings]
Editors: Shunsuke Inenaga, Simon J. Puglisi
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Pages: 1-13
Number of pages: 13
ISBN (Electronic): 9783959773263
Publication status: Published - 2024
Event: 35th Annual Symposium on Combinatorial Pattern Matching, CPM 2024 - Fukuoka, Japan
Duration: 25 Jun 2024 to 27 Jun 2024

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 296
ISSN (Print): 1868-8969

Conference

Conference: 35th Annual Symposium on Combinatorial Pattern Matching, CPM 2024
Country/Territory: Japan
City: Fukuoka
Period: 25/06/24 to 27/06/24

Bibliographical note

Publisher Copyright:
© Hilde Verbeek, Lorraine A.K. Ayad, Grigorios Loukides, and Solon P. Pissis.

Keywords

  • alphabet reordering
  • feedback arc set
  • minimizers
  • sequence analysis

