Unary Words Have the Smallest Levenshtein k-Neighbourhoods

Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski, Tomasz Waleń, Wiktor Zuba

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

The edit distance (a.k.a. the Levenshtein distance) between two words is defined as the minimum number of insertions, deletions or substitutions of letters needed to transform one word into the other. The Levenshtein k-neighbourhood of a word w is the set of words that are at edit distance at most k from w. This is perhaps the most important concept underlying BLAST, a widely used tool for comparing biological sequences. A natural combinatorial question is to ask for upper and lower bounds on the size of this set. The answer to this question has important algorithmic implications as well. Myers notes that "such bounds would give a tighter characterisation of the running time of the algorithm" behind BLAST. We show that the size of the Levenshtein k-neighbourhood of any word of length n over an arbitrary alphabet is not smaller than the size of the Levenshtein k-neighbourhood of a unary word of length n, thus providing a tight lower bound on the size of the Levenshtein k-neighbourhood. We remark that this result was posed as a conjecture by Dufresne at WCTA 2019.
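
The definitions in the abstract can be checked by brute force on very small inputs. The following is a minimal sketch only, not the method of the paper: it enumerates the Levenshtein k-neighbourhood of a word by applying single edits k times. The function names `single_edits` and `neighbourhood` and the choice of a binary alphabet are assumptions made for illustration, and the word is assumed to be written over the given alphabet.

```python
from itertools import product


def single_edits(word, alphabet):
    """Yield every word reachable from `word` by one insertion,
    deletion or substitution of a letter from `alphabet`."""
    for i in range(len(word) + 1):
        for c in alphabet:
            yield word[:i] + c + word[i:]              # insertion
    for i in range(len(word)):
        yield word[:i] + word[i + 1:]                  # deletion
        for c in alphabet:
            if c != word[i]:
                yield word[:i] + c + word[i + 1:]      # substitution


def neighbourhood(word, k, alphabet):
    """Return the set of words over `alphabet` at edit distance at most k
    from `word` (hypothetical helper; exponential, for tiny inputs only)."""
    seen = {word}
    frontier = {word}
    for _ in range(k):
        # Words first reached after one more edit.
        frontier = {v for u in frontier for v in single_edits(u, alphabet)} - seen
        seen |= frontier
    return seen


if __name__ == "__main__":
    n, k, alphabet = 4, 2, "ab"
    unary = len(neighbourhood("a" * n, k, alphabet))
    sizes = {
        "".join(w): len(neighbourhood("".join(w), k, alphabet))
        for w in product(alphabet, repeat=n)
    }
    print("unary word:", unary)
    print("smallest over all words:", min(sizes.values()))
```

For each small n and k tried this way, the smallest neighbourhood size over all words of length n should coincide with that of the unary word, which is the statement proved in the paper. The enumeration grows exponentially in k and is only meant to illustrate the definitions.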

Original language: English
Title of host publication: 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Editors: Inge Li Gørtz, Oren Weimann
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Pages: 1-12
Number of pages: 12
ISBN (Electronic): 9783959771498
DOIs
Publication status: Published - 2020
Event: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 - Copenhagen, Denmark
Duration: 17 Jun 2020 to 19 Jun 2020

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 161
ISSN (Print): 1868-8969

Conference

Conference: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
Country/Territory: Denmark
City: Copenhagen
Period: 17/06/20 to 19/06/20

Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 872539. Panagiotis Charalampopoulos: Supported by ERC grant TOTAL under the European Union’s Horizon 2020 Research and Innovation Programme (agreement no. 677651). Jakub Radoszewski: Supported by the Polish National Science Center, grant number 2018/31/D/ST6/03991. Tomasz Waleń: Supported by the Polish National Science Center, grant number 2018/31/D/ST6/03991. Wiktor Zuba: Supported by the Polish National Science Center, grant number 2018/31/D/ST6/03991.

Funders (with funder number where given)
Polish National Science Center: 2018/31/D/ST6/03991
Horizon 2020 Framework Programme: 677651
H2020 Marie Skłodowska-Curie Actions: 872539
European Research Council
Total

    Keywords

    • Combinatorics on words
    • Edit distance
    • Levenshtein distance
