TY - JOUR

T1 - Faster algorithms for 1-mappability of a sequence

AU - Alzamel, Mai

AU - Charalampopoulos, Panagiotis

AU - Iliopoulos, Costas S.

AU - Pissis, Solon P.

AU - Radoszewski, Jakub

AU - Sung, Wing Kin

PY - 2020/4/6

Y1 - 2020/4/6

N2 - In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. There exists an algorithm to solve this problem for k = 1 requiring time O(mn log n / log log n) using space O(n). Here we present two new algorithms that require worst-case time O(mn) and O(n log n log log n), respectively, and space O(n), thus greatly improving the previous result. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m = Ω(log_σ n). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m = Ω(k log_σ n), assuming that the letters are independent and uniformly distributed random variables. Finally, we provide an experimental evaluation of our average-case algorithm demonstrating its competitiveness to the state-of-the-art implementation.

KW - Algorithms on strings

KW - Hamming distance

KW - Sequence mappability

UR - http://www.scopus.com/inward/record.url?scp=85067179715&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067179715&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2019.04.026

DO - 10.1016/j.tcs.2019.04.026

M3 - Article

AN - SCOPUS:85067179715

VL - 812

SP - 2

EP - 12

JO - Theoretical Computer Science

JF - Theoretical Computer Science

SN - 0304-3975

ER -