TY - JOUR

T1 - Approximate pattern matching on elastic-degenerate text

AU - Bernardini, Giulia

AU - Pisanti, Nadia

AU - Pissis, Solon P.

AU - Rosone, Giovanna

PY - 2020/4/6

Y1 - 2020/4/6

N2 - An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g., pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length m in an elastic-degenerate text. There exists a non-combinatorial O(nm1.381+N)-time algorithm to solve this problem on-line [1]. In this paper, we study the same problem under the edit distance model and present an O(k2mG+kN)-time and O(m)-space algorithm, where G is the total number of strings in the elastic-degenerate text and k is the maximum edit distance allowed. We also present a simple O(kmG+kN)-time and O(m)-space algorithm for solving the problem under Hamming distance.

AB - An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g., pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length m in an elastic-degenerate text. There exists a non-combinatorial O(nm1.381+N)-time algorithm to solve this problem on-line [1]. In this paper, we study the same problem under the edit distance model and present an O(k2mG+kN)-time and O(m)-space algorithm, where G is the total number of strings in the elastic-degenerate text and k is the maximum edit distance allowed. We also present a simple O(kmG+kN)-time and O(m)-space algorithm for solving the problem under Hamming distance.

KW - Degenerate strings

KW - Elastic-degenerate strings

KW - Pan-genome

KW - Pattern matching

KW - Uncertain sequences

UR - http://www.scopus.com/inward/record.url?scp=85070355371&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85070355371&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2019.08.012

DO - 10.1016/j.tcs.2019.08.012

M3 - Article

AN - SCOPUS:85070355371

VL - 812

SP - 109

EP - 122

JO - Theoretical Computer Science

JF - Theoretical Computer Science

SN - 0304-3975

ER -