Approximate pattern matching on elastic-degenerate text

Giulia Bernardini*, Nadia Pisanti, Solon P. Pissis, Giovanna Rosone

*Corresponding author for this work

Research output: Contribution to JournalArticleAcademicpeer-review

Abstract

An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g., pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length m in an elastic-degenerate text. There exists a non-combinatorial O(nm1.381+N)-time algorithm to solve this problem on-line [1]. In this paper, we study the same problem under the edit distance model and present an O(k2mG+kN)-time and O(m)-space algorithm, where G is the total number of strings in the elastic-degenerate text and k is the maximum edit distance allowed. We also present a simple O(kmG+kN)-time and O(m)-space algorithm for solving the problem under Hamming distance.

Original languageEnglish
Pages (from-to)109-122
Number of pages14
JournalTheoretical Computer Science
Volume812
DOIs
Publication statusPublished - 6 Apr 2020
Externally publishedYes

Funding

NP and GR are partially supported by the project MIUR -SIR CMACBioSeq (“Combinatorial methods for analysis and compression of biological sequences”) grant n. RBSI146R5L . GB, NP, and GR are partially supported by the project UniPi PRA_2017_44 (“Advanced computational methodologies for the analysis of biomedical data”). NP, SPP, and GR are partially supported by the Royal Society project IE 161274 (“Processing uncertain sequences: combinatorics and applications”).

FundersFunder number
Royal SocietyIE 161274
Ministero dell’Istruzione, dell’Università e della RicercaRBSI146R5L, UniPi PRA_2017_44

    Keywords

    • Degenerate strings
    • Elastic-degenerate strings
    • Pan-genome
    • Pattern matching
    • Uncertain sequences

    Fingerprint

    Dive into the research topics of 'Approximate pattern matching on elastic-degenerate text'. Together they form a unique fingerprint.

    Cite this