TY - JOUR

T1 - Maximal degenerate palindromes with gaps and mismatches

AU - Alzamel, Mai

AU - Hampson, Christopher

AU - Iliopoulos, Costas S.

AU - Lim, Zara

AU - Pissis, Solon

AU - Vlachakis, Dimitrios

AU - Watts, Steven

N1 - Publisher Copyright:
© 2023 The Author(s)

PY - 2023/11/2

Y1 - 2023/11/2

N2 - A degenerate symbol over an alphabet Σ is a non-empty subset of Σ, and a sequence of such symbols is a degenerate string. We investigate the exact computation of maximal degenerate palindromes with gaps and mismatches. We present an algorithm which, given a degenerate string of length n and natural number parameters g and m, efficiently detects exact maximal palindromes with a gap size ≤g, and ≤m permitted mismatches. We show that it can be done in O(k|Σ|(k+log|Σ|)+(k+g+m)n) time and O((g+m)n) space, where k represents an upper bound on the number of degenerate symbols contained in the string. Furthermore, we also show that the problem of factorisation a string into maximal degenerate palindromes with gaps and mismatches can also be done in O(k|Σ|(k+log|Σ|)+(k+g+m)n) time and O((g+m)n) space. An inverted repeat is a specific type of palindrome which refers to a nucleotide sequence followed by its reverse complement. Our results can also be used to find maximal inverted repeated sequences with gaps and mismatches, where changing the structure of palindromes to inverted repeats does not affect the overall running time. Finally we demonstrate our algorithm on several strains of SARS-CoV-2, and quantify the number of inverted repeats found with ≤0,1,2 mismatches and ≤0,10,100 gap size.

AB - A degenerate symbol over an alphabet Σ is a non-empty subset of Σ, and a sequence of such symbols is a degenerate string. We investigate the exact computation of maximal degenerate palindromes with gaps and mismatches. We present an algorithm which, given a degenerate string of length n and natural number parameters g and m, efficiently detects exact maximal palindromes with a gap size ≤g, and ≤m permitted mismatches. We show that it can be done in O(k|Σ|(k+log|Σ|)+(k+g+m)n) time and O((g+m)n) space, where k represents an upper bound on the number of degenerate symbols contained in the string. Furthermore, we also show that the problem of factorisation a string into maximal degenerate palindromes with gaps and mismatches can also be done in O(k|Σ|(k+log|Σ|)+(k+g+m)n) time and O((g+m)n) space. An inverted repeat is a specific type of palindrome which refers to a nucleotide sequence followed by its reverse complement. Our results can also be used to find maximal inverted repeated sequences with gaps and mismatches, where changing the structure of palindromes to inverted repeats does not affect the overall running time. Finally we demonstrate our algorithm on several strains of SARS-CoV-2, and quantify the number of inverted repeats found with ≤0,1,2 mismatches and ≤0,10,100 gap size.

KW - Degenerate

KW - Inverted repeat

KW - Mismatch

KW - Palindrome

KW - String factorisation

UR - http://www.scopus.com/inward/record.url?scp=85171274821&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85171274821&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2023.114182

DO - 10.1016/j.tcs.2023.114182

M3 - Article

AN - SCOPUS:85171274821

SN - 0304-3975

VL - 978

SP - 1

EP - 16

JO - Theoretical Computer Science

JF - Theoretical Computer Science

M1 - 114182

ER -