Degenerate string comparison and applications

Mai Alzamel, Lorraine A.K. Ayad, Giulia Bernardini, Roberto Grossi, Costas S. Iliopoulos, Nadia Pisanti, Solon P. Pissis, Giovanna Rosone

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

A generalised degenerate string (GD string) Š is a sequence of n sets of strings of total size N, where the ith set contains strings of the same length ki but this length can vary between different sets. We denote the sum of these lengths k0, k1, . . . , kn-1 by W. This type of uncertain sequence can represent, for example, a gapless multiple sequence alignment of width W in a compact form. Our first result in this paper is an O(N+M)-time algorithm for deciding whether the intersection of two GD strings of total sizes N and M, respectively, over an integer alphabet, is non-empty. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in only linear space. A similar result can be obtained by employing an automata-based approach but its cost is alphabet-dependent. We then apply our string comparison algorithm to compute palindromes in GD strings. We present an O(min{W, n2}N)-time algorithm for computing all palindromes in Š. Furthermore, we show a similar conditional lower bound for computing maximal palindromes in Š. Finally, proof-of-concept experimental results are presented using real protein datasets.

Original languageEnglish
Title of host publication18th International Workshop on Algorithms in Bioinformatics, WABI 2018
EditorsLaxmi Parida, Esko Ukkonen
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Print)9783959770828
DOIs
Publication statusPublished - 1 Aug 2018
Externally publishedYes
Event18th International Workshop on Algorithms in Bioinformatics, WABI 2018 - Helsinki, Finland
Duration: 20 Aug 201822 Aug 2018

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume113
ISSN (Print)1868-8969

Conference

Conference18th International Workshop on Algorithms in Bioinformatics, WABI 2018
Country/TerritoryFinland
CityHelsinki
Period20/08/1822/08/18

Funding

Partially supported by the project UNIPI PRA_2017_44 “Advanced computational methodologies for the analysis of biomedical data”. 2 Partially supported by the project UNIPI PRA_2017_44 “Advanced computational methodologies for the analysis of biomedical data”. 3 Partially supported by the project MIUR-SIR CMACBioSeq “Combinatorial methods for analysis and compression of biological sequences” grant n. RBSI146R5L and the project UNIPI PRA_2017_44 “ Advanced computational methodologies for the analysis of biomedical data”. 4 Partially supported by the Royal Society project IE 161274 “Processing uncertain sequences: combinat-orics and applications”. 5 Partially supported by the project MIUR-SIR CMACBioSeq “Combinatorial methods for analysis and compression of biological sequences” grant n. RBSI146R5L, the Royal Society project IE 161274

Keywords

  • Degenerate strings
  • Elastic-degenerate strings
  • Generalised degenerate strings
  • Palindromes
  • String comparison

Fingerprint

Dive into the research topics of 'Degenerate string comparison and applications'. Together they form a unique fingerprint.

Cite this