TY - GEN

T1 - Maximal Motif discovery in a sliding window

AU - Iliopoulos, Costas S.

AU - Mohamed, Manal

AU - Pissis, Solon P.

AU - Vayani, Fatima

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Motifs are relatively short sequences that are biologically significant, and their discovery in molecular sequences is a well-researched subject. A don’t care is a special letter that matches every letter in the alphabet. Formally, a motif is a sequence of letters of the alphabet and don’t care letters. A motif (Formula presented) that occurs at least k times in a sequence is maximal if it cannot be extended (to the left or right) nor can it be specialised (that is, its d’≤d don’t cares cannot be replaced with letters from the alphabet) without reducing its number of occurrences. Here we present a new dynamic data structure, and the first on-line algorithm, to discover all maximal motifs in a sliding window of length l on a sequence x of length n in (Formula presented) time, where w is the size of the machine word and DIFFi i-1 is the symmetric difference of the sets of occurrences of maximal motifs at x[i-l..i-1] and at x[i-l+1..i].

AB - Motifs are relatively short sequences that are biologically significant, and their discovery in molecular sequences is a well-researched subject. A don’t care is a special letter that matches every letter in the alphabet. Formally, a motif is a sequence of letters of the alphabet and don’t care letters. A motif (Formula presented) that occurs at least k times in a sequence is maximal if it cannot be extended (to the left or right) nor can it be specialised (that is, its d’≤d don’t cares cannot be replaced with letters from the alphabet) without reducing its number of occurrences. Here we present a new dynamic data structure, and the first on-line algorithm, to discover all maximal motifs in a sliding window of length l on a sequence x of length n in (Formula presented) time, where w is the size of the machine word and DIFFi i-1 is the symmetric difference of the sets of occurrences of maximal motifs at x[i-l..i-1] and at x[i-l+1..i].

KW - Genome analysis

KW - Motif discovery

KW - Sequence motifs

UR - http://www.scopus.com/inward/record.url?scp=85054802969&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054802969&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-00479-8_16

DO - 10.1007/978-3-030-00479-8_16

M3 - Conference contribution

AN - SCOPUS:85054802969

SN - 9783030004781

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 191

EP - 205

BT - String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Proceedings

A2 - Gagie, Travis

A2 - Moffat, Alistair

A2 - Navarro, Gonzalo

A2 - Cuadros-Vargas, Ernesto

PB - Springer Verlag

T2 - 25th International Symposium on String Processing and Information Retrieval, SPIRE 2018

Y2 - 9 October 2018 through 11 October 2018

ER -