Indexing weighted sequences: Neat and efficient

Carl Barton, Tomasz Kociumaka, Chang Liu, Solon P. Pissis, Jakub Radoszewski*

*Corresponding author for this work

Research output: Contribution to JournalArticleAcademicpeer-review

Abstract

A weighted sequence is a sequence of probability mass functions over a finite alphabet. A weighted index is a data structure constructed for a weighted sequence and a threshold [Formula presented] that, given a string pattern, reports all positions where it occurs in the weighted sequence with probability at least [Formula presented]. We present an O(nz)-time construction of an O(nz)-sized weighted index for a weighted sequence of length n that answers queries in optimal time. The previous solution by Amir et al. (2008) required O(nz2log⁡z) time and space. Our main tools are a construction of a family of ⌊z⌋ strings that carries the information about all the strings that occur in a weighted sequence and a more straightforward solution to so-called property indexing. We present applications of our weighted index, in particular in approximate and general scenarios that were introduced by Biswas et al. (2016), and provide its implementation.

Original languageEnglish
Article number104462
JournalInformation and Computation
Volume270
DOIs
Publication statusPublished - Feb 2020
Externally publishedYes

Keywords

  • Position weight matrix (PWM)
  • Property indexing
  • Suffix tree
  • Text indexing
  • Weighted sequence

Fingerprint

Dive into the research topics of 'Indexing weighted sequences: Neat and efficient'. Together they form a unique fingerprint.

Cite this