Abstract
A weighted sequence is a sequence of probability mass functions over a finite alphabet. A weighted index is a data structure constructed for a weighted sequence and a threshold [Formula presented] that, given a string pattern, reports all positions where it occurs in the weighted sequence with probability at least [Formula presented]. We present an O(nz)-time construction of an O(nz)-sized weighted index for a weighted sequence of length n that answers queries in optimal time. The previous solution by Amir et al. (2008) required O(nz2logz) time and space. Our main tools are a construction of a family of ⌊z⌋ strings that carries the information about all the strings that occur in a weighted sequence and a more straightforward solution to so-called property indexing. We present applications of our weighted index, in particular in approximate and general scenarios that were introduced by Biswas et al. (2016), and provide its implementation.
Original language | English |
---|---|
Article number | 104462 |
Journal | Information and Computation |
Volume | 270 |
DOIs | |
Publication status | Published - Feb 2020 |
Externally published | Yes |
Keywords
- Position weight matrix (PWM)
- Property indexing
- Suffix tree
- Text indexing
- Weighted sequence