TY - JOUR

T1 - Linear-time computation of prefix table for weighted strings & applications

AU - Barton, Carl

AU - Liu, Chang

AU - Pissis, Solon P.

PY - 2016/12/20

Y1 - 2016/12/20

N2 - The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain strings, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an O(n)-time algorithm for computing the prefix table of x. Furthermore, we outline a number of applications of this result for solving various problems on non-standard strings, and present some preliminary experimental results.

AB - The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain strings, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an O(n)-time algorithm for computing the prefix table of x. Furthermore, we outline a number of applications of this result for solving various problems on non-standard strings, and present some preliminary experimental results.

KW - Algorithms on strings

KW - Uncertain sequences

KW - Weighted strings

UR - http://www.scopus.com/inward/record.url?scp=84964883518&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964883518&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2016.04.029

DO - 10.1016/j.tcs.2016.04.029

M3 - Article

AN - SCOPUS:84964883518

VL - 656

SP - 160

EP - 172

JO - Theoretical Computer Science

JF - Theoretical Computer Science

SN - 0304-3975

ER -