TY - GEN

T1 - Property suffix array with applications

AU - Charalampopoulos, Panagiotis

AU - Iliopoulos, Costas S.

AU - Liu, Chang

AU - Pissis, Solon P.

PY - 2018/1/1

Y1 - 2018/1/1

N2 - The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property Π, i.e. a set of Π -valid intervals over x. A suffix-tree-like index over these valid prefixes of suffixes of x can be built in time and space O(n). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space O(n). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold 1z, we say that a string p of length m matches a weighted sequence X of length n at starting position i if the product of probabilities of the letters of p at positions i, …, i+ m- 1 in X is at least 1z. Our algorithm for building the PSA can be directly applied to build an O(nz) -sized suffix-array-like index over X in time and space O(nz).

AB - The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property Π, i.e. a set of Π -valid intervals over x. A suffix-tree-like index over these valid prefixes of suffixes of x can be built in time and space O(n). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space O(n). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold 1z, we say that a string p of length m matches a weighted sequence X of length n at starting position i if the product of probabilities of the letters of p at positions i, …, i+ m- 1 in X is at least 1z. Our algorithm for building the PSA can be directly applied to build an O(nz) -sized suffix-array-like index over X in time and space O(nz).

UR - http://www.scopus.com/inward/record.url?scp=85045398695&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045398695&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-77404-6_22

DO - 10.1007/978-3-319-77404-6_22

M3 - Conference contribution

AN - SCOPUS:85045398695

SN - 9783319774039

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 290

EP - 302

BT - LATIN 2018

A2 - Mosteiro, Miguel A.

A2 - Bender, Michael A.

A2 - Farach-Colton, Martin

PB - Springer Verlag

T2 - 13th International Symposium on Latin American Theoretical Informatics, LATIN 2018

Y2 - 16 April 2018 through 19 April 2018

ER -