Abstract
The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property Π, i.e. a set of Π -valid intervals over x. A suffix-tree-like index over these valid prefixes of suffixes of x can be built in time and space O(n). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space O(n). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold 1z, we say that a string p of length m matches a weighted sequence X of length n at starting position i if the product of probabilities of the letters of p at positions i, …, i+ m- 1 in X is at least 1z. Our algorithm for building the PSA can be directly applied to build an O(nz) -sized suffix-array-like index over X in time and space O(nz).
Original language | English |
---|---|
Title of host publication | LATIN 2018 |
Subtitle of host publication | Theoretical Informatics - 13th Latin American Symposium, Proceedings |
Editors | Miguel A. Mosteiro, Michael A. Bender, Martin Farach-Colton |
Publisher | Springer Verlag |
Pages | 290-302 |
Number of pages | 13 |
ISBN (Print) | 9783319774039 |
DOIs | |
Publication status | Published - 1 Jan 2018 |
Externally published | Yes |
Event | 13th International Symposium on Latin American Theoretical Informatics, LATIN 2018 - Buenos Aires, Argentina Duration: 16 Apr 2018 → 19 Apr 2018 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 10807 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 13th International Symposium on Latin American Theoretical Informatics, LATIN 2018 |
---|---|
Country/Territory | Argentina |
City | Buenos Aires |
Period | 16/04/18 → 19/04/18 |
Funding
P. Charalampopoulos—Supported by the Graduate Teaching Scholarship scheme of the Department of Informatics at King’s College London and by the A.G. Leventis Foundation.