Property suffix array with applications

Panagiotis Charalampopoulos, Costas S. Iliopoulos, Chang Liu, Solon P. Pissis*

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property Π, i.e. a set of Π -valid intervals over x. A suffix-tree-like index over these valid prefixes of suffixes of x can be built in time and space O(n). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space O(n). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold 1z, we say that a string p of length m matches a weighted sequence X of length n at starting position i if the product of probabilities of the letters of p at positions i, …, i+ m- 1 in X is at least 1z. Our algorithm for building the PSA can be directly applied to build an O(nz) -sized suffix-array-like index over X in time and space O(nz).

Original languageEnglish
Title of host publicationLATIN 2018
Subtitle of host publicationTheoretical Informatics - 13th Latin American Symposium, Proceedings
EditorsMiguel A. Mosteiro, Michael A. Bender, Martin Farach-Colton
PublisherSpringer Verlag
Pages290-302
Number of pages13
ISBN (Print)9783319774039
DOIs
Publication statusPublished - 1 Jan 2018
Externally publishedYes
Event13th International Symposium on Latin American Theoretical Informatics, LATIN 2018 - Buenos Aires, Argentina
Duration: 16 Apr 201819 Apr 2018

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume10807 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference13th International Symposium on Latin American Theoretical Informatics, LATIN 2018
Country/TerritoryArgentina
CityBuenos Aires
Period16/04/1819/04/18

Funding

P. Charalampopoulos—Supported by the Graduate Teaching Scholarship scheme of the Department of Informatics at King’s College London and by the A.G. Leventis Foundation.

Fingerprint

Dive into the research topics of 'Property suffix array with applications'. Together they form a unique fingerprint.

Cite this