Parameterized Text Indexing with One Wildcard

Arnab Ganguly, Wing Kai Hon, Yu An Huang, Solon P. Pissis, Rahul Shah, Sharma V. Thankachan

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

Two equal-length strings X and Y over an alphabet Σ of size σ are a parameterized match iff X can be transformed to Y by renaming the character X[i] to the character Y[i] for 1 ≤ i ≤ |X| using a one-to-one function from the set of characters in X to the set of characters in Y. The parameterized text indexing problem is defined as: Index a text T of n characters over an alphabet set Σ of size σ, such that whenever a pattern P[1, p] comes as a query, we can report all occ parameterized occurrences of P in T. A position i [1, n] is a parameterized occurrence of P in T, iff P and T[i,(i+p-1)] are a parameterized match. We study an interesting generalization of this problem, where the pattern contains one wildcard character φ NotElement; Σ that matches with any other character in Σ. Therefore, for a pattern P[1, p] = P-1φP-2, our task is to report all positions i in T, such that the string P-1 P-2 and the string obtained by concatenating T[i,(i+|P-1|-1)] and T[(i+|P-1|+1),(i+p-1)] are a parameterized match. We show that such queries can be answered in optimal O(p+occ) time per query using an O(n log n) space index. We then show how to compress our index into O(n log σ) space but with a higher query cost of O(p(log log n+logσ)+occ logσ).

Original languageEnglish
Title of host publicationProceedings - DCC 2019
Subtitle of host publication2019 Data Compression Conference
EditorsMichael W. Marcellin, Joan Serra-Sagrista, James A. Storer, Ali Bilgin
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages152-161
Number of pages10
ISBN (Electronic)9781728106571
DOIs
Publication statusPublished - 10 May 2019
Externally publishedYes
Event2019 Data Compression Conference, DCC 2019 - Snowbird, United States
Duration: 26 Mar 201929 Mar 2019

Publication series

NameData Compression Conference Proceedings
Volume2019-March
ISSN (Print)1068-0314

Conference

Conference2019 Data Compression Conference, DCC 2019
Country/TerritoryUnited States
CitySnowbird
Period26/03/1929/03/19

Keywords

  • Heavy Path
  • Parameterized Text Indexing
  • Succinct Data Structures
  • Suffix Tree
  • Wildcard

Fingerprint

Dive into the research topics of 'Parameterized Text Indexing with One Wildcard'. Together they form a unique fingerprint.

Cite this