Abstract
Two equal-length strings X and Y over an alphabet Σ of size σ are a parameterized match iff X can be transformed to Y by renaming the character X[i] to the character Y[i] for 1 ≤ i ≤ |X| using a one-to-one function from the set of characters in X to the set of characters in Y. The parameterized text indexing problem is defined as: Index a text T of n characters over an alphabet set Σ of size σ, such that whenever a pattern P[1, p] comes as a query, we can report all occ parameterized occurrences of P in T. A position i [1, n] is a parameterized occurrence of P in T, iff P and T[i,(i+p-1)] are a parameterized match. We study an interesting generalization of this problem, where the pattern contains one wildcard character φ NotElement; Σ that matches with any other character in Σ. Therefore, for a pattern P[1, p] = P-1φP-2, our task is to report all positions i in T, such that the string P-1 P-2 and the string obtained by concatenating T[i,(i+|P-1|-1)] and T[(i+|P-1|+1),(i+p-1)] are a parameterized match. We show that such queries can be answered in optimal O(p+occ) time per query using an O(n log n) space index. We then show how to compress our index into O(n log σ) space but with a higher query cost of O(p(log log n+logσ)+occ logσ).
| Original language | English |
|---|---|
| Title of host publication | Proceedings - DCC 2019 |
| Subtitle of host publication | 2019 Data Compression Conference |
| Editors | Michael W. Marcellin, Joan Serra-Sagrista, James A. Storer, Ali Bilgin |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 152-161 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781728106571 |
| DOIs | |
| Publication status | Published - 10 May 2019 |
| Externally published | Yes |
| Event | 2019 Data Compression Conference, DCC 2019 - Snowbird, United States Duration: 26 Mar 2019 → 29 Mar 2019 |
Publication series
| Name | Data Compression Conference Proceedings |
|---|---|
| Volume | 2019-March |
| ISSN (Print) | 1068-0314 |
Conference
| Conference | 2019 Data Compression Conference, DCC 2019 |
|---|---|
| Country/Territory | United States |
| City | Snowbird |
| Period | 26/03/19 → 29/03/19 |
Funding
∗Dept. of CS, University of Wisconsin - Whitewater, USA [email protected] ¶Dept. of CS, National Tsing Hua University, Taiwan [email protected] †Dept. of CS, National Tsing Hua University, Taiwan [email protected] ††CWI, Amsterdam, The Netherlands [email protected] of CS, Louisiana State University, USA and National Science Foundation, USA [email protected],[email protected] ‡Dept. of CS, University of Central Florida, USA [email protected]
Keywords
- Heavy Path
- Parameterized Text Indexing
- Succinct Data Structures
- Suffix Tree
- Wildcard