TY - GEN

T1 - Parameterized Text Indexing with One Wildcard

AU - Ganguly, Arnab

AU - Hon, Wing Kai

AU - Huang, Yu An

AU - Pissis, Solon P.

AU - Shah, Rahul

AU - Thankachan, Sharma V.

PY - 2019/5/10

Y1 - 2019/5/10

N2 - Two equal-length strings X and Y over an alphabet Σ of size σ are a parameterized match iff X can be transformed to Y by renaming the character X[i] to the character Y[i] for 1 ≤ i ≤ |X| using a one-to-one function from the set of characters in X to the set of characters in Y. The parameterized text indexing problem is defined as: Index a text T of n characters over an alphabet set Σ of size σ, such that whenever a pattern P[1, p] comes as a query, we can report all occ parameterized occurrences of P in T. A position i [1, n] is a parameterized occurrence of P in T, iff P and T[i,(i+p-1)] are a parameterized match. We study an interesting generalization of this problem, where the pattern contains one wildcard character φ NotElement; Σ that matches with any other character in Σ. Therefore, for a pattern P[1, p] = P-1φP-2, our task is to report all positions i in T, such that the string P-1 P-2 and the string obtained by concatenating T[i,(i+|P-1|-1)] and T[(i+|P-1|+1),(i+p-1)] are a parameterized match. We show that such queries can be answered in optimal O(p+occ) time per query using an O(n log n) space index. We then show how to compress our index into O(n log σ) space but with a higher query cost of O(p(log log n+logσ)+occ logσ).

AB - Two equal-length strings X and Y over an alphabet Σ of size σ are a parameterized match iff X can be transformed to Y by renaming the character X[i] to the character Y[i] for 1 ≤ i ≤ |X| using a one-to-one function from the set of characters in X to the set of characters in Y. The parameterized text indexing problem is defined as: Index a text T of n characters over an alphabet set Σ of size σ, such that whenever a pattern P[1, p] comes as a query, we can report all occ parameterized occurrences of P in T. A position i [1, n] is a parameterized occurrence of P in T, iff P and T[i,(i+p-1)] are a parameterized match. We study an interesting generalization of this problem, where the pattern contains one wildcard character φ NotElement; Σ that matches with any other character in Σ. Therefore, for a pattern P[1, p] = P-1φP-2, our task is to report all positions i in T, such that the string P-1 P-2 and the string obtained by concatenating T[i,(i+|P-1|-1)] and T[(i+|P-1|+1),(i+p-1)] are a parameterized match. We show that such queries can be answered in optimal O(p+occ) time per query using an O(n log n) space index. We then show how to compress our index into O(n log σ) space but with a higher query cost of O(p(log log n+logσ)+occ logσ).

KW - Heavy Path

KW - Parameterized Text Indexing

KW - Succinct Data Structures

KW - Suffix Tree

KW - Wildcard

UR - http://www.scopus.com/inward/record.url?scp=85066300162&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066300162&partnerID=8YFLogxK

U2 - 10.1109/DCC.2019.00023

DO - 10.1109/DCC.2019.00023

M3 - Conference contribution

AN - SCOPUS:85066300162

T3 - Data Compression Conference Proceedings

SP - 152

EP - 161

BT - Proceedings - DCC 2019

A2 - Marcellin, Michael W.

A2 - Serra-Sagrista, Joan

A2 - Storer, James A.

A2 - Bilgin, Ali

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2019 Data Compression Conference, DCC 2019

Y2 - 26 March 2019 through 29 March 2019

ER -