Abstract
Kubica et al. [33] and Kim et al. [29] introduced order-preserving pattern matching: for a given text the goal is to find its factors having the same 'shape' as a given pattern. Known results include a linear-time algorithm for this problem (in case of polynomially-bounded alphabet) and a generalization to multiple patterns. We propose an index that enables order-preserving pattern matching queries in time proportional to pattern length. The index can be constructed in O(nlog log n) expected time or in O(nlog2 log n/log log log n) worst-case time. It is an incomplete order-preserving suffix tree which may miss a single edge label at each branching node. For most applications such incomplete suffix trees provide the same functional power as the complete ones. We show a number of their applications, including computation of longest common factors, longest previously occurring factors and squares in a string in the order-preserving setting. We also give an O(nlog n)-time algorithm constructing complete order-preserving suffix trees.
Original language | English |
---|---|
Pages (from-to) | 122-135 |
Number of pages | 14 |
Journal | Theoretical Computer Science |
Volume | 638 |
DOIs | |
Publication status | Published - 25 Jul 2016 |
Externally published | Yes |
Funding
Tomasz Kociumaka is supported by Polish budget funds for science in 2013–2017 as a research project under the ‘Diamond Grant’ program ( Ministry of Science and Higher Education , Republic of Poland, grant number DI2012 01794 ). Jakub Radoszewski receives financial support of Foundation for Polish Science and is supported by the Polish Ministry of Science and Higher Education under the ‘Iuventus Plus’ program in 2015–2016 grant no. 0392/IP3/2015/73 . Wojciech Rytter is supported by grant no. NCN2014/13/B/ST6/00770 of the National Science Centre .
Keywords
- Order-preserving indexing
- Order-preserving matching
- Suffix tree