Abstract
Pattern matching on a set of similar texts has received much attention, especially recently, mainly due to its application in cataloguing human genetic variation. In particular, many different algorithms have been proposed for the off-line version of this problem; that is, constructing a compressed index for a set of similar texts in order to answer pattern matching queries efficiently. However, the on-line, more fundamental, version of this problem is a rather undeveloped topic. Solutions to the on-line version can be beneficial for a number of reasons; for instance, efficient on-line solutions can be used in combination with partial indexes as practical trade-offs. We make here an attempt to close this gap via proposing two efficient algorithms for this problem. Notably, one of the algorithms requires time linear in the size of the texts' representation, for short patterns. Furthermore, experimental results confirm our theoretical findings in practical terms.
Original language | English |
---|---|
Title of host publication | 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017 |
Editors | Jakub Radoszewski, Juha Karkkainen, Jakub Radoszewski, Wojciech Rytter |
Publisher | Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing |
ISBN (Electronic) | 9783959770392 |
DOIs | |
Publication status | Published - 1 Jul 2017 |
Externally published | Yes |
Event | 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017 - Warsaw, Poland Duration: 4 Jul 2017 → 6 Jul 2017 |
Publication series
Name | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|
Volume | 78 |
ISSN (Print) | 1868-8969 |
Conference
Conference | 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017 |
---|---|
Country/Territory | Poland |
City | Warsaw |
Period | 4/07/17 → 6/07/17 |
Funding
Roberto Grossi, Nadia Pisanti, and Giovanna Rosone are partially supported by the project UniPi PRA-2017-44 ("Advanced computational methodologies for the analysis of biomedical data"). Costas S. Iliopoulos is partially supported by the Onassis Foundation. Ahmad Retha is supported by the Graduate Teaching Scholarship scheme of the Department of Informatics at King's College London. Giovanna Rosone is partially supported by the project MIUR-SIR CMACBioSeq ("Combinatorial methods for analysis and compression of biological sequences") grant n. RBSI146R5L. Fatima Vayani is supported by an EPSRC Grant (Doctoral Training Grant #EP/M506357/1).
Keywords
- Degenerate strings
- Elastic-degenerate strings
- On-line algorithms
- Pattern matching
- String algorithms