Abstract
A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Amir et al. [SPIRE 2011], that is, the SCS problem on weighted strings. In the WSCS problem, we are given two weighted strings (Formula presented) and (Formula presented) and a threshold (Formula presented) on probability, and we are asked to compute the shortest (standard) string S such that both (Formula presented) and (Formula presented) match subsequences of S (not necessarily the same) with probability at least (Formula presented). Amir et al. showed that this problem is NP-complete if the probabilities, including the threshold (Formula presented), are represented by their logarithms (encoded in binary). We present an algorithm that solves the WSCS problem for two weighted strings of length n over a constant-sized alphabet in (Formula presented) time. Notably, our upper bound matches known conditional lower bounds stating that the WSCS problem cannot be solved in (Formula presented) time or in (Formula presented) with time, where the (Formula presented) notation suppresses factors polynomial with respect to the instance size (with numeric values encoded in binary), unless there is a breakthrough improving upon long-standing upper bounds for fundamental NP-hard problems (CNF-SAT and Subset Sum, respectively). We also discover a fundamental difference between the WSCS problem and the Weighted Longest Common Subsequence (WLCS) problem, introduced by Amir et al. [JDA 2010]. We show that the WLCS problem cannot be solved in (Formula presented) time, for any function f(z), unless (Formula presented).
| Original language | English |
|---|---|
| Title of host publication | String Processing and Information Retrieval - 26th International Symposium, SPIRE 2019, Proceedings |
| Editors | Nieves R. Brisaboa, Simon J. Puglisi |
| Publisher | Springer |
| Pages | 221-238 |
| Number of pages | 18 |
| ISBN (Print) | 9783030326852 |
| DOIs | |
| Publication status | Published - 1 Jan 2019 |
| Externally published | Yes |
| Event | 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019 - Segovia, Spain Duration: 7 Oct 2019 → 9 Oct 2019 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 11811 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019 |
|---|---|
| Country/Territory | Spain |
| City | Segovia |
| Period | 7/10/19 → 9/10/19 |
Funding
Tomasz Kociumaka was supported by ISF grants no. 824/17 and 1278/16 and by an ERC grant MPM under the EU’s Horizon 2020 Research and Innovation Programme (grant no. 683064). Jakub Radoszewski and Juliusz Straszyński were supported by the “Algorithms for text processing with errors and uncertainties” project carried out within the HOMING program of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund.
Fingerprint
Dive into the research topics of 'Weighted Shortest Common Supersequence Problem Revisited'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver