TY - GEN
T1 - Weighted Shortest Common Supersequence Problem Revisited
AU - Charalampopoulos, Panagiotis
AU - Kociumaka, Tomasz
AU - Pissis, Solon P.
AU - Radoszewski, Jakub
AU - Rytter, Wojciech
AU - Straszyński, Juliusz
AU - Waleń, Tomasz
AU - Zuba, Wiktor
PY - 2019/1/1
Y1 - 2019/1/1
N2 - A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Amir et al. [SPIRE 2011], that is, the SCS problem on weighted strings. In the WSCS problem, we are given two weighted strings (Formula presented) and (Formula presented) and a threshold (Formula presented) on probability, and we are asked to compute the shortest (standard) string S such that both (Formula presented) and (Formula presented) match subsequences of S (not necessarily the same) with probability at least (Formula presented). Amir et al. showed that this problem is NP-complete if the probabilities, including the threshold (Formula presented), are represented by their logarithms (encoded in binary). We present an algorithm that solves the WSCS problem for two weighted strings of length n over a constant-sized alphabet in (Formula presented) time. Notably, our upper bound matches known conditional lower bounds stating that the WSCS problem cannot be solved in (Formula presented) time or in (Formula presented) with time, where the (Formula presented) notation suppresses factors polynomial with respect to the instance size (with numeric values encoded in binary), unless there is a breakthrough improving upon long-standing upper bounds for fundamental NP-hard problems (CNF-SAT and Subset Sum, respectively). We also discover a fundamental difference between the WSCS problem and the Weighted Longest Common Subsequence (WLCS) problem, introduced by Amir et al. [JDA 2010]. We show that the WLCS problem cannot be solved in (Formula presented) time, for any function f(z), unless (Formula presented).
AB - A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Amir et al. [SPIRE 2011], that is, the SCS problem on weighted strings. In the WSCS problem, we are given two weighted strings (Formula presented) and (Formula presented) and a threshold (Formula presented) on probability, and we are asked to compute the shortest (standard) string S such that both (Formula presented) and (Formula presented) match subsequences of S (not necessarily the same) with probability at least (Formula presented). Amir et al. showed that this problem is NP-complete if the probabilities, including the threshold (Formula presented), are represented by their logarithms (encoded in binary). We present an algorithm that solves the WSCS problem for two weighted strings of length n over a constant-sized alphabet in (Formula presented) time. Notably, our upper bound matches known conditional lower bounds stating that the WSCS problem cannot be solved in (Formula presented) time or in (Formula presented) with time, where the (Formula presented) notation suppresses factors polynomial with respect to the instance size (with numeric values encoded in binary), unless there is a breakthrough improving upon long-standing upper bounds for fundamental NP-hard problems (CNF-SAT and Subset Sum, respectively). We also discover a fundamental difference between the WSCS problem and the Weighted Longest Common Subsequence (WLCS) problem, introduced by Amir et al. [JDA 2010]. We show that the WLCS problem cannot be solved in (Formula presented) time, for any function f(z), unless (Formula presented).
UR - http://www.scopus.com/inward/record.url?scp=85075699973&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075699973&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-32686-9_16
DO - 10.1007/978-3-030-32686-9_16
M3 - Conference contribution
AN - SCOPUS:85075699973
SN - 9783030326852
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 221
EP - 238
BT - String Processing and Information Retrieval - 26th International Symposium, SPIRE 2019, Proceedings
A2 - Brisaboa, Nieves R.
A2 - Puglisi, Simon J.
PB - Springer
T2 - 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019
Y2 - 7 October 2019 through 9 October 2019
ER -