TY - GEN
T1 - Pattern matching and consensus problems on weighted sequences and profiles
AU - Kociumaka, Tomasz
AU - Pissis, Solon P.
AU - Radoszewski, Jakub
PY - 2016/12/1
Y1 - 2016/12/1
N2 - We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically-provable running times using a variation of the lookahead scoring technique. We also consider a general variant of the pattern matching problems in which both the pattern and the text are uncertain. Central to our solution is a special case where the sequences have equal length, called the consensus problem. We propose algorithms for the consensus problem parameterized by the number of strings that match one of the sequences. As our basic approach, a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem is used. On the lower bound side, we prove that our dependence on the parameter is optimal up to lower-order terms conditioned on the optimality of the original algorithm for the knapsack problem.
AB - We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically-provable running times using a variation of the lookahead scoring technique. We also consider a general variant of the pattern matching problems in which both the pattern and the text are uncertain. Central to our solution is a special case where the sequences have equal length, called the consensus problem. We propose algorithms for the consensus problem parameterized by the number of strings that match one of the sequences. As our basic approach, a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem is used. On the lower bound side, we prove that our dependence on the parameter is optimal up to lower-order terms conditioned on the optimality of the original algorithm for the knapsack problem.
KW - Position weight matrix
KW - Profile matching
KW - Weighted sequence
UR - http://www.scopus.com/inward/record.url?scp=85010800001&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85010800001&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.ISAAC.2016.46
DO - 10.4230/LIPIcs.ISAAC.2016.46
M3 - Conference contribution
AN - SCOPUS:85010800001
T3 - Leibniz International Proceedings in Informatics, LIPIcs
SP - 46.1-46.12
BT - 27th International Symposium on Algorithms and Computation, ISAAC 2016
A2 - Hong, Seok-Hee
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 27th International Symposium on Algorithms and Computation, ISAAC 2016
Y2 - 12 December 2016 through 14 December 2016
ER -