TY - JOUR

T1 - Extending alignments with k-mismatches and ℓ-gaps

AU - Barton, Carl

AU - Iliopoulos, Costas S.

AU - Lee, Inbok

AU - Mouchard, Laurent

AU - Park, Kunsoo

AU - Pissis, Solon P.

PY - 2014/3/13

Y1 - 2014/3/13

N2 - Recently, the problem of extending an alignment with k-mismatches and a single gap for pairwise sequence alignment was introduced (Flouri et al., 2011). The authors considered the problem of extending an alignment under the Hamming distance model while also allowing the insertion of a single gap, and presented a Θ(mβ)-time algorithm to solve it, where m is the length of the shorter sequence to be extended and β is the maximum allowed length of the single gap. Very recently, it was shown (Flouri et al., 2012) that this problem is strongly and directly motivated by the next-generation re-sequencing application: aligning tens of millions of short DNA sequences against a reference genome. In this article, we consider an extension of this problem, namely extending an alignment with k-mismatches and two gaps, and present a Θ(mβ)-time algorithm to solve it. This extension has been shown to be fundamental in the next-generation re-sequencing application (Alachiotis et al., 2012). In addition, we present a generalisation of our solution that solves the problem of extending an alignment with k-mismatches and ℓ-gaps in time Θ(mβℓ). The presented solutions require that all gaps in the alignment occur in one of the two sequences.

KW - Algorithms on strings

KW - Dynamic programming

KW - Pairwise sequence alignment

UR - http://www.scopus.com/inward/record.url?scp=84895929964&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84895929964&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2013.06.012

DO - 10.1016/j.tcs.2013.06.012

M3 - Article

AN - SCOPUS:84895929964

VL - 525

SP - 80

EP - 88

JO - Theoretical Computer Science

JF - Theoretical Computer Science

SN - 0304-3975

ER -