TY - JOUR
T1 - Global and local sequence alignment with a bounded number of gaps
AU - Barton, Carl
AU - Flouri, Tomáš
AU - Iliopoulos, Costas S.
AU - Pissis, Solon P.
PY - 2015/5/31
Y1 - 2015/5/31
N2 - Pairwise sequence alignment techniques have gained renewed interest in recent years, primarily due to their applications in re-sequencing-the assembly of a genome directed by a reference sequence.In this article, we show that adding the flexibility of bounding the number of gaps inserted in an alignment strengthens the classical sequence alignment scheme of scoring matrices and affine gap penalty scores. We present GapsMis, an algorithm for pairwise global sequence alignment with a variable, but bounded, number of gaps. It is based on computing a variant of the traditional dynamic programming matrix for global sequence alignment. We also present GapsMis-L, the analogous algorithm for pairwise local sequence alignment with a variable, but bounded, number of gaps.To test the accuracy of GapsMis and GapsMis-L we performed millions of pairwise sequence alignments under realistic conditions, based on the properties of real full-length genomes. The results show that GapsMis and GapsMis-L can increase the accuracy of extending short-read alignments compared to the traditional approaches. The importance of our contribution is underlined by the fact that the provided algorithms may be seamlessly integrated into any biological pipeline. The open-source code of our implementation is freely available at http://www.inf.kcl.ac.uk/research/projects/gapmis/.
AB - Pairwise sequence alignment techniques have gained renewed interest in recent years, primarily due to their applications in re-sequencing-the assembly of a genome directed by a reference sequence.In this article, we show that adding the flexibility of bounding the number of gaps inserted in an alignment strengthens the classical sequence alignment scheme of scoring matrices and affine gap penalty scores. We present GapsMis, an algorithm for pairwise global sequence alignment with a variable, but bounded, number of gaps. It is based on computing a variant of the traditional dynamic programming matrix for global sequence alignment. We also present GapsMis-L, the analogous algorithm for pairwise local sequence alignment with a variable, but bounded, number of gaps.To test the accuracy of GapsMis and GapsMis-L we performed millions of pairwise sequence alignments under realistic conditions, based on the properties of real full-length genomes. The results show that GapsMis and GapsMis-L can increase the accuracy of extending short-read alignments compared to the traditional approaches. The importance of our contribution is underlined by the fact that the provided algorithms may be seamlessly integrated into any biological pipeline. The open-source code of our implementation is freely available at http://www.inf.kcl.ac.uk/research/projects/gapmis/.
KW - Algorithms on strings
KW - Dynamic programming
KW - Pairwise sequence alignment
UR - http://www.scopus.com/inward/record.url?scp=84951852741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84951852741&partnerID=8YFLogxK
U2 - 10.1016/j.tcs.2015.03.016
DO - 10.1016/j.tcs.2015.03.016
M3 - Article
AN - SCOPUS:84951852741
SN - 0304-3975
VL - 582
SP - 1
EP - 16
JO - Theoretical Computer Science
JF - Theoretical Computer Science
ER -