TY - JOUR
T1 - Motif-Aware PRALINE: Improving the alignment of motif regions
T2 - Improving the alignment of motif regions
AU - Dijkstra, Maurits
AU - Bawono, Punto
AU - Abeln, Sanne
AU - Feenstra, K. Anton
AU - Fokkink, Wan
AU - Heringa, Jaap
PY - 2018/11/1
Y1 - 2018/11/1
AB - Protein or DNA motifs are sequence regions which possess biological importance. These regions are often highly conserved among homologous sequences. The generation of multiple sequence alignments (MSAs) with a correct alignment of the conserved sequence motifs is still difficult to achieve, due to the fact that the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extended the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm in order to address this shortcoming. This method can incorporate explicit information about the presence of externally provided sequence motifs, which is then used in the dynamic programming step by boosting the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing the overall alignment quality. By estimating α on an unrelated set of reference alignments we find there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and how the motif-aware alignment method can be employed to alleviate these problems.
UR - http://www.scopus.com/inward/record.url?scp=85056514138&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056514138&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1006547
DO - 10.1371/journal.pcbi.1006547
M3 - Article
C2 - 30383764
SN - 1553-734X
VL - 14
SP - 1
EP - 19
JO - PLoS Computational Biology
JF - PLoS Computational Biology
IS - 11
M1 - e1006547
ER -