Motif-Aware PRALINE: Improving the alignment of motif regions

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Protein or DNA motifs are sequence regions which possess biological importance. These regions are often highly conserved among homologous sequences. The generation of multiple sequence alignments (MSAs) with a correct alignment of the conserved sequence motifs is still difficult to achieve, due to the fact that the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extended the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm in order to address this shortcoming. This method can incorporate explicit information about the presence of externally provided sequence motifs, which is then used in the dynamic programming step by boosting the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing the overall alignment quality. By estimating α on an unrelated set of reference alignments we find there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and how the motif-aware alignment method can be employed to alleviate these problems.
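The abstract does not give the exact form of the boost, so the following is only a minimal sketch of the general idea, not PRALINE's implementation. It assumes a pairwise (not multiple) global alignment, a toy match/mismatch score in place of a real substitution matrix, a linear gap penalty, and an additive bonus of `alpha` applied when both aligned positions lie inside an externally annotated motif. The function name `motif_aware_align` and all parameter names are hypothetical.

```python
from typing import List, Tuple


def motif_aware_align(
    seq_a: str,
    seq_b: str,
    motif_a: List[bool],
    motif_b: List[bool],
    alpha: float = 0.0,
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -2.0,
) -> Tuple[float, str, str]:
    """Needleman-Wunsch global alignment with an assumed additive motif boost.

    motif_a[i] / motif_b[j] mark positions annotated as part of a motif.
    """
    n, m = len(seq_a), len(seq_b)

    def pair_score(i: int, j: int) -> float:
        # Toy stand-in for a substitution matrix lookup.
        base = match if seq_a[i] == seq_b[j] else mismatch
        # Assumption: the boost is added only when both positions are motif positions.
        bonus = alpha if (motif_a[i] and motif_b[j]) else 0.0
        return base + bonus

    # Dynamic programming matrix; borders hold cumulative gap penalties.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i - 1, j - 1),  # align the pair
                score[i - 1][j] + gap,                           # gap in seq_b
                score[i][j - 1] + gap,                           # gap in seq_a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(i - 1, j - 1)):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Made-up example: the motif "GKT" sits at opposite ends of the two sequences.
mask_a = [False, False, True, True, True]
mask_b = [True, True, True, False, False]
print(motif_aware_align("AAGKT", "GKTAA", mask_a, mask_b, alpha=2.0))
```

In this toy setting, setting `alpha` to 0 reduces the function to plain Needleman-Wunsch, while larger values make it increasingly worthwhile to pay gap penalties in order to bring annotated motif positions into the same columns. That mirrors the trade-off between motif matching and overall alignment quality that the abstract describes.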

Original language: English
Article number: e1006547
Pages (from-to): 1-19
Number of pages: 19
Journal: PLoS Computational Biology
Volume: 14
Issue number: 11
DOI: 10.1371/journal.pcbi.1006547
Publication status: Published - 1 Nov 2018

Fingerprint

Sequence Alignment
Multiple Sequence Alignment
Alignment
Benchmarking
Amino Acid Motifs
Nucleotide Motifs
Dynamic Programming
Conserved Sequence
Amino Acid Substitution
Sequence Homology
Boosting
Use Case
Substitution
Amino Acids

Cite this

@article{f8de778bfc764f47b365cbf7f21e287f,
title = "Motif-Aware PRALINE: Improving the alignment of motif regions",
abstract = "Protein or DNA motifs are sequence regions which possess biological importance. These regions are often highly conserved among homologous sequences. The generation of multiple sequence alignments (MSAs) with a correct alignment of the conserved sequence motifs is still difficult to achieve, due to the fact that the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extended the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm in order to address this shortcoming. This method can incorporate explicit information about the presence of externally provided sequence motifs, which is then used in the dynamic programming step by boosting the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing the overall alignment quality. By estimating α on an unrelated set of reference alignments we find there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and how the motif-aware alignment method can be employed to alleviate these problems.",
author = "Maurits Dijkstra and Punto Bawono and Sanne Abeln and Feenstra, {K. Anton} and Wan Fokkink and Jaap Heringa",
year = "2018",
month = "11",
day = "1",
doi = "10.1371/journal.pcbi.1006547",
language = "English",
volume = "14",
pages = "1--19",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "11",

}

Motif-Aware PRALINE: Improving the alignment of motif regions. / Dijkstra, Maurits; Bawono, Punto; Abeln, Sanne; Feenstra, K. Anton; Fokkink, Wan; Heringa, Jaap.

In: PLoS Computational Biology, Vol. 14, No. 11, e1006547, 01.11.2018, p. 1-19.

TY - JOUR

T1 - Motif-Aware PRALINE: Improving the alignment of motif regions

AU - Dijkstra, Maurits

AU - Bawono, Punto

AU - Abeln, Sanne

AU - Feenstra, K. Anton

AU - Fokkink, Wan

AU - Heringa, Jaap

PY - 2018/11/1

Y1 - 2018/11/1

AB - Protein or DNA motifs are sequence regions which possess biological importance. These regions are often highly conserved among homologous sequences. The generation of multiple sequence alignments (MSAs) with a correct alignment of the conserved sequence motifs is still difficult to achieve, due to the fact that the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extended the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm in order to address this shortcoming. This method can incorporate explicit information about the presence of externally provided sequence motifs, which is then used in the dynamic programming step by boosting the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing the overall alignment quality. By estimating α on an unrelated set of reference alignments we find there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and how the motif-aware alignment method can be employed to alleviate these problems.

UR - http://www.scopus.com/inward/record.url?scp=85056514138&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056514138&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.1006547

DO - 10.1371/journal.pcbi.1006547

M3 - Article

VL - 14

SP - 1

EP - 19

JO - PLoS Computational Biology

JF - PLoS Computational Biology

SN - 1553-734X

IS - 11

M1 - e1006547

ER -