TY - JOUR
T1 - Optimal computation of all tandem repeats in a weighted sequence
AU - Barton, Carl
AU - Iliopoulos, Costas S.
AU - Pissis, Solon P.
PY - 2014/8/16
Y1 - 2014/8/16
N2 - Background: Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment. Results: Crochemore's repetitions algorithm, also referred to as Crochemore's partitioning algorithm, was introduced in 1981, and was the first optimal O(nlogn)-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore's partitioning algorithm for weighted sequences, which requires optimal O(nlogn) time, thus improving on the best known On2-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.
AB - Background: Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment. Results: Crochemore's repetitions algorithm, also referred to as Crochemore's partitioning algorithm, was introduced in 1981, and was the first optimal O(nlogn)-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore's partitioning algorithm for weighted sequences, which requires optimal O(nlogn) time, thus improving on the best known On2-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.
KW - IUPAC notation
KW - Tandem repeats
KW - Weighted sequences
UR - http://www.scopus.com/inward/record.url?scp=84906837284&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906837284&partnerID=8YFLogxK
U2 - 10.1186/s13015-014-0021-5
DO - 10.1186/s13015-014-0021-5
M3 - Article
AN - SCOPUS:84906837284
SN - 1748-7188
VL - 9
JO - Algorithms for Molecular Biology
JF - Algorithms for Molecular Biology
IS - 1
M1 - 21
ER -