TY - JOUR
T1 - A faster and more accurate heuristic for cyclic edit distance computation
AU - Ayad, Lorraine A.K.
AU - Barton, Carl
AU - Pissis, Solon P.
PY - 2017/3/1
Y1 - 2017/3/1
N2 - Sequence comparison is the core computation of many applications involving textual representations of data. Edit distance is the most widely used measure to quantify the similarity of two sequences. Edit distance can be defined as the minimal total cost of a sequence of edit operations to transform one sequence into the other; for a sequence x of length m and a sequence y of length n, it can be computed in time O(mn). In many applications, it is common to consider sequences with circular structure: for instance, the orientation of two images or the leftmost position of two linearised circular DNA sequences may be irrelevant. To this end, an algorithm to compute the cyclic edit distance in time O(mnlogm) was proposed (Maes, 2003 [18]) and several heuristics have been proposed to speed up this computation. Recently, a new algorithm based on q-grams was proposed for circular sequence comparison (Grossi et al., 2016 [13]). We extend this algorithm for cyclic edit distance computation and show that this new heuristic is faster and more accurate than the state of the art. The aim of this letter is to give visibility to this idea in the pattern recognition community.
AB - Sequence comparison is the core computation of many applications involving textual representations of data. Edit distance is the most widely used measure to quantify the similarity of two sequences. Edit distance can be defined as the minimal total cost of a sequence of edit operations to transform one sequence into the other; for a sequence x of length m and a sequence y of length n, it can be computed in time O(mn). In many applications, it is common to consider sequences with circular structure: for instance, the orientation of two images or the leftmost position of two linearised circular DNA sequences may be irrelevant. To this end, an algorithm to compute the cyclic edit distance in time O(mnlogm) was proposed (Maes, 2003 [18]) and several heuristics have been proposed to speed up this computation. Recently, a new algorithm based on q-grams was proposed for circular sequence comparison (Grossi et al., 2016 [13]). We extend this algorithm for cyclic edit distance computation and show that this new heuristic is faster and more accurate than the state of the art. The aim of this letter is to give visibility to this idea in the pattern recognition community.
KW - Chain code
KW - Circular sequences
KW - Cyclic edit distance
KW - Cyclic strings
KW - q-gram distance
UR - http://www.scopus.com/inward/record.url?scp=85012241623&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85012241623&partnerID=8YFLogxK
U2 - 10.1016/j.patrec.2017.01.018
DO - 10.1016/j.patrec.2017.01.018
M3 - Article
AN - SCOPUS:85012241623
VL - 88
SP - 81
EP - 87
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
SN - 0167-8655
ER -