TY - GEN
T1 - Minimal absent words in a sliding window and applications to on-line pattern matching
AU - Crochemore, Maxime
AU - Héliou, Alice
AU - Kucherov, Gregory
AU - Mouchard, Laurent
AU - Pissis, Solon P.
AU - Ramusat, Yann
PY - 2017/1/1
Y1 - 2017/1/1
N2 - An absent (or forbidden) word of a word y is a word that does not occur in y. It is then called minimal if all its proper factors occur in y. There exist linear-time and linear-space algorithms for computing all minimal absent words of y (Crochemore et al. in Inf Process Lett 67:111–117, 1998; Belazzougui et al. in ESA 8125:133–144, 2013; Barton et al. in BMC Bioinform 15:388, 2014). Minimal absent words are used for data compression (Crochemore et al. in Proc IEEE 88:1756–1768, 2000, Ota and Morita in Theoret Comput Sci 526:108–119, 2014) and for alignment-free sequence comparison by utilizing a metric based on minimal absent words (Chairungsee and Crochemore in Theoret Comput Sci 450:109–116, 2012). They are also used in molecular biology; for instance, three minimal absent words of the human genome were found to play a functional role in a coding region in Ebola virus genomes (Silva et al. in Bioinformatics 31:2421–2425, 2015). In this article we introduce a new application of minimal absent words for on-line pattern matching. Specifically, we present an algorithm that, given a pattern x and a text y, computes the distance between x and every window of size |x| on y. The running time is O(σ|y|), where σ is the size of the alphabet. Along the way, we show an O(σ|y|)-time and O(σ|x|)-space algorithm to compute the minimal absent words of every window of size |x| on y, together with some new combinatorial insight on minimal absent words.
AB - An absent (or forbidden) word of a word y is a word that does not occur in y. It is then called minimal if all its proper factors occur in y. There exist linear-time and linear-space algorithms for computing all minimal absent words of y (Crochemore et al. in Inf Process Lett 67:111–117, 1998; Belazzougui et al. in ESA 8125:133–144, 2013; Barton et al. in BMC Bioinform 15:388, 2014). Minimal absent words are used for data compression (Crochemore et al. in Proc IEEE 88:1756–1768, 2000, Ota and Morita in Theoret Comput Sci 526:108–119, 2014) and for alignment-free sequence comparison by utilizing a metric based on minimal absent words (Chairungsee and Crochemore in Theoret Comput Sci 450:109–116, 2012). They are also used in molecular biology; for instance, three minimal absent words of the human genome were found to play a functional role in a coding region in Ebola virus genomes (Silva et al. in Bioinformatics 31:2421–2425, 2015). In this article we introduce a new application of minimal absent words for on-line pattern matching. Specifically, we present an algorithm that, given a pattern x and a text y, computes the distance between x and every window of size |x| on y. The running time is O(σ|y|), where σ is the size of the alphabet. Along the way, we show an O(σ|y|)-time and O(σ|x|)-space algorithm to compute the minimal absent words of every window of size |x| on y, together with some new combinatorial insight on minimal absent words.
UR - http://www.scopus.com/inward/record.url?scp=85029420355&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029420355&partnerID=8YFLogxK
U2 - 10.1007/978-3-662-55751-8_14
DO - 10.1007/978-3-662-55751-8_14
M3 - Conference contribution
AN - SCOPUS:85029420355
SN - 9783662557501
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 164
EP - 176
BT - Fundamentals of Computation Theory - 21st International Symposium, FCT 2017, Proceedings
A2 - Zeitoun, Marc
A2 - Klasing, Ralf
PB - Springer Verlag
T2 - 21st International Symposium on Fundamentals of Computation Theory, FCT 2017
Y2 - 11 September 2017 through 13 September 2017
ER -