Dictionary matching in elastic-degenerate texts with applications in searching VCF files on-line

Solon P. Pissis, Ahmad Retha

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent multiple sequence alignments of closely-related sequences in a compact form. For a standard pattern of length m, pattern matching in an elastic-degenerate text can be solved on-line in time O(nm2 + N) with pre-processing time and space O(m) (Grossi et al., CPM 2017). A fast bit-vector algorithm requiring time O(N · mw ) with pre-processing time and space O(m·mw ), where w is the size of the computer word, was also presented. In this paper we consider the same problem for a set of patterns of total length M. A straightforward generalization of the existing bit-vector algorithm would require time O(N · Mw ) with pre-processing time and space O(M · Mw ), which is prohibitive in practice. We present a new on-line O(N · Mw )-time algorithm with pre-processing time and space O(M). We present experimental results using both synthetic and real data demonstrating the performance of the algorithm. We further demonstrate a real application of our algorithm in a pipeline for discovery and verification of minimal absent words (MAWs) in the human genome showing that a significant number of previously discovered MAWs are in fact false-positives when a population’s variants are considered.

Original languageEnglish
Title of host publication17th Symposium on Experimental Algorithms, SEA 2018
EditorsGianlorenzo D'Angelo
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Pages16:1-16:14
ISBN (Electronic)9783959770705
DOIs
Publication statusPublished - 1 Jun 2018
Externally publishedYes
Event17th Symposium on Experimental Algorithms, SEA 2018 - L'Aquila, Italy
Duration: 27 Jun 201829 Jun 2018

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume103
ISSN (Print)1868-8969

Conference

Conference17th Symposium on Experimental Algorithms, SEA 2018
Country/TerritoryItaly
CityL'Aquila
Period27/06/1829/06/18

Keywords

  • Algorithms on strings
  • Dictionary matching
  • Elastic-degenerate string
  • Phrases on-line algorithms
  • Variant Call Format

Fingerprint

Dive into the research topics of 'Dictionary matching in elastic-degenerate texts with applications in searching VCF files on-line'. Together they form a unique fingerprint.

Cite this