Querying highly similar sequences

Carl Barton*, Mathieu Giraud, Costas S. Iliopoulos, Thierry Lecroq, Laurent Mouchard, Solon P. Pissis

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

In this paper, we present a solution to the extreme similarity sequencing problem. This problem consists of finding occurrences of a pattern p in a set S0, S1, ..., Sk of sequences of equal length, where Si, for all 1 ≤ i ≤ k, differs from S0 by a constant number of errors (around 10 in practice). We present an asymptotically fast O(n + occ log occ) time algorithm, as well as a practical O(nk/w) time algorithm for solving this problem, where n is the length of a sequence, occ is the number of candidate occurrences reported by our technique, w is the size of the machine word, and the total number of errors is bounded by k, the number of sequences.
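
To make the problem statement concrete, the following is a minimal baseline sketch and not the algorithm from the paper. It assumes that every error is a single-character substitution, so each Si can be stored as a dictionary mapping a position to its replacement character relative to S0. The function name `find_occurrences` and this representation are hypothetical choices made for illustration. The sketch searches p once in S0 and then rechecks, for each Si, only the windows that overlap an error.

```python
def find_occurrences(s0, variants, p):
    """Return sorted (i, start) pairs where p occurs in sequence S_i.

    s0       -- the reference sequence S_0
    variants -- list of dicts {position: char}; variants[i-1] holds the
                substitutions turning S_0 into S_i (an assumed encoding)
    p        -- the pattern to search for
    """
    n, m = len(s0), len(p)
    results = set()

    # Step 1: find all exact occurrences of p in the reference S_0.
    base = []
    start = s0.find(p)
    while start != -1:
        base.append(start)
        start = s0.find(p, start + 1)
    results.update((0, j) for j in base)

    for i, subs in enumerate(variants, start=1):
        # Step 2: collect start positions of windows that overlap an error.
        touched = set()
        for pos in subs:
            lo = max(0, pos - m + 1)
            hi = min(pos, n - m)
            touched.update(range(lo, hi + 1))

        # Step 3: occurrences in S_0 that avoid every error carry over to S_i.
        results.update((i, j) for j in base if j not in touched)

        # Step 4: recheck only the touched windows against S_i.
        for j in touched:
            if all(subs.get(j + t, s0[j + t]) == p[t] for t in range(m)):
                results.add((i, j))

    return sorted(results)


if __name__ == "__main__":
    # S1 = "ACGTACGT" (position 6 changed to G), S2 = "ATGTACTT".
    print(find_occurrences("ACGTACTT", [{6: "G"}, {1: "T"}], "ACG"))
```

With a constant number of errors per sequence, each Si adds only a constant number of windows of length |p| to recheck, which is the intuition behind sharing the work done on S0; the paper's bit-parallel and asymptotic bounds are not reproduced by this sketch.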

Original language: English
Pages (from-to): 119-130
Number of pages: 12
Journal: International Journal of Computational Biology and Drug Design
Volume: 6
Issue number: 1-2
Publication status: Published - 1 Jan 2013
Externally published: Yes

Keywords

  • DNA sequencing
  • Highly similar sequences
  • Next-generation sequencing
  • NGS
  • Querying DNA sequences
  • Similarity searching
