Efficient computation of sequence mappability

Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Juliusz Straszyński*

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

Sequence mappability is an important task in genome re-sequencing. In the (k, m)-mappability problem, for a given sequence T of length n, our goal is to compute a table whose ith entry is the number of indices j≠i such that length-m substrings of T starting at positions i and j have at most k mismatches. Previous works on this problem focused on heuristic approaches to compute a rough approximation of the result or on the case of k=1. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that works in O(n min {mk,log k+1 n}) time and O(n) space for k=O(1). It requires a careful adaptation of the technique of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. We also show O(n2) -time algorithms to compute all results for a fixed m and all k=0,…,m or a fixed k and all m=k,…,n-1. Finally we show that the (k, m)-mappability problem cannot be solved in strongly subquadratic time for k,m = Θ (log n) unless the Strong Exponential Time Hypothesis fails.

Original languageEnglish
Title of host publicationString Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Proceedings
EditorsTravis Gagie, Alistair Moffat, Gonzalo Navarro, Ernesto Cuadros-Vargas
PublisherSpringer Verlag
Pages12-26
Number of pages15
ISBN (Print)9783030004781
DOIs
Publication statusPublished - 1 Jan 2018
Externally publishedYes
Event25th International Symposium on String Processing and Information Retrieval, SPIRE 2018 - Lima, Peru
Duration: 9 Oct 201811 Oct 2018

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume11147 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference25th International Symposium on String Processing and Information Retrieval, SPIRE 2018
Country/TerritoryPeru
CityLima
Period9/10/1811/10/18

Keywords

  • Genome sequencing
  • Hamming distance
  • Longest common substring with k mismatches
  • Sequence mappability
  • Suffix tree

Fingerprint

Dive into the research topics of 'Efficient computation of sequence mappability'. Together they form a unique fingerprint.

Cite this