Constructing strings avoiding forbidden substrings

Giulia Bernardini*, Alberto Marchetti-Spaccamela, Solon P. Pissis, Leen Stougie, Michelle Sweering

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

We consider the problem of constructing strings over an alphabet Σ that start with a given prefix u, end with a given suffix v, and avoid occurrences of a given set of forbidden substrings. In the decision version of the problem, given a set Sk of forbidden substrings, each of length k, over Σ, we are asked to decide whether there exists a string x over Σ such that u is a prefix of x, v is a suffix of x, and no s ϵ Sk occurs in x. Our first result is an O(|u| + |v| + k|Sk|)-time algorithm to decide this problem. In the more general optimization version of the problem, given a set S of forbidden arbitrary-length substrings over Σ, we are asked to construct a shortest string x over S such that u is a prefix of x, v is a suffix of x, and no s ϵ S occurs in x. Our second result is an O(|u| + |v| + ||S|| · |Σ|)-time algorithm to solve this problem, where ||S|| denotes the total length of the elements of S. Interestingly, our results can be directly applied to solve the reachability and shortest path problems in complete de Bruijn graphs in the presence of forbidden edges or of forbidden paths. Our algorithms are motivated by data privacy, and in particular, by the data sanitization process. In the context of strings, sanitization consists in hiding forbidden substrings from a given string by introducing the least amount of spurious information. We consider the following problem. Given a string w of length n over Σ, an integer k, and a set Sk of forbidden substrings, each of length k, over Σ, construct a shortest string y over Σ such that no s ϵ Sk occurs in y and the sequence of all other length-k fragments occurring in w is a subsequence of the sequence of the length-k fragments occurring in y. Our third result is an O(nk|Sk| · |Σ|)-time algorithm to solve this problem.

Original languageEnglish
Title of host publication32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)
EditorsPawel Gawrychowski, Tatiana Starikovskaya
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Pages1-18
Number of pages18
ISBN (Print)9783959771863
DOIs
Publication statusPublished - 2021
Event32nd Annual Symposium on Combinatorial Pattern Matching, CPM 2021 - Wroclaw, Poland
Duration: 5 Jul 20217 Jul 2021

Publication series

NameLeibniz International Proceedings in Informatics (LIPIcs)
Volume191
ISSN (Print)1868-8969

Conference

Conference32nd Annual Symposium on Combinatorial Pattern Matching, CPM 2021
Country/TerritoryPoland
CityWroclaw
Period5/07/217/07/21

Bibliographical note

Funding Information:
Funding Giulia Bernardini: Netherlands Organisation for Scientific Research (NWO) under project OCENW.GROOT.2019.015 “Optimization for and with Machine Learning (OPTIMAL)” Leen Stougie: Netherlands Organisation for Scientific Research (NWO) through Gravitation-grant NETWORKS-024.002.003 Michelle Sweering: Netherlands Organisation for Scientific Research (NWO) through Gravitation-grant NETWORKS-024.002.003

Publisher Copyright:
© Giulia Bernardini, Alberto Marchetti-Spaccamela, Solon P. Pissis, Leen Stougie, and Michelle Sweering.

Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.

Keywords

  • Data sanitization
  • De Bruijn graphs
  • Forbidden strings
  • String algorithms

Fingerprint

Dive into the research topics of 'Constructing strings avoiding forbidden substrings'. Together they form a unique fingerprint.

Cite this