Abstract
We initiate a study of the fundamental relation between data sanitization (i.e., the process of hiding confidential information in a given dataset) and frequent pattern mining, in the context of sequential (string) data. Current methods for string sanitization hide confidential patterns but, in doing so, introduce a number of spurious patterns that may harm the utility of frequent pattern mining. The main computational problem is to minimize this harm. Our contribution is twofold. First, we present several hardness results for different variants of this problem, essentially showing that these variants cannot be solved, or even approximated, in polynomial time. Second, we propose integer linear programming formulations for these variants, together with algorithms to solve them that run in polynomial time under certain realistic assumptions on the problem parameters.
| Original language | English |
|---|---|
| Title of host publication | 2020 [20th] IEEE International Conference on Data Mining (ICDM) |
| Subtitle of host publication | [Proceedings] |
| Editors | Claudia Plant, Haixun Wang, Alfredo Cuzzocrea, Carlo Zaniolo, Xindong Wu |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 924-929 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781728183169 |
| Publication status | Published - 9 Feb 2021 |
| Event | 20th IEEE International Conference on Data Mining, ICDM 2020 - Virtual, Sorrento, Italy. Duration: 17 Nov 2020 → 20 Nov 2020 |
Publication series
| Name | Proceedings - IEEE International Conference on Data Mining, ICDM |
|---|---|
| Volume | 2020-November |
| ISSN (Print) | 1550-4786 |
Conference
| Conference | 20th IEEE International Conference on Data Mining, ICDM 2020 |
|---|---|
| Country/Territory | Italy |
| City | Virtual, Sorrento |
| Period | 17/11/20 → 20/11/20 |
Bibliographical note
Funding Information: Acknowledgments. MIUR Grant 20174LF3T8 AHeAD; University of Pisa “PRA – Progetti di Ricerca di Ateneo” (Institutional Research Grants) Grant PRA 20202021 26 “Metodi Informatici Integrati per la Biomedica”; and NWO Gravitation-grant NETWORKS-024.002.003.
Publisher Copyright:
© 2020 IEEE.
Funding
| Funders | Funder number |
|---|---|
| Nederlandse Organisatie voor Wetenschappelijk Onderzoek | 024.002.003, NETWORKS-024.002.003 |
| Università di Pisa | PRA 20202021 26 |
| Ministero dell’Istruzione, dell’Università e della Ricerca | 20174LF3T8 AHeAD |
Keywords
- Data privacy
- Data sanitization
- Frequent pattern mining
- Knowledge hiding
- String algorithms
Fingerprint
Dive into the research topics of 'Hide and mine in strings: Hardness and algorithms'. Together they form a unique fingerprint.