TY - JOUR
T1 - iCAT+
T2 - An Interactive Customizable Anonymization Tool using Automated Translation through Deep Learning
AU - Oqaily, Momen
AU - Kabir, Mohammad Ekramul
AU - Majumdar, Suryadipta
AU - Jarraya, Yosr
AU - Zhang, Mengyuan
AU - Pourzandi, Makan
AU - Wang, Lingyu
AU - Debbabi, Mourad
PY - 2023
Y1 - 2023
AB - Data anonymization is a viable solution for data owners to mitigate their privacy concerns. However, existing data anonymization tools are inflexible to support various privacy and utility requirements of both data owners and data users. In most cases, this limitation is due to a lack of understanding of those requirements as well as the non-customizability of the existing tools. To address this limitation, we propose iCAT+, which is an interactive and customizable anonymization approach. More specifically, we first automate the interpretation of data owners' and data users' textual requirements by deploying a Convolutional Neural Network (CNN) model for Natural Language Processing (NLP). Second, we introduce the concept of the anonymization space to model possible combinations of per-attribute anonymization primitives based on the level of privacy and utility that each primitive provides. Third, we design an ontology model that maps the translated requirements into their appropriate anonymization primitives in the defined anonymization space corresponding to the plain data. Fourth, we evaluate the efficiency and effectiveness of iCAT+ based on both real and synthetic network data. Finally, we assess its usability through a real user study involving participants from industry and research laboratories. Our experiments show the effectiveness and efficiency of our solution (e.g., requirement translation accuracy of 99% at the data owner side and 98% at the data user side, with a computational time of around one minute for the Google cluster dataset).
UR - http://www.scopus.com/inward/record.url?scp=85173410738&partnerID=8YFLogxK
U2 - 10.1109/TDSC.2023.3317806
DO - 10.1109/TDSC.2023.3317806
M3 - Article
SN - 1545-5971
VL - 21
SP - 1
EP - 18
JO - IEEE Transactions on Dependable and Secure Computing
JF - IEEE Transactions on Dependable and Secure Computing
IS - 4
ER -