Abstract
In response to the increasing volume of research data being generated, a growing number of data portals have been designed to facilitate data findability and accessibility. However, a significant portion of this data remains confidential or restricted due to its sensitive nature, such as patient data or census microdata. While confidentiality prohibits public release of such data, portals supporting rich metadata can enable researchers to at least discover that restricted access data exists and to assess its suitability before requesting access. Existing standards, such as CSV on the Web and RDF Data Cube, have been adopted to facilitate the management, integration, and re-use of data on the Web. However, the current landscape still lacks adequate standards both for describing restricted access data while preserving confidentiality and for facilitating its discovery. In this work, we investigate the relationships between the structural, statistical, and semantic elements of restricted access tabular data, and we explore how such relationships can be formally modeled in a way that is Findable, Accessible, Interoperable, and Reusable (FAIR). We introduce the DataSet-Variable Ontology (DSV), which combines the CSV on the Web and RDF Data Cube standards, leverages semantic technologies and Linked Data principles, and introduces variable-level metadata in order to capture high-quality metadata that supports the management and re-use of restricted access data on the Web. As an evaluation, we conducted a case study in which we applied DSV to four datasets from different governmental statistical agencies. We employed a set of competency questions to assess the ontology's ability to support knowledge discovery and data exploration.
By describing high-quality metadata at both the dataset and variable levels while maintaining data privacy, this novel ontology facilitates data interoperability, discovery, and re-use, and it empowers researchers to manage, integrate, and analyze complex restricted access data sources.
Original language | English |
---|---|
Title of host publication | K-CAP 2023 - Proceedings of the 12th Knowledge Capture Conference 2023 |
Publisher | Association for Computing Machinery, Inc |
Pages | 83-91 |
ISBN (Electronic) | 9798400701412 |
Publication status | Published - 5 Dec 2023 |
Event | 12th ACM International Conference on Knowledge Capture, K-CAP 2023 - Pensacola, United States. Duration: 5 Dec 2023 → 7 Dec 2023 |
Conference
Conference | 12th ACM International Conference on Knowledge Capture, K-CAP 2023 |
---|---|
Country/Territory | United States |
City | Pensacola |
Period | 5 Dec 2023 → 7 Dec 2023 |
Funding
This work is funded by the Netherlands Organisation for Scientific Research (NWO) through the ODISSEI Roadmap project (184.035.014).
Funders | Funder number |
---|---|
Nederlandse Organisatie voor Wetenschappelijk Onderzoek | 184.035.014 |