Advancing data sharing and reusability for restricted access data on the Web: introducing the DataSet-Variable Ontology

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

In response to the growing volume of research data, more and more data portals have been designed to make data findable and accessible. However, a significant portion of this data remains confidential or restricted because of its sensitive nature, as is the case for patient data or census microdata. Although confidentiality requirements prohibit the public release of such data, portals that support rich metadata can at least enable researchers to discover that restricted access data exists and to assess its suitability before requesting access. Existing standards, such as CSV on the Web and RDF Data Cube, have been adopted to facilitate the management, integration, and re-use of data on the Web. However, the current landscape still lacks adequate standards both to describe restricted access data effectively while preserving confidentiality and to facilitate its discovery. In this work, we investigate the relationships between the structural, statistical, and semantic elements of restricted access tabular data, and we explore how these relationships can be formally modeled in a way that is Findable, Accessible, Interoperable, and Reusable (FAIR). We introduce the DataSet-Variable Ontology (DSV), which combines the CSV on the Web and RDF Data Cube standards, leverages semantic technologies and Linked Data principles, and introduces variable-level metadata, with the aim of capturing high-quality metadata that supports the management and re-use of restricted access data on the Web. As an evaluation, we conducted a case study in which we applied DSV to four datasets from different government statistical agencies, and we used a set of competency questions to assess the ontology's ability to support knowledge discovery and data exploration.
By describing high-quality metadata at both the dataset and variable levels while maintaining data privacy, this novel ontology facilitates data interoperability, discovery, and re-use, and it empowers researchers to manage, integrate, and analyze complex restricted access data sources.
Original language: English
Title of host publication: K-CAP 2023 - Proceedings of the 12th Knowledge Capture Conference 2023
Publisher: Association for Computing Machinery, Inc
Pages: 83-91
ISBN (Electronic): 9798400701412
Publication status: Published - 5 Dec 2023
Event: 12th ACM International Conference on Knowledge Capture, K-CAP 2023 - Pensacola, United States
Duration: 5 Dec 2023 → 7 Dec 2023

Conference

Conference: 12th ACM International Conference on Knowledge Capture, K-CAP 2023
Country/Territory: United States
City: Pensacola
Period: 5/12/23 → 7/12/23

Funding

This work is funded by the Netherlands Organisation for Scientific Research (NWO), ODISSEI Roadmap project: 184.035.014.

Funder: Nederlandse Organisatie voor Wetenschappelijk Onderzoek
Funder number: 184.035.014
