Recording provenance of workflow runs with RO-Crate

Simone Leo*, Michael R. Crusoe, Laura Rodríguez-Navas, Raül Sirvent, Alexander Kanitz, Paul De Geest, Rudolf Wittner, Luca Pireddu, Daniel Garijo, José M. Fernández, Iacopo Colonnelli, Matej Gallo, Tazro Ohta, Hirotaka Suetake, Salvador Capella-Gutierrez, Renske de Wit, Bruno P. Kinoshita, Stian Soiland-Reyes

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.

Original language: English
Article number: e0309210
Pages (from-to): 1-35
Number of pages: 35
Journal: PLoS ONE
Volume: 19
Issue number: 9
Early online date: 10 Sept 2024
DOIs
Publication status: Published - Sept 2024

Bibliographical note

Publisher Copyright:
© 2024 Leo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding

Funders | Funder number
European Commission Horizon 2020 | 825575
European Pilot for Exascale
Comunidad de Madrid
Sardinian Regional Government
National Bioscience Database Center
FAIR-IMPACT
International Council of Shopping Centers
Sator Inc.
European High-Performance Computing Joint Undertaking
Research Foundation - Flanders
Japan Science and Technology Agency
Universidad Politécnica de Madrid
EOSC-Life | 101046203
Horizon 2020 Framework Programme | 823830
Generalitat de Catalunya | 955558, 2021-SGR-00412
HORIZON EUROPE Framework Programme | 101057388, 101046203
Spanish Government | CEX2021-001148-S, MCIN/AEI/10.13039/ 501100011033, PID2019-107255GB
Horizon Europe funding guarantee | 10038992
DT-GEOELIXIRPlatformTask2022-2023
EuroScienceGateway | 101057344
Fonds Wetenschappelijk Onderzoek | I002819N, I000323N
European Commission Horizon 2020 825575824087
EuropeanJointProgrammeonRareDiseasesBioExcel-2
European Commission | 2023 24/31
European High Performance Computing Joint Undertaking | 955648, 101033975
UK Research and Innovation | 10038963
EU Horizon research and innovation programme | 101058129
