Abstract
Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.
| Original language | English |
|---|---|
| Article number | e0309210 |
| Pages (from-to) | 1-35 |
| Number of pages | 35 |
| Journal | PLoS ONE |
| Volume | 19 |
| Issue number | 9 |
| Early online date | 10 Sept 2024 |
| Publication status | Published - Sept 2024 |
Bibliographical note
Publisher Copyright: © 2024 Leo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding
| Funders | Funder number |
|---|---|
| European Commission Horizon 2020 825575 | |
| European Pilot for Exascale | |
| Comunidad de Madrid | |
| Sardinian Regional Government | |
| FAIR-IMPACT | |
| International Council of Shopping Centers | |
| Sator Inc. | |
| European High-Performance Computing Joint Undertaking | |
| National Bioscience Database Center | |
| Research Foundation - Flanders | |
| Japan Science and Technology Agency | |
| Universidad Politécnica de Madrid | |
| EOSC-Life | 101046203 |
| Horizon 2020 Framework Programme | 823830 |
| Generalitat de Catalunya | 955558, 2021-SGR-00412 |
| HORIZON EUROPE Framework Programme | 101057388, 101046203 |
| Spanish Government | CEX2021-001148-S, MCIN/AEI/10.13039/501100011033, PID2019-107255GB |
| Horizon Europe funding guarantee | 10038992 |
| DT-GEO | ELIXIR Platform Task 2022-2023 |
| EuroScienceGateway | 101057344 |
| Fonds Wetenschappelijk Onderzoek | I002819N, I000323N |
| European Commission Horizon 2020 825575 | 824087 |
| European Joint Programme on Rare Diseases | BioExcel-2 |
| European Commission | 2023 24/31 |
| European High Performance Computing Joint Undertaking | 955648, 101033975 |
| UK Research and Innovation | 10038963 |
| EU Horizon research and innovation programme | 101058129 |