The workflow trace archive: Open-access data from public and private computing infrastructures

Laurens Versluis*, Roland Matha, Sacheendra Talluri, Tim Hegeman, Radu Prodan, Ewa Deelman, Alexandru Iosup

*Corresponding author for this work

Research output: Contribution to Journal, Article, Academic, peer-review


Abstract

Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. In this work, we focus on traces of workflows - common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow-traces raises important issues: (1) the use of realistic traces is infrequent and (2) the use of realistic, open-access traces even more so. Alleviating these issues, we introduce the Workflow Trace Archive (WTA), an open-access archive of workflow traces from diverse computing infrastructures and tooling to parse, validate, and analyze traces. The WTA includes >48 million workflows captured from >10 computing infrastructures, representing a broad diversity of trace domains and characteristics. To emphasize the importance of trace diversity, we characterize the WTA contents and analyze in simulation the impact of trace diversity on experiment results. Our results indicate significant differences in characteristics, properties, and workflow structures between workload sources, domains, and fields.

Original language: English
Article number: 9066946
Pages (from-to): 2170-2184
Number of pages: 15
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 31
Issue number: 9
Early online date: 14 Apr 2020
Publication status: Published - Sept 2020

Keywords

  • Archive
  • Characterization
  • Open-access
  • Open-source
  • Simulation
  • Survey
  • Traces
  • Workflow
