Less is not more: We need rich datasets to explore

Laurens Versluis*, Mehmet Cetin, Caspar Greeven, Kristian Laursen, Damian Podareanu, Valeriu Codreanu, Alexandru Uta, Alexandru Iosup

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Traditional datacenter analysis is based on high-level, coarse-grained metrics. This obscures our view of datacenter behavior, as we observe neither the full picture nor the subtleties that make up these high-level, coarse metrics. There is room for operational improvement based on low-level metric data that is fine-grained in both time and space. In this work, we leverage one of the (rare) public datasets providing fine-grained information on datacenter operations, with over 60 billion measurements captured at 15-second intervals. We show evidence that fine-grained information reveals new operational aspects, that the different metrics cannot be derived from one another (and thus need to be captured), and that many low-level metrics, gathered frequently, are key to understanding datacenter operations. We propose a holistic analysis for datacenter operations, providing statistical characterization of node and workload aspects. Our analysis reveals both generic and machine learning-specific aspects, summarized in over 30 observations, providing deep insight into this dataset and the originating cluster. We give actionable insights and surprising findings, and exemplify how our observations support performance-engineering tasks such as workload prediction and long-term datacenter design.

Original language: English
Pages (from-to): 117-130
Number of pages: 14
Journal: Future Generation Computer Systems
Volume: 142
Early online date: 23 Dec 2022
Publication status: Published - May 2023

Bibliographical note

Funding Information:
Alexandru Iosup is tenured full Professor and University Research Chair with the Vrije Universiteit Amsterdam, the Netherlands. He is also Chair of the SPEC Research Cloud Group. He received a Ph.D. from TU Delft, the Netherlands (2009) and an M.Sc. from Politehnica University of Bucharest, Romania (2004), both in computer science. He has received numerous awards and nominations. Topics include cloud computing and big data, with applications in big science, big business, online gaming, and (upcoming) massivized education. His work is funded by a combination of prestigious personal grants, generous industry gifts and collaborations, and EU and national projects.

Funding Information:
His research is funded through industry and academic grants and was recently awarded the NWO Veni (early career) award.

Funding Information:
This work was partially funded by the Dutch National Science Foundation NWO, The Netherlands through Veni grant VI.202.195 supporting Alexandru Uta.

Publisher Copyright:
© 2022 The Author(s)

Keywords

  • Datacenter
  • Dataset
  • Holistic analysis
  • Methodology
  • Open-access
  • Statistical analysis
