A sampling-based tool for scaling graph datasets

Ahmed Musaafir, Alexandru Uta, Henk Dreuning, Ana Lucia Varbanescu

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Graph processing has become a topic of interest in many domains. However, we still observe a lack of representative datasets for in-depth performance and scalability analysis. Neither data collections nor graph generators provide enough diversity and control for thorough analysis. To address this problem, we propose a heuristic method for scaling existing graphs. Our approach, based on sampling and interconnection, can provide a scaled "version" of a given graph. Moreover, we provide analytical models to predict the topological properties of the scaled graphs (such as the diameter, degree distribution, density, or the clustering coefficient), and further enable the user to tweak these properties. Property control is achieved through a portfolio of graph interconnection methods (e.g., star, ring, chain, fully connected) applied for combining the graph samples. We further implement our method as an open-source tool which can be used to quickly provide families of datasets for in-depth benchmarking of graph processing algorithms. Our empirical evaluation demonstrates that our tool provides scaled graphs of a wide range of sizes, whose properties match well with model predictions and/or user requirements. Finally, we also illustrate, through a case study, how scaled graphs can be used for in-depth performance analysis of graph processing algorithms.
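
The sampling-and-interconnection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool or algorithm: the uniform edge sampling, the single random bridge edge per connected pair of samples, and the names `sample_edges`, `topology_pairs`, and `scale_graph` are all assumptions made for illustration only.

```python
import random


def sample_edges(edges, fraction, rng):
    """Assumed sampling step: keep a uniformly random fraction of the edges."""
    k = max(1, int(len(edges) * fraction))
    return rng.sample(edges, k)


def topology_pairs(n, topology):
    """Return the pairs of sample indices that each topology connects."""
    if topology == "chain":
        return [(i, i + 1) for i in range(n - 1)]
    if topology == "ring":
        pairs = [(i, i + 1) for i in range(n - 1)]
        if n > 2:
            pairs.append((n - 1, 0))  # close the chain into a ring
        return pairs
    if topology == "star":
        return [(0, i) for i in range(1, n)]  # sample 0 acts as the hub
    if topology == "fully_connected":
        return [(i, j) for i in range(n) for j in range(i + 1, n)]
    raise ValueError(f"unknown topology: {topology}")


def scale_graph(edges, num_samples, fraction, topology, seed=0):
    """Combine several samples of one graph into a larger edge list."""
    rng = random.Random(seed)
    scaled, sample_nodes = [], []
    for s in range(num_samples):
        # Relabel vertices as (sample_id, original_id) so samples stay disjoint.
        relabeled = [((s, u), (s, v)) for u, v in sample_edges(edges, fraction, rng)]
        scaled.extend(relabeled)
        sample_nodes.append(sorted({x for e in relabeled for x in e}))
    # Assumption: one random bridge edge per connected pair of samples.
    for a, b in topology_pairs(num_samples, topology):
        scaled.append((rng.choice(sample_nodes[a]), rng.choice(sample_nodes[b])))
    return scaled


if __name__ == "__main__":
    base = [(i, (i + 1) % 10) for i in range(10)]  # toy 10-vertex cycle
    for topo in ("chain", "ring", "star", "fully_connected"):
        print(topo, len(scale_graph(base, 4, 0.5, topo)))
```

In this sketch the choice of topology only changes how many bridge edges are added and between which samples, which is one plausible way such a choice could influence properties like the diameter of the combined graph.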

Original language: English
Title of host publication: ICPE '20
Subtitle of host publication: Proceedings of the ACM/SPEC International Conference on Performance Engineering
Publisher: Association for Computing Machinery, Inc
Pages: 289-300
Number of pages: 12
ISBN (Electronic): 9781450369916
Publication status: Published - 20 Apr 2020
Event: 11th ACM/SPEC International Conference on Performance Engineering, ICPE 2020 - Edmonton, Canada
Duration: 20 Apr 2020 to 24 Apr 2020

Conference

Conference: 11th ACM/SPEC International Conference on Performance Engineering, ICPE 2020
Country/Territory: Canada
City: Edmonton
Period: 20/04/20 to 24/04/20

Keywords

  • Graph datasets scaling
  • Graph sampling
  • Graph scaling tool
  • Heuristic methods
