A sampling-based tool for scaling graph datasets

Ahmed Musaafir, Alexandru Uta, Henk Dreuning, Ana Lucia Varbanescu

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review


Graph processing has become a topic of interest in many domains. However, we still observe a lack of representative datasets for in-depth performance and scalability analysis. Neither data collections nor graph generators provide enough diversity and control for thorough analysis. To address this problem, we propose a heuristic method for scaling existing graphs. Our approach, based on sampling and interconnection, can provide a scaled "version" of a given graph. Moreover, we provide analytical models to predict the topological properties of the scaled graphs (such as the diameter, degree distribution, density, or clustering coefficient), and further enable the user to tweak these properties. Property control is achieved through a portfolio of graph interconnection methods (e.g., star, ring, chain, fully connected) applied for combining the graph samples. We further implement our method as an open-source tool that can be used to quickly provide families of datasets for in-depth benchmarking of graph processing algorithms. Our empirical evaluation demonstrates that our tool provides scaled graphs of a wide range of sizes, whose properties match well with model predictions and/or user requirements. Finally, we also illustrate, through a case study, how scaled graphs can be used for in-depth performance analysis of graph processing algorithms.
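The sample-and-interconnect idea described above can be illustrated with a minimal sketch. This is not the authors' tool: the abstract does not specify the sampling algorithm or how connecting edges are chosen, so the uniform random edge sampling, the single random bridge edge per connected pair of samples, and all names used here (`sample_edges`, `bridge_pairs`, `scale_graph`) are assumptions made purely for illustration.

```python
import random
from itertools import combinations


def sample_edges(edges, fraction, rng):
    """Assumed sampler: keep each edge independently with probability `fraction`."""
    return [e for e in edges if rng.random() < fraction]


def bridge_pairs(num_samples, topology):
    """Return the pairs of sample indices to connect for a given topology."""
    if topology == "chain":
        return [(i, i + 1) for i in range(num_samples - 1)]
    if topology == "ring":
        pairs = [(i, i + 1) for i in range(num_samples - 1)]
        if num_samples > 2:
            pairs.append((num_samples - 1, 0))  # close the ring
        return pairs
    if topology == "star":
        return [(0, i) for i in range(1, num_samples)]  # sample 0 is the hub
    if topology == "fully_connected":
        return list(combinations(range(num_samples), 2))
    raise ValueError(f"unknown topology: {topology}")


def scale_graph(edges, num_samples, fraction, topology, seed=0):
    """Combine `num_samples` samples of one graph into a single scaled edge list."""
    rng = random.Random(seed)
    scaled = []
    sample_vertices = []
    for s in range(num_samples):
        sample = sample_edges(edges, fraction, rng)
        # Relabel vertices as (sample index, original id) so samples stay disjoint.
        relabeled = [((s, u), (s, v)) for u, v in sample]
        scaled.extend(relabeled)
        sample_vertices.append(sorted({x for e in relabeled for x in e}))
    for a, b in bridge_pairs(num_samples, topology):
        # Assumption: one random bridge edge per connected pair of samples.
        if sample_vertices[a] and sample_vertices[b]:
            scaled.append((rng.choice(sample_vertices[a]),
                           rng.choice(sample_vertices[b])))
    return scaled
```

In this sketch the scaled graph has roughly `num_samples * fraction` times as many edges as the input, and the choice of topology changes how far apart the samples end up: a chain stretches the diameter, while a star or fully connected layout keeps it short. This is the kind of property control the abstract attributes to the interconnection methods.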

Original language: English
Title of host publication: ICPE '20
Subtitle of host publication: Proceedings of the ACM/SPEC International Conference on Performance Engineering
Publisher: Association for Computing Machinery, Inc
Number of pages: 12
ISBN (Electronic): 9781450369916
Publication status: Published - 20 Apr 2020
Event: 11th ACM/SPEC International Conference on Performance Engineering, ICPE 2020 - Edmonton, Canada
Duration: 20 Apr 2020 to 24 Apr 2020


Conference: 11th ACM/SPEC International Conference on Performance Engineering, ICPE 2020


  • Graph datasets scaling
  • Graph sampling
  • Graph scaling tool
  • Heuristic methods

