V for Vicissitude: The Challenge of Scaling Complex Big Data Workflows

Bogdan Ghit, Mihai Capota, Tim Hegeman, Jan Hidders, Dick H.J. Epema, Alexandru Iosup

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

In this paper we present the scaling of BTWorld, our MapReduce-based approach to observing and analyzing the global BitTorrent network which we have been monitoring for the past 4 years. BTWorld currently provides a comprehensive and complex set of queries implemented in Pig Latin, with data dependencies between them, which translate to several MapReduce jobs that have a heavy-tailed distribution with respect to both execution time and input size characteristics. Processing BitTorrent data in excess of 1 TB with our BTWorld workflow required an in-depth analysis of the entire software stack and the design of a complete optimization cycle. We analyze our system from both theoretical and experimental perspectives and we show how we attained a 15 times larger scale of data processing than our previous results. © 2014 IEEE.
Original language: English
Title of host publication: Proceedings - 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2014, Chicago, IL, USA, May 26-29, 2014
Publisher: ACM, IEEE Computer Society
Pages: 927-932
Number of pages: 6
ISBN (Print): 9781479927838
Publication status: Published - 2014
Externally published: Yes
Event: 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2014 - Chicago, United States
Duration: 26 May 2014 to 29 May 2014

Conference

Conference: 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2014
Country/Territory: United States
City: Chicago
Period: 26/05/14 to 29/05/14
