Abstract
In this paper we present the scaling of BTWorld, our MapReduce-based approach to observing and analyzing the global BitTorrent network, which we have been monitoring for the past 4 years. BTWorld currently provides a comprehensive and complex set of queries implemented in Pig Latin, with data dependencies between them, which translate to several MapReduce jobs whose execution times and input sizes both follow heavy-tailed distributions. Processing more than 1 TB of BitTorrent data with our BTWorld workflow required an in-depth analysis of the entire software stack and the design of a complete optimization cycle. We analyze our system from both theoretical and experimental perspectives, and we show how we attained a scale of data processing 15 times larger than in our previous results. © 2014 IEEE.
Original language | English |
---|---|
Title of host publication | Proceedings - 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2014, Chicago, IL, USA, May 26-29, 2014 |
Publisher | ACM, IEEE Computer Society |
Pages | 927-932 |
Number of pages | 6 |
ISBN (Print) | 9781479927838 |
Publication status | Published - 2014 |
Externally published | Yes |
Event | 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2014 - Chicago, United States; Duration: 26 May 2014 → 29 May 2014 |
Conference
Conference | 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2014 |
---|---|
Country/Territory | United States |
City | Chicago |
Period | 26/05/14 → 29/05/14 |