alexandru-uta/cloud_network_variability_data: Cloud Network Variability Data

  • Alexandru Uta (Creator)
  • Alexandru Custura (Creator)
  • Dmitry Duplyakin (Creator)
  • Ivo Jimenez (Creator)
  • Jan Rellermeyer (Creator)
  • Carlos Maltzahn (Creator)
  • Robert Ricci (Creator)
  • Alexandru Iosup (Creator)



# Dataset on Cloud Network Variability

This is the dataset obtained while benchmarking public and private clouds for the following article:

### Alexandru Uta, Alexandru Custura, Dmitry Duplyakin, Ivo Jimenez, Jan Rellermeyer, Carlos Maltzahn, Robert Ricci, Alexandru Iosup. Is Big Data Performance Reproducible in Modern Cloud Networks? In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI), February 25-27, 2020, Santa Clara, USA.

The dataset contains bandwidth variability data for:
- Amazon EC2
- Google Compute Engine
- Microsoft Azure (limited data)
- Scaleway (limited data)
- SURFsara HPCCloud

The dataset contains TCP latency (RTT) data for:
- Amazon EC2
- Google Compute Engine

The dataset contains data regarding token bucket sizes (explained in depth in the aforementioned article) for Amazon EC2.

Please note that the full archive is over 3.5 GB of data, made up of hundreds of thousands of small files. We therefore archived the data and then split the archives into smaller files to stay within GitHub's maximum file size of 100 MB.


1. The Bandwidth Variability data:
- archived in the files bandwidth_variability_data.tar.bz2.parta, .partb, and .partc
- after unpacking the archive, the data is split into directories per machine type and experiment type, for example:
  - perfvar-aws-m5xlarge-fullspeed: contains iperf3 output files for continuous communication between 2 m5.xlarge VMs in Amazon EC2.
  - perfvar-google-4cpu-bursty-5s30s: contains iperf3 output files for bursty communication (5 seconds of communication, 30 seconds of break; repeat)
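The split bandwidth archive has to be joined back into a single file before it can be unpacked. The following is a minimal Python sketch of that step. It assumes the parts were produced by a plain byte-wise split, so that concatenating them in order restores the original .tar.bz2 file; the helper names `reassemble_parts` and `extract_archive` and the output directory are illustrative and not part of the dataset.

```python
import shutil
import tarfile
from pathlib import Path


def reassemble_parts(parts, output_path):
    """Concatenate the split parts, in the given order, into one file.

    Assumption: the parts come from a byte-wise split, so simple
    concatenation restores the original archive.
    """
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so large parts are never fully in memory.
                shutil.copyfileobj(src, out)
    return Path(output_path)


def extract_archive(archive_path, dest_dir):
    """Unpack a bzip2-compressed tar archive into dest_dir."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:bz2") as tar:
        tar.extractall(dest_dir)


# Example usage (hypothetical output locations):
# base = "bandwidth_variability_data.tar.bz2"
# parts = [f"{base}.parta", f"{base}.partb", f"{base}.partc"]
# archive = reassemble_parts(parts, base)
# extract_archive(archive, "bandwidth_variability_data")
```

The order of the parts matters: they must be passed as .parta, .partb, .partc. Since the full dataset consists of hundreds of thousands of small files, extraction can take a while.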

2. The Latency Variability data:
- archived in the file latency_study.tar.bz2
- after unpacking the archive, the directories contain TCP dump RTT data and iperf3 outputs

3. The Token Bucket AWS study:
- archived in the file token_bucket_study.tar.bz2
- contains files of the form INSTANCE_TYPE-REGION-TIMESTAMP.{bw, raw, tb}
- files with extension .raw contain raw iperf3 client output
- files with extension .bw contain bandwidth samples taken at 1 second intervals from the iperf3 utility
- files with extension .tb contain a triple of the form <time_to_token_bucket_depletion, high_token_bucket_bandwidth, low_token_bucket_bandwidth>
- example of a file name: c5.2xlarge-us-west-1-1567733720.raw
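To group the token bucket files by instance type or region, the file names can be split into their components. The sketch below is one way to do this in Python. It rests on two assumptions that the description above does not state explicitly: the instance type contains no hyphen (as in c5.2xlarge), and TIMESTAMP is a Unix epoch time in seconds. The names `TokenBucketFile` and `parse_token_bucket_name` are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

KNOWN_EXTENSIONS = {"bw", "raw", "tb"}


@dataclass(frozen=True)
class TokenBucketFile:
    instance_type: str
    region: str
    timestamp: int
    extension: str

    @property
    def measured_at(self):
        # Assumption: the timestamp is Unix epoch seconds.
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def parse_token_bucket_name(name):
    """Split INSTANCE_TYPE-REGION-TIMESTAMP.EXT into its components."""
    base = Path(name).name
    # The extension follows the last dot; instance types contain dots too.
    stem, dot, extension = base.rpartition(".")
    if not dot or extension not in KNOWN_EXTENSIONS:
        raise ValueError(f"unexpected extension in {name!r}")
    # The timestamp follows the last hyphen.
    head, sep, timestamp = stem.rpartition("-")
    # Assumption: the instance type has no hyphen, so the first hyphen
    # separates it from the region (regions contain hyphens themselves).
    instance_type, sep2, region = head.partition("-")
    if not (sep and sep2 and region and timestamp.isdigit()):
        raise ValueError(f"unexpected file name format: {name!r}")
    return TokenBucketFile(instance_type, region, int(timestamp), extension)


info = parse_token_bucket_name("c5.2xlarge-us-west-1-1567733720.raw")
print(info.instance_type, info.region, info.timestamp, info.extension)
```

Files that belong to the same measurement share everything except the extension, so the first three fields can serve as a key for matching a .raw file with its .bw and .tb counterparts.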

Date made available: 2019
