Abstract
Scientific domains such as astronomy and bioinformatics produce increasingly large amounts of data that need to be analyzed. Such analyses are modeled as scientific workflows: applications composed of many individual tasks that exhibit data dependencies. Typically, these applications suffer from significant variability in the interplay between achieved parallelism and data footprint. To efficiently tackle the data deluge, cost-effective solutions need to be deployed that extend private computing infrastructures with public cloud resources. To achieve this, two key features of such systems need to be addressed: elasticity and network adaptability. The former improves compute resource utilization efficiency, while the latter improves network utilization efficiency, since public clouds suffer from significant bandwidth variability. This paper extends our previous work on MemEFS, an in-memory elastic distributed file system, by adding network adaptability. Our results show that MemEFS' elasticity increases resource utilization efficiency by up to 65%. Regarding the network adaptation policy, MemEFS achieves up to 50% speedup compared to its network-agnostic counterpart.
| Original language | English |
|---|---|
| Pages (from-to) | 631-646 |
| Number of pages | 16 |
| Journal | Future Generation Computer Systems |
| Volume | 82 |
| Early online date | 16 Mar 2017 |
| DOIs | |
| Publication status | Published - May 2018 |
Keywords
- Big data and HPC systems
- Big data for e-Science
- Distributed hashing
- Elasticity
- High-performance I/O
- In-memory file system
- Large-scale scientific computing
- Large-scale systems for computational sciences
- Network adaptation
- Network variability
- Scalable computing