Scalable by design to very large computing systems such as grids and clouds, MapReduce is currently a major big data processing paradigm. Nevertheless, existing performance models for MapReduce apply only to specific workloads that process a small fraction of the entire data set, and thus fail to assess the capabilities of the MapReduce paradigm under heavy workloads that process exponentially increasing data volumes. The goal of my PhD is to build and analyze a scalable and dynamic big data processing system, including storage (distributed file system), execution engine (MapReduce), and query language (Pig). My contributions for the first two years of PhD research are the following: 1) the design and implementation of a resource management system, part of a MapReduce-based processing system, for deploying and resizing MapReduce clusters over multicluster systems, 2) the design and implementation of a benchmarking tool for the MapReduce processing system, and 3) the evaluation and modeling of MapReduce using workloads with very large data sets. Furthermore, building on the first two years of research, we will optimize the MapReduce system to efficiently process terabytes of data. © 2013 IEEE.
|Title of host publication||Proceedings - 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013, Delft, Netherlands, May 13-16, 2013|
|Number of pages||4|
|Publication status||Published - 2013|
|Event||13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013 - Delft, Netherlands (13 May 2013 → 16 May 2013)|