Towards an Optimized Big Data Processing System

Bogdan Ghit, Alexandru Iosup, Dick H.J. Epema

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Scalable by design to very large computing systems such as grids and clouds, MapReduce is currently a major big data processing paradigm. Nevertheless, existing performance models for MapReduce hold only for specific workloads that process a small fraction of the entire data set, and thus fail to assess the capabilities of the MapReduce paradigm under heavy workloads that process exponentially increasing data volumes. The goal of my PhD is to build and analyze a scalable and dynamic big data processing system, including storage (distributed file system), execution engine (MapReduce), and query language (Pig). My contributions for the first two years of PhD research are the following: 1) the design and implementation of a resource management system, as part of a MapReduce-based processing system, for deploying and resizing MapReduce clusters over multicluster systems; 2) the design and implementation of a benchmarking tool for the MapReduce processing system; and 3) the evaluation and modeling of MapReduce using workloads with very large data sets. Furthermore, based on the first two years of research, we will optimize the MapReduce system to efficiently process terabytes of data. © 2013 IEEE.
Original language: English
Title of host publication: Proceedings - 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013, Delft, Netherlands, May 13-16, 2013
Pages: 83-86
Number of pages: 4
Publication status: Published - 2013
Externally published: Yes
Event: 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013 - Delft, Netherlands
Duration: 13 May 2013 to 16 May 2013

Conference

Conference: 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013
Country: Netherlands
City: Delft
Period: 13/05/13 to 16/05/13
