AJIRA: A lightweight distributed middleware for map reduce and stream processing

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Currently, MapReduce is the most popular programming model for large-scale data processing, and this has motivated the research community to improve its efficiency with new extensions, algorithmic optimizations, or hardware. In this paper we address two main limitations of MapReduce: one relates to the model's limited expressiveness, which prevents the implementation of complex programs that require multiple steps or iterations. The other relates to the efficiency of its most popular implementations (e.g., Hadoop), which provide good resource utilization only for massive volumes of input, operating suboptimally for smaller or rapidly changing input. To address these limitations, we present AJIRA, a new middleware designed for efficient and generic data processing. At a conceptual level, AJIRA replaces the traditional map/reduce primitives with generic operators that can be dynamically allocated, allowing the execution of more complex batch and stream processing jobs. At a more technical level, AJIRA adopts a distributed, multi-threaded architecture that strives to minimize overhead for non-critical functionality. These characteristics allow AJIRA to be used as a single programming model for both batch and stream processing. To this end, we evaluated its performance against Hadoop, Spark, Esper, and Storm, which are state-of-the-art systems for both batch and stream processing. Our evaluation shows that AJIRA is competitive in a wide range of scenarios both in terms of processing time and scalability, making it an ideal choice where flexibility, extensibility, and the processing of both large and dynamic data with a single programming model are either desirable or even mandatory requirements.

Original language: English
Title of host publication: Proceedings - International Conference on Distributed Computing Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 545-554
Number of pages: 10
ISBN (Electronic): 9781479951680
DOI: https://doi.org/10.1109/ICDCS.2014.62
Publication status: Published - 29 Aug 2014
Event: 2014 IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014 - Madrid, Spain
Duration: 30 Jun 2014 → 3 Jul 2014

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems

Conference

Conference: 2014 IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014
Country: Spain
City: Madrid
Period: 30/06/14 → 3/07/14

Fingerprint

Middleware
Processing
Electric sparks
Scalability
Hardware

Cite this

Urbani, J., Margara, A., Jacobs, C., Voulgaris, S., & Bal, H. (2014). AJIRA: A lightweight distributed middleware for map reduce and stream processing. In Proceedings - International Conference on Distributed Computing Systems (pp. 545-554). [6888930] (Proceedings - International Conference on Distributed Computing Systems). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDCS.2014.62
Urbani, Jacopo; Margara, Alessandro; Jacobs, Ceriel; Voulgaris, Spyros; Bal, Henri. / AJIRA: A lightweight distributed middleware for map reduce and stream processing. Proceedings - International Conference on Distributed Computing Systems. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 545-554 (Proceedings - International Conference on Distributed Computing Systems).
@inproceedings{5d84a8b238164e68b2216749d88d02b8,
title = "AJIRA: A lightweight distributed middleware for map reduce and stream processing",
abstract = "Currently, MapReduce is the most popular programming model for large-scale data processing, and this has motivated the research community to improve its efficiency with new extensions, algorithmic optimizations, or hardware. In this paper we address two main limitations of MapReduce: one relates to the model's limited expressiveness, which prevents the implementation of complex programs that require multiple steps or iterations. The other relates to the efficiency of its most popular implementations (e.g., Hadoop), which provide good resource utilization only for massive volumes of input, operating suboptimally for smaller or rapidly changing input. To address these limitations, we present AJIRA, a new middleware designed for efficient and generic data processing. At a conceptual level, AJIRA replaces the traditional map/reduce primitives with generic operators that can be dynamically allocated, allowing the execution of more complex batch and stream processing jobs. At a more technical level, AJIRA adopts a distributed, multi-threaded architecture that strives to minimize overhead for non-critical functionality. These characteristics allow AJIRA to be used as a single programming model for both batch and stream processing. To this end, we evaluated its performance against Hadoop, Spark, Esper, and Storm, which are state-of-the-art systems for both batch and stream processing. Our evaluation shows that AJIRA is competitive in a wide range of scenarios both in terms of processing time and scalability, making it an ideal choice where flexibility, extensibility, and the processing of both large and dynamic data with a single programming model are either desirable or even mandatory requirements.",
author = "Jacopo Urbani and Alessandro Margara and Ceriel Jacobs and Spyros Voulgaris and Henri Bal",
year = "2014",
month = "8",
day = "29",
doi = "10.1109/ICDCS.2014.62",
language = "English",
series = "Proceedings - International Conference on Distributed Computing Systems",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "545--554",
booktitle = "Proceedings - International Conference on Distributed Computing Systems",
address = "United States",

}

Urbani, J, Margara, A, Jacobs, C, Voulgaris, S & Bal, H 2014, AJIRA: A lightweight distributed middleware for map reduce and stream processing. in Proceedings - International Conference on Distributed Computing Systems, 6888930, Proceedings - International Conference on Distributed Computing Systems, Institute of Electrical and Electronics Engineers Inc., pp. 545-554, 2014 IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014, Madrid, Spain, 30/06/14. https://doi.org/10.1109/ICDCS.2014.62

AJIRA: A lightweight distributed middleware for map reduce and stream processing. / Urbani, Jacopo; Margara, Alessandro; Jacobs, Ceriel; Voulgaris, Spyros; Bal, Henri.

Proceedings - International Conference on Distributed Computing Systems. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 545-554, 6888930 (Proceedings - International Conference on Distributed Computing Systems).


TY - GEN

T1 - AJIRA

T2 - A lightweight distributed middleware for map reduce and stream processing

AU - Urbani, Jacopo

AU - Margara, Alessandro

AU - Jacobs, Ceriel

AU - Voulgaris, Spyros

AU - Bal, Henri

PY - 2014/8/29

Y1 - 2014/8/29

N2 - Currently, MapReduce is the most popular programming model for large-scale data processing, and this has motivated the research community to improve its efficiency with new extensions, algorithmic optimizations, or hardware. In this paper we address two main limitations of MapReduce: one relates to the model's limited expressiveness, which prevents the implementation of complex programs that require multiple steps or iterations. The other relates to the efficiency of its most popular implementations (e.g., Hadoop), which provide good resource utilization only for massive volumes of input, operating suboptimally for smaller or rapidly changing input. To address these limitations, we present AJIRA, a new middleware designed for efficient and generic data processing. At a conceptual level, AJIRA replaces the traditional map/reduce primitives with generic operators that can be dynamically allocated, allowing the execution of more complex batch and stream processing jobs. At a more technical level, AJIRA adopts a distributed, multi-threaded architecture that strives to minimize overhead for non-critical functionality. These characteristics allow AJIRA to be used as a single programming model for both batch and stream processing. To this end, we evaluated its performance against Hadoop, Spark, Esper, and Storm, which are state-of-the-art systems for both batch and stream processing. Our evaluation shows that AJIRA is competitive in a wide range of scenarios both in terms of processing time and scalability, making it an ideal choice where flexibility, extensibility, and the processing of both large and dynamic data with a single programming model are either desirable or even mandatory requirements.

UR - http://www.scopus.com/inward/record.url?scp=84907764982&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84907764982&partnerID=8YFLogxK

U2 - 10.1109/ICDCS.2014.62

DO - 10.1109/ICDCS.2014.62

M3 - Conference contribution

T3 - Proceedings - International Conference on Distributed Computing Systems

SP - 545

EP - 554

BT - Proceedings - International Conference on Distributed Computing Systems

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Urbani J, Margara A, Jacobs C, Voulgaris S, Bal H. AJIRA: A lightweight distributed middleware for map reduce and stream processing. In Proceedings - International Conference on Distributed Computing Systems. Institute of Electrical and Electronics Engineers Inc. 2014. p. 545-554. 6888930. (Proceedings - International Conference on Distributed Computing Systems). https://doi.org/10.1109/ICDCS.2014.62