An Empirical Performance Evaluation of Distributed SQL Query Engines

Stefan van Wouw, José Viña, Alexandru Iosup, Dick H.J. Epema

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Distributed SQL Query Engines (DSQEs) are increasingly used in a variety of domains, but users, especially those in small companies with little expertise, may face the challenge of selecting an appropriate engine for their specific applications. Although both industry and academia are attempting to come up with high-level benchmarks, the performance of DSQEs has never been explored or compared in depth. We propose an empirical method for evaluating the performance of DSQEs with representative metrics, datasets, and system configurations. We implement a micro-benchmarking suite of three classes of SQL queries for both a synthetic and a real-world dataset, and we report response time, resource utilization, and scalability. We use our micro-benchmarking suite to analyze and compare three state-of-the-art engines, viz. Shark, Impala, and Hive. We gain valuable insights for each engine and we present a comprehensive comparison of these DSQEs. We find that different query engines have widely varying performance: Hive is always outperformed by the other engines, but whether Impala or Shark is the best performer depends heavily on the query type.
Original language: English
Title of host publication: Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, Austin, TX, USA, January 31 - February 4, 2015
Publisher: Association for Computing Machinery, Inc
Pages: 123-131
Number of pages: 9
ISBN (Electronic): 9781450332484
Publication status: Published - 28 Jan 2015
Externally published: Yes
Event: 6th ACM/SPEC International Conference on Performance Engineering, ICPE 2015 - Austin, United States
Duration: 31 Jan 2015 - 4 Feb 2015

Conference

Conference: 6th ACM/SPEC International Conference on Performance Engineering, ICPE 2015
Country: United States
City: Austin
Period: 31/01/15 - 4/02/15

Keywords

  • Distributed SQL query engine
  • Performance evaluation
  • Scalability
