VectorH: Taking SQL-on-Hadoop to the next level

Andrei Costea, Adrian Ionescu, Bogdan Raducanu, Michał Świtakowski, Cristian Bârca, Juliusz Sompolski, Alicja Łuszczak, Michał Szafrański, Giel De Nijs, Peter Boncz

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Actian Vector in Hadoop (VectorH for short) is a new SQL-on-Hadoop system built on top of the fast Vectorwise analytical database system. VectorH achieves fault tolerance and storage scalability by relying on HDFS, and extends the state of the art in SQL-on-Hadoop systems by instrumenting the HDFS replication policy to optimize read locality. VectorH integrates with YARN for workload management, achieving a high degree of elasticity. Even though HDFS is an append-only file system, and VectorH supports (update-averse) ordered tables, trickle updates are possible thanks to Positional Delta Trees (PDTs), a differential update structure that can be queried efficiently. We describe the changes made to single-server Vectorwise to turn it into a Hadoop-based MPP system, encompassing workload management, parallel query optimization and execution, HDFS storage, transaction processing and Spark integration. We evaluate VectorH against HAWQ, Impala, SparkSQL and Hive, showing orders of magnitude better performance.

Original language: English
Title of host publication: SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
Publisher: Association for Computing Machinery (ACM)
Pages: 1105-1117
Number of pages: 13
Volume: 26-June-2016
ISBN (Electronic): 9781450335317
Publication status: Published - 26 Jun 2016
Event: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States
Duration: 26 Jun 2016 → 1 Jul 2016

Conference

Conference: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
Country/Territory: United States
City: San Francisco
Period: 26/06/16 → 1/07/16
