Abstract
Actian Vector in Hadoop (VectorH for short) is a new SQL-on-Hadoop system built on top of the fast Vectorwise analytical database system. VectorH achieves fault tolerance and storage scalability by relying on HDFS, and extends the state of the art in SQL-on-Hadoop systems by instrumenting the HDFS replication policy to optimize read locality. VectorH integrates with YARN for workload management, achieving a high degree of elasticity. Even though HDFS is an append-only file system, and VectorH supports (update-averse) ordered tables, trickle updates are possible thanks to Positional Delta Trees (PDTs), a differential update structure that can be queried efficiently. We describe the changes made to single-server Vectorwise to turn it into a Hadoop-based MPP system, encompassing workload management, parallel query optimization and execution, HDFS storage, transaction processing and Spark integration. We evaluate VectorH against HAWQ, Impala, SparkSQL and Hive, showing orders of magnitude better performance.
Original language | English |
---|---|
Title of host publication | SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data |
Publisher | Association for Computing Machinery (ACM) |
Pages | 1105-1117 |
Number of pages | 13 |
Volume | 26-June-2016 |
ISBN (Electronic) | 9781450335317 |
Publication status | Published - 26 Jun 2016 |
Event | 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States. Duration: 26 Jun 2016 → 1 Jul 2016 |
Conference
Conference | 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 26/06/16 → 1/07/16 |