[Demo] low-latency spark queries on updatable data

Alexandru Uta, Bogdan Ghit, Ankur Dave, Peter Boncz

Research output: Chapter in Book / Report / Conference proceeding > Conference contribution > Academic > peer-review

Abstract

As data science is increasingly deployed in operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, on datasets that are continuously growing.
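The combination of cached rows, an index, and versioned updates described in the abstract can be illustrated with a toy, single-process sketch. Everything here is an assumption made for illustration: the class `ToyIndexedTable`, its methods, and the "version = row count" snapshot scheme are hypothetical and are not the Indexed DataFrame API or the authors' implementation, which runs inside Apache Spark.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Any, ...]


class ToyIndexedTable:
    """Hypothetical in-memory table: append-only rows plus a hash index.

    A "version" is simply the number of rows present when it was taken,
    so a reader holding an older version never sees later appends.
    This is a simplification of multi-version concurrency, not the
    mechanism used by the paper.
    """

    def __init__(self, key_col: int) -> None:
        self.key_col = key_col
        self.rows: List[Row] = []  # append-only row store (the "cache")
        self.index: Dict[Any, List[int]] = defaultdict(list)  # key -> row positions

    def append(self, new_rows: List[Row]) -> int:
        """Append rows, update the index, and return the new version."""
        for row in new_rows:
            self.index[row[self.key_col]].append(len(self.rows))
            self.rows.append(row)
        return len(self.rows)

    def lookup(self, key: Any, version: Optional[int] = None) -> List[Row]:
        """Return rows with the given key that are visible at `version`."""
        limit = len(self.rows) if version is None else version
        return [self.rows[i] for i in self.index.get(key, []) if i < limit]

    def join(self, probe_rows: List[Row], probe_key_col: int,
             version: Optional[int] = None) -> List[Row]:
        """Index-based join: probe the index once per incoming row."""
        return [
            left + right
            for left in probe_rows
            for right in self.lookup(left[probe_key_col], version)
        ]
```

In this sketch a lookup costs one hash probe plus the matching rows rather than a full scan, and passing an older version to `lookup` or `join` gives a stable snapshot while the table keeps growing.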

Original language: English
Title of host publication: SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 2009-2012
Number of pages: 4
ISBN (Electronic): 9781450356435
DOI: https://doi.org/10.1145/3299869.3320227
Publication status: Published - 25 Jun 2019
Event: 2019 International Conference on Management of Data, SIGMOD 2019 - Amsterdam, Netherlands
Duration: 30 Jun 2019 to 5 Jul 2019

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2019 International Conference on Management of Data, SIGMOD 2019
Country: Netherlands
City: Amsterdam
Period: 30/06/19 to 5/07/19


Cite this

Uta, A., Ghit, B., Dave, A., & Boncz, P. (2019). [Demo] low-latency spark queries on updatable data. In SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data (pp. 2009-2012). (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery. https://doi.org/10.1145/3299869.3320227
@inproceedings{f314c51d1eac4bc99d10b9f6f8519a3e,
title = "[Demo] low-latency spark queries on updatable data",
abstract = "As data science gets deployed more and more into operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, in datasets that are continuously growing.",
author = "Alexandru Uta and Bogdan Ghit and Ankur Dave and Peter Boncz",
year = "2019",
month = "6",
day = "25",
doi = "10.1145/3299869.3320227",
language = "English",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
publisher = "Association for Computing Machinery",
pages = "2009--2012",
booktitle = "SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data",

}


TY - GEN
T1 - [Demo] low-latency spark queries on updatable data
AU - Uta, Alexandru
AU - Ghit, Bogdan
AU - Dave, Ankur
AU - Boncz, Peter
PY - 2019/6/25
Y1 - 2019/6/25
N2 - As data science gets deployed more and more into operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, in datasets that are continuously growing.
UR - http://www.scopus.com/inward/record.url?scp=85069474913&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069474913&partnerID=8YFLogxK
U2 - 10.1145/3299869.3320227
DO - 10.1145/3299869.3320227
M3 - Conference contribution
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 2009
EP - 2012
BT - SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data
PB - Association for Computing Machinery
ER -
