[Demo] Low-latency Spark queries on updatable data

Alexandru Uta, Bogdan Ghit, Ankur Dave, Peter Boncz

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

As data science is increasingly deployed in operational applications, data science frameworks must be able to perform computations interactively, in sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset, using microbenchmarks and real-world graph processing queries over continuously growing datasets.
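To make the two ideas in the abstract concrete (an index over cached data for point lookups and joins, plus multi-versioning so that appends do not disturb running queries), here is a minimal, single-process Python sketch. It is not Spark and not the Indexed DataFrame API: the class `VersionedIndex`, its methods, and the "one version per appended batch" scheme are hypothetical, chosen only for illustration. A distributed implementation would additionally have to partition the index across executors, which this sketch omits.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple

Row = Dict[str, Any]


class VersionedIndex:
    """Hypothetical in-memory hash index with append-only, multi-version rows.

    Illustrative only (not the Indexed DataFrame API). Each appended batch is
    stamped with a new, monotonically increasing version. A reader pins a
    snapshot version and sees only rows stamped at or before it, so later
    appends do not change the result of an in-flight query.
    """

    def __init__(self, key: str) -> None:
        self.key = key
        self.version = 0
        # key value -> list of (version, row), kept in insertion order
        self._buckets: Dict[Hashable, List[Tuple[int, Row]]] = defaultdict(list)

    def append(self, rows: Iterable[Row]) -> int:
        """Append a batch of rows as one new version and return that version."""
        self.version += 1
        for row in rows:
            self._buckets[row[self.key]].append((self.version, row))
        return self.version

    def snapshot(self) -> int:
        """Return the latest version; a query can pin it for consistent reads."""
        return self.version

    def lookup(self, value: Hashable, as_of: int) -> List[Row]:
        """Point lookup: rows with this key that are visible at snapshot `as_of`."""
        return [row for v, row in self._buckets.get(value, []) if v <= as_of]

    def join(self, probe: Iterable[Row], probe_key: str, as_of: int) -> List[Row]:
        """Index join: probe the index once per incoming row instead of
        scanning the indexed side. Right-side columns win on name clashes."""
        out: List[Row] = []
        for left in probe:
            for right in self.lookup(left[probe_key], as_of):
                out.append({**left, **right})
        return out


# Usage: a small, growing social graph.
edges = VersionedIndex(key="src")
edges.append([{"src": 1, "dst": 2}, {"src": 1, "dst": 3}])
reader = edges.snapshot()                # a query pins version 1
edges.append([{"src": 1, "dst": 4}])     # the dataset keeps growing

print(len(edges.lookup(1, as_of=reader)))            # 2: pinned snapshot ignores the new edge
print(len(edges.lookup(1, as_of=edges.snapshot())))  # 3: latest snapshot sees it

persons = VersionedIndex(key="id")
persons.append([{"id": 2, "name": "Ann"}, {"id": 3, "name": "Bo"}])
friends = persons.join(edges.lookup(1, as_of=reader), probe_key="dst",
                       as_of=persons.snapshot())
print([r["name"] for r in friends])                  # ['Ann', 'Bo']
```

Stamping versions per batch keeps appends cheap (no in-place updates) and gives readers a consistent view while new data arrives; whether the Indexed DataFrame realizes multi-version concurrency this way is not stated here.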

Original language: English
Title of host publication: SIGMOD '19
Subtitle of host publication: Proceedings of the 2019 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 2009-2012
Number of pages: 4
ISBN (Electronic): 9781450356435
Publication status: Published - Jun 2019
Event: 2019 International Conference on Management of Data, SIGMOD 2019 - Amsterdam, Netherlands
Duration: 30 Jun 2019 - 5 Jul 2019

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2019 International Conference on Management of Data, SIGMOD 2019
Country/Territory: Netherlands
City: Amsterdam
Period: 30/06/19 - 5/07/19
