TY - GEN
T1 - [Demo] low-latency Spark queries on updatable data
AU - Uta, Alexandru
AU - Ghit, Bogdan
AU - Dave, Ankur
AU - Boncz, Peter
PY - 2019/6/25
Y1 - 2019/6/25
N2 - As data science gets deployed more and more into operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, in datasets that are continuously growing.
AB - As data science gets deployed more and more into operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, in datasets that are continuously growing.
UR - http://www.scopus.com/inward/record.url?scp=85069474913&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069474913&partnerID=8YFLogxK
U2 - 10.1145/3299869.3320227
DO - 10.1145/3299869.3320227
M3 - Conference contribution
AN - SCOPUS:85069474913
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 2009
EP - 2012
BT - SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2019 International Conference on Management of Data, SIGMOD 2019
Y2 - 30 June 2019 through 5 July 2019
ER -