Control Flow Duplication for Columnar Arrays in a Dynamic Compiler

Sebastian Kloibhofer, Lukas Makor, David Leopoldseder, Daniele Bonetta, Lukas Stadler, Hanspeter Mössenböck

Research output: Contribution to JournalArticleAcademicpeer-review

Abstract

Columnar databases are an established way to speed up online analytical processing (OLAP) queries. Nowadays, data processing (e.g., storage, visualization, and analytics) is often performed at the programming language level, hence it is desirable to also adopt columnar data structures for common language runtimes. While there are frameworks, libraries, and APIs to enable columnar data stores in programming languages, their integration into applications typically requires developer interference. In prior work, researchers imple-mented an approach for automated transformation of arrays into columnar arrays in the GraalVM JavaScript runtime. However, this approach suffers from performance issues on smaller workloads as well as on more complex nested data structures. We find that the key to optimizing accesses to columnar arrays is to identify queries and apply specific optimizations to them. In this paper, we describe novel compiler optimizations in the GraalVM Compiler that optimize queries on columnar arrays. At JIT compile time, we identify loops that access potentially columnar arrays and duplicate them in order to specifically optimize accesses to columnar arrays. Additionally, we describe a new approach for creating columnar arrays from arrays consisting of complex objects by performing multi-level storage transformation. We demonstrate our approach via an implementation for JavaScript Date objects. Our work shows that automatic transformation of arrays to columnar storage is feasible even for small workloads and that more complex arrays of objects could benefit from a multi-level transformation. Furthermore, we show how we can optimize methods that handle arrays in different states by the use of duplication. We evaluated our work on microbenchmarks and established data analytics workloads (TPC-H) to demonstrate that it significantly outperforms previous efforts, with speedups of up to 10x for particular queries. Queries additionally benefit from multi-level transformation, reaching speedups of around 2x. Additionally, we show that we do not cause significant overhead on workloads not suitable for storage transformation. We argue that automatically created columnar arrays could aid developers in data-centric applications as an alternative approach to using dedicated APIs on manually created columnar arrays. Via automatic detection and optimization of queries on potentially columnar arrays, we can improve performance of data processing and further enable its use in common—particularly dynamic—programming languages.
Original languageEnglish
Article number9
JournalArt, Science, and Engineering of Programming
Volume7
Issue number3
DOIs
Publication statusPublished - 2023
Externally publishedYes

Funding

Acknowledgements Authors Sebastian Kloibhofer and Lukas Makor contributed equally to the paper. This research project was partially funded by Oracle Labs. We thank all members of the Virtual Machine Research Group at Oracle Labs. We also thank all researchers at the Johannes Kepler University Linz’s Institute for System Software for their support of and valuable feedback on our work.

FundersFunder number
Oracle Labs

    Fingerprint

    Dive into the research topics of 'Control Flow Duplication for Columnar Arrays in a Dynamic Compiler'. Together they form a unique fingerprint.

    Cite this