Abstract
Big-data systems have gained significant momentum, and Apache Spark is becoming a de facto standard for modern data analytics. Spark relies on code generation to optimize the execution performance of SQL queries on a variety of data sources. Despite its already efficient runtime, Spark's code generation suffers from significant runtime overheads related to data de-serialization during query execution. This performance penalty can be substantial, especially when applications operate on human-readable data formats such as CSV or JSON.
Original language | English |
---|---|
Title of host publication | Programming 2020 - Conference Companion of the 4th International Conference on Art, Science, and Engineering of Programming |
Editors | A. Aguiar, S. Chiba, E.G. Boix |
Publisher | Association for Computing Machinery |
Pages | 46-49 |
ISBN (Electronic) | 9781450375078 |
Publication status | Published - 23 Mar 2020 |
Externally published | Yes |
Event | 4th International Conference on Art, Science, and Engineering of Programming, Programming 2020 - Virtual, Online, Portugal (Duration: 23 Mar 2020 → 26 Mar 2020) |
Conference
Conference | 4th International Conference on Art, Science, and Engineering of Programming, Programming 2020 |
---|---|
Country/Territory | Portugal |
City | Virtual, Online |
Period | 23/03/20 → 26/03/20 |
Funding
The work presented in this paper has been supported by Oracle (ERO project 1332) and the Swiss National Science Foundation (project 200020_188688). We thank the VM Research Group at Oracle Labs for their support. Oracle, Java, and HotSpot are trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Funders | Funder number |
---|---|
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung | 200020_188688 |