A Compact In-Memory Dictionary for RDF data

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

While almost all dictionary compression techniques focus on static RDF data, we present a compact in-memory RDF dictionary for dynamic and streaming data. To do so, we analysed the structure of terms in real-world datasets and observed a high degree of common prefixes. We studied the applicability of Trie data structures on RDF data to reduce the memory occupied by common prefixes and discovered that all existing Trie implementations lead to either poor performance or excessive memory wastage. In our approach, we address the existing limitations of Tries for RDF data and propose a new variant of Trie which contains some optimizations explicitly designed to improve the performance on RDF data. Furthermore, we show how we use this Trie as an in-memory dictionary by using a memory address, instead of an integer counter, as the numerical ID. This design removes the need for an additional decoding data structure and further reduces the occupied memory. An empirical analysis on real-world datasets shows that, with a reasonable overhead, our technique uses 50–59% less memory than a conventional uncompressed dictionary.
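
The idea of using a node's address as the term ID can be illustrated with a minimal sketch. This is not the optimized Trie variant the paper proposes: it is a plain, uncompressed character trie, and Python object references stand in for raw memory addresses. The names `Node`, `TrieDictionary`, `encode`, and `decode` are hypothetical and chosen only for illustration.

```python
class Node:
    """One trie node. The node object itself plays the role of the term ID."""
    __slots__ = ("parent", "char", "children", "is_term")

    def __init__(self, parent, char):
        self.parent = parent      # link back towards the root, used for decoding
        self.char = char          # character on the edge leading to this node
        self.children = {}        # maps next character -> child Node
        self.is_term = False      # True if a complete term ends at this node


class TrieDictionary:
    """Assumed illustration: encode returns a node reference instead of a counter."""

    def __init__(self):
        self.root = Node(None, "")

    def encode(self, term):
        # Walk or extend the trie; terms with a common prefix share nodes.
        node = self.root
        for ch in term:
            child = node.children.get(ch)
            if child is None:
                child = Node(node, ch)
                node.children[ch] = child
            node = child
        node.is_term = True
        return node  # the reference is the ID, so no separate ID table is kept

    def decode(self, node):
        # Follow parent links up to the root and reverse the collected characters.
        chars = []
        while node.parent is not None:
            chars.append(node.char)
            node = node.parent
        return "".join(reversed(chars))
```

Because each ID points directly at the node where its term ends, decoding needs only the parent links. No second ID-to-string map is required, which is the saving the abstract attributes to this design.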
Original language: English
Title of host publication: The Semantic Web: Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC 2015, Proceedings
Publisher: Springer/Verlag
Pages: 205-220
Number of pages: 16
Volume: 9088
ISBN (Electronic): 9783319188171
DOIs: https://doi.org/10.1007/978-3-319-18818-8_13
Publication status: Published - 2015
Event: 12th European Semantic Web Conference (ESWC 2015) - Portoroz, Slovenia
Duration: 31 May 2015 to 4 Jun 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9088
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th European Semantic Web Conference (ESWC 2015)
Country: Slovenia
City: Portoroz
Period: 31/05/15 to 4/06/15

Fingerprint

Glossaries
Data storage equipment
Prefix
Data structures
Streaming Data
Empirical Analysis
Dictionary
Decoding
Compression
Integer
Optimization
Term

Bibliographical note

Proceedings title: Proceedings of the twelfth European Semantic Web Conference
Publisher: Springer
Place of publication: Berlin

Cite this

Bazoubandi, H. R., de Rooij, S., Urbani, J., ten Teije, A., van Harmelen, F., & Bal, H. (2015). A Compact In-Memory Dictionary for RDF data. In The Semantic Web: Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC 2015, Proceedings (Vol. 9088, pp. 205-220). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9088). Springer/Verlag. https://doi.org/10.1007/978-3-319-18818-8_13
Bazoubandi, Hamid R. ; de Rooij, Steven ; Urbani, Jacopo ; ten Teije, Annette ; van Harmelen, Frank ; Bal, Henri. / A Compact In-Memory Dictionary for RDF data. The Semantic Web: Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC 2015, Proceedings. Vol. 9088 Springer/Verlag, 2015. pp. 205-220 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{85421a7bb476400da0040eb31b196374,
title = "A Compact In-Memory Dictionary for RDF data",
abstract = "While almost all dictionary compression techniques focus on static RDF data, we present a compact in-memory RDF dictionary for dynamic and streaming data. To do so, we analysed the structure of terms in real-world datasets and observed a high degree of common prefixes. We studied the applicability of Trie data structures on RDF data to reduce the memory occupied by common prefixes and discovered that all existing Trie implementations lead to either poor performance or excessive memory wastage. In our approach, we address the existing limitations of Tries for RDF data and propose a new variant of Trie which contains some optimizations explicitly designed to improve the performance on RDF data. Furthermore, we show how we use this Trie as an in-memory dictionary by using a memory address, instead of an integer counter, as the numerical ID. This design removes the need for an additional decoding data structure and further reduces the occupied memory. An empirical analysis on real-world datasets shows that, with a reasonable overhead, our technique uses 50–59{\%} less memory than a conventional uncompressed dictionary.",
author = "Bazoubandi, {Hamid R.} and {de Rooij}, Steven and Urbani, Jacopo and {ten Teije}, Annette and {van Harmelen}, Frank and Bal, Henri",
note = "Proceedings title: Proceedings of the twelfth European Semantic Web Conference Publisher: Springer Place of publication: Berlin",
year = "2015",
doi = "10.1007/978-3-319-18818-8_13",
language = "English",
volume = "9088",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer/Verlag",
pages = "205--220",
booktitle = "The Semantic Web: Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC 2015, Proceedings",
}

Bazoubandi, HR, de Rooij, S, Urbani, J, ten Teije, A, van Harmelen, F & Bal, H 2015, A Compact In-Memory Dictionary for RDF data. in The Semantic Web: Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC 2015, Proceedings. vol. 9088, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9088, Springer/Verlag, pp. 205-220, 12th European Semantic Web Conference (ESWC 2015), Portoroz, Slovenia, 31/05/15. https://doi.org/10.1007/978-3-319-18818-8_13

A Compact In-Memory Dictionary for RDF data. / Bazoubandi, Hamid R.; de Rooij, Steven; Urbani, Jacopo; ten Teije, Annette; van Harmelen, Frank; Bal, Henri.

The Semantic Web: Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC 2015, Proceedings. Vol. 9088 Springer/Verlag, 2015. p. 205-220 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9088).


TY  - GEN
T1  - A Compact In-Memory Dictionary for RDF data
AU  - Bazoubandi, Hamid R.
AU  - de Rooij, Steven
AU  - Urbani, Jacopo
AU  - ten Teije, Annette
AU  - van Harmelen, Frank
AU  - Bal, Henri
N1  - Proceedings title: Proceedings of the twelfth European Semantic Web Conference Publisher: Springer Place of publication: Berlin
PY  - 2015
Y1  - 2015
N2  - While almost all dictionary compression techniques focus on static RDF data, we present a compact in-memory RDF dictionary for dynamic and streaming data. To do so, we analysed the structure of terms in real-world datasets and observed a high degree of common prefixes. We studied the applicability of Trie data structures on RDF data to reduce the memory occupied by common prefixes and discovered that all existing Trie implementations lead to either poor performance or excessive memory wastage. In our approach, we address the existing limitations of Tries for RDF data and propose a new variant of Trie which contains some optimizations explicitly designed to improve the performance on RDF data. Furthermore, we show how we use this Trie as an in-memory dictionary by using a memory address, instead of an integer counter, as the numerical ID. This design removes the need for an additional decoding data structure and further reduces the occupied memory. An empirical analysis on real-world datasets shows that, with a reasonable overhead, our technique uses 50–59% less memory than a conventional uncompressed dictionary.
AB  - While almost all dictionary compression techniques focus on static RDF data, we present a compact in-memory RDF dictionary for dynamic and streaming data. To do so, we analysed the structure of terms in real-world datasets and observed a high degree of common prefixes. We studied the applicability of Trie data structures on RDF data to reduce the memory occupied by common prefixes and discovered that all existing Trie implementations lead to either poor performance or excessive memory wastage. In our approach, we address the existing limitations of Tries for RDF data and propose a new variant of Trie which contains some optimizations explicitly designed to improve the performance on RDF data. Furthermore, we show how we use this Trie as an in-memory dictionary by using a memory address, instead of an integer counter, as the numerical ID. This design removes the need for an additional decoding data structure and further reduces the occupied memory. An empirical analysis on real-world datasets shows that, with a reasonable overhead, our technique uses 50–59% less memory than a conventional uncompressed dictionary.
UR  - http://www.scopus.com/inward/record.url?scp=84937485424&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84937485424&partnerID=8YFLogxK
U2  - 10.1007/978-3-319-18818-8_13
DO  - 10.1007/978-3-319-18818-8_13
M3  - Conference contribution
VL  - 9088
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 205
EP  - 220
BT  - The Semantic Web: Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC 2015, Proceedings
PB  - Springer/Verlag
ER  -

Bazoubandi HR, de Rooij S, Urbani J, ten Teije A, van Harmelen F, Bal H. A Compact In-Memory Dictionary for RDF data. In The Semantic Web: Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC 2015, Proceedings. Vol. 9088. Springer/Verlag. 2015. p. 205-220. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-18818-8_13