Scalable RDF data compression with MapReduce

J. Urbani, J. Maassen, N. Drost, F.J. Seinstra, H.E. Bal

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

The Semantic Web contains many billions of statements, which are released using the resource description framework (RDF) data model. To better handle these large amounts of data, high performance RDF applications must apply a compression technique. Unfortunately, because of the large input size, even this compression is challenging. In this paper, we propose a set of distributed MapReduce algorithms to efficiently compress and decompress a large amount of RDF data. Our approach uses a dictionary encoding technique that maintains the structure of the data. We highlight the problems of distributed data compression and describe the solutions that we propose. We have implemented a prototype using the Hadoop framework, and evaluate its performance. We show that our approach is able to efficiently compress a large amount of data and scales linearly on both input size and number of nodes. Copyright © 2012 John Wiley & Sons, Ltd.
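As a rough, hypothetical illustration of the dictionary-encoding idea sketched in the abstract (not the distributed algorithms the paper actually proposes), the following Python snippet assigns numeric IDs to RDF terms in a map/group/reduce layout; the sample triples, helper names, and ID-assignment order are simplifying assumptions made here for illustration only.

# Minimal sketch of dictionary encoding for RDF triples, laid out as
# map / group / reduce steps to mirror the MapReduce style. This is an
# illustrative assumption, not the paper's actual distributed algorithm.
from collections import defaultdict

# Hypothetical sample data: RDF triples as (subject, predicate, object) terms.
triples = [
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/knows>", "<http://example.org/bob>"),
    ("<http://example.org/bob>",   "<http://xmlns.com/foaf/0.1/knows>", "<http://example.org/carol>"),
]

def map_terms(triple_id, triple):
    # Map phase: emit every RDF term together with the place it occurs.
    for position, term in enumerate(triple):
        yield term, (triple_id, position)

# Shuffle phase (done by the framework in real MapReduce): group by term.
occurrences_by_term = defaultdict(list)
for tid, triple in enumerate(triples):
    for term, occurrence in map_terms(tid, triple):
        occurrences_by_term[term].append(occurrence)

# Reduce phase: assign one numeric ID per distinct term, building both the
# dictionary (needed for decompression) and the encoded occurrences.
dictionary = {}
encoded = defaultdict(dict)  # triple_id -> {position: term_id}
for term_id, (term, occurrences) in enumerate(sorted(occurrences_by_term.items())):
    dictionary[term_id] = term
    for tid, position in occurrences:
        encoded[tid][position] = term_id

# Reassemble the structure-preserving, numerically encoded triples.
encoded_triples = [tuple(encoded[tid][p] for p in range(3)) for tid in range(len(triples))]
print(encoded_triples)  # [(0, 3, 1), (1, 3, 2)]
print(dictionary)       # term_id -> original term

Decompression would simply invert the lookup, replacing each numeric ID in an encoded triple with its dictionary entry, which is why the encoding preserves the structure of the data.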
Original language: English
Pages (from-to): 24-39
Journal: Concurrency and Computation: Practice and Experience
Volume: 25
Issue number: 1
DOIs: 10.1002/cpe.2840
Publication status: Published - 2013

Fingerprint

MapReduce · Data compression · Semantic Web · Parallel algorithms · Data structures · Data model · Encoding

Cite this

Urbani, J.; Maassen, J.; Drost, N.; Seinstra, F.J.; Bal, H.E. / Scalable RDF data compression with MapReduce. In: Concurrency and Computation: Practice and Experience. 2013 ; Vol. 25, No. 1. pp. 24-39.
@article{03a6efa9861f4eb1a35303ede4a96ed6,
title = "Scalable RDF data compression with MapReduce",
abstract = "The Semantic Web contains many billions of statements, which are released using the resource description framework (RDF) data model. To better handle these large amounts of data, high performance RDF applications must apply a compression technique. Unfortunately, because of the large input size, even this compression is challenging. In this paper, we propose a set of distributed MapReduce algorithms to efficiently compress and decompress a large amount of RDF data. Our approach uses a dictionary encoding technique that maintains the structure of the data. We highlight the problems of distributed data compression and describe the solutions that we propose. We have implemented a prototype using the Hadoop framework, and evaluate its performance. We show that our approach is able to efficiently compress a large amount of data and scales linearly on both input size and number of nodes. Copyright {\circledC} 2012 John Wiley \& Sons, Ltd.",
author = "J. Urbani and J. Maassen and N. Drost and F.J. Seinstra and H.E. Bal",
year = "2013",
doi = "10.1002/cpe.2840",
language = "English",
volume = "25",
pages = "24--39",
journal = "Concurrency and Computation: Practice and Experience",
issn = "1532-0626",
publisher = "John Wiley and Sons Ltd.",
number = "1",
}

Scalable RDF data compression with MapReduce. / Urbani, J.; Maassen, J.; Drost, N.; Seinstra, F.J.; Bal, H.E.

In: Concurrency and Computation: Practice and Experience, Vol. 25, No. 1, 2013, p. 24-39.

Research output: Contribution to Journal › Article › Academic › peer-review

TY - JOUR

T1 - Scalable RDF data compression with MapReduce

AU - Urbani, J.

AU - Maassen, J.

AU - Drost, N.

AU - Seinstra, F.J.

AU - Bal, H.E.

PY - 2013

Y1 - 2013

N2 - The Semantic Web contains many billions of statements, which are released using the resource description framework (RDF) data model. To better handle these large amounts of data, high performance RDF applications must apply a compression technique. Unfortunately, because of the large input size, even this compression is challenging. In this paper, we propose a set of distributed MapReduce algorithms to efficiently compress and decompress a large amount of RDF data. Our approach uses a dictionary encoding technique that maintains the structure of the data. We highlight the problems of distributed data compression and describe the solutions that we propose. We have implemented a prototype using the Hadoop framework, and evaluate its performance. We show that our approach is able to efficiently compress a large amount of data and scales linearly on both input size and number of nodes. Copyright © 2012 John Wiley & Sons, Ltd.

AB - The Semantic Web contains many billions of statements, which are released using the resource description framework (RDF) data model. To better handle these large amounts of data, high performance RDF applications must apply a compression technique. Unfortunately, because of the large input size, even this compression is challenging. In this paper, we propose a set of distributed MapReduce algorithms to efficiently compress and decompress a large amount of RDF data. Our approach uses a dictionary encoding technique that maintains the structure of the data. We highlight the problems of distributed data compression and describe the solutions that we propose. We have implemented a prototype using the Hadoop framework, and evaluate its performance. We show that our approach is able to efficiently compress a large amount of data and scales linearly on both input size and number of nodes. Copyright © 2012 John Wiley & Sons, Ltd.

U2 - 10.1002/cpe.2840

DO - 10.1002/cpe.2840

M3 - Article

VL - 25

SP - 24

EP - 39

JO - Concurrency and Computation: Practice and Experience

JF - Concurrency and Computation: Practice and Experience

SN - 1532-0626

IS - 1

ER -