
Two architectures for parallel processing of huge amounts of text

  • Mathijs Kattenberg
  • Zuhaitz Beloki
  • Aitor Soroa
  • Xabier Artola
  • Antske Fokkens
  • Paul Huygen
  • Kees Verstoep

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

This paper presents two alternative NLP architectures to analyze massive amounts of documents, using parallel processing. The two architectures focus on different processing scenarios, namely batch-processing and streaming processing. The batch-processing scenario aims at optimizing the overall throughput of the system, i.e., minimizing the overall time spent on processing all documents. The streaming architecture aims to minimize the time to process real-time incoming documents and is therefore especially suitable for live feeds. The paper presents experiments with both architectures, and reports the overall gain when they are used for batch as well as for streaming processing. All the software described in the paper is publicly available under free licenses.
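The contrast between the two scenarios can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the frameworks used, so the example relies only on the Python standard library, and `analyze`, `process_batch`, and `process_stream` are hypothetical names, with `analyze` standing in for a real NLP pipeline. In the batch case all documents are known up front and are spread over a worker pool; in the streaming case each document is handled as soon as it arrives on a feed.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def analyze(doc: str) -> int:
    """Hypothetical stand-in for an NLP pipeline: counts whitespace-separated tokens."""
    return len(doc.split())


def process_batch(docs, workers=4):
    """Batch scenario: the full document set is available, so it is
    distributed over a pool of workers. Results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze, docs))


def process_stream(incoming: queue.Queue, results: list) -> None:
    """Streaming scenario: each document is processed as soon as it arrives,
    so a result is available shortly after the document enters the feed."""
    while True:
        doc = incoming.get()
        if doc is None:  # sentinel (an assumption of this sketch): feed closed
            break
        results.append(analyze(doc))


docs = ["a b c", "d e", "f"]

# Batch: hand over everything at once.
batch_counts = process_batch(docs)

# Streaming: a single consumer thread reads documents from a live feed.
feed = queue.Queue()
stream_counts = []
consumer = threading.Thread(target=process_stream, args=(feed, stream_counts))
consumer.start()
for d in docs:
    feed.put(d)
feed.put(None)
consumer.join()
```

The batch function is judged by how quickly the whole list is finished, while the streaming consumer is judged by how soon each individual result appears after its document is put on the feed.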

Original language: English
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Editors: Nicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani
Publisher: European Language Resources Association (ELRA)
Pages: 4513-4519
Number of pages: 7
ISBN (Electronic): 9782951740891
Publication status: Published - 2016
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: 23 May 2016 to 28 May 2016

Keywords

  • Big data
  • Parallel processing
  • System architecture
