Opening digitized newspapers corpora: Europeana’s full-text data interoperability case

Nuno Freire, Antoine Isaac, Twan Goosen, Daan Broeder, Hugo Manguinhas, Valentine Charles

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other Digital Humanities domains. Effective retrieval of newspaper content based on metadata alone is nearly impossible, which makes retrieval based on (digitized) full text particularly relevant. Europeana, Europe’s Digital Library, is in a position to provide access to large newspaper collections with full-text resources. Full-text corpora are also relevant to Europeana’s objective of promoting the use of cultural heritage resources within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspaper full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a “full-text profile” for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.

Original language: English
Title of host publication: 2nd Conference on Language, Data and Knowledge, LDK 2019
Editors: Gerard de Melo, Bettina Klimek, Christian Fath, Paul Buitelaar, Milan Dojchinovski, Maria Eskevich, John P. McCrae, Christian Chiarcos
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959771054
DOI: https://doi.org/10.4230/OASIcs.LDK.2019.22
Publication status: Published - 1 May 2019
Event: 2nd Conference on Language, Data and Knowledge, LDK 2019 - Leipzig, Germany
Duration: 20 May 2019 → 23 May 2019

Publication series

Name: OpenAccess Series in Informatics
Volume: 70
ISSN (Print): 2190-6807

Conference

Conference: 2nd Conference on Language, Data and Knowledge, LDK 2019
Country: Germany
City: Leipzig
Period: 20/05/19 → 23/05/19

Keywords

  • Cultural heritage
  • Data aggregation
  • Full-text
  • Interoperability
  • Metadata
  • Research infrastructures

Cite this

Freire, N., Isaac, A., Goosen, T., Broeder, D., Manguinhas, H., & Charles, V. (2019). Opening digitized newspapers corpora: Europeana’s full-text data interoperability case. In G. de Melo, B. Klimek, C. Fath, P. Buitelaar, M. Dojchinovski, M. Eskevich, J. P. McCrae, ... C. Chiarcos (Eds.), 2nd Conference on Language, Data and Knowledge, LDK 2019 [22] (OpenAccess Series in Informatics; Vol. 70). Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. https://doi.org/10.4230/OASIcs.LDK.2019.22
Freire, Nuno; Isaac, Antoine; Goosen, Twan; Broeder, Daan; Manguinhas, Hugo; Charles, Valentine. / Opening digitized newspapers corpora: Europeana’s full-text data interoperability case. 2nd Conference on Language, Data and Knowledge, LDK 2019. editor / Gerard de Melo; Bettina Klimek; Christian Fath; Paul Buitelaar; Milan Dojchinovski; Maria Eskevich; John P. McCrae; Christian Chiarcos. Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, 2019. (OpenAccess Series in Informatics).
@inproceedings{395cac7343f24ceca854e7c5f4dd8592,
title = "Opening digitized newspapers corpora: Europeana’s full-text data interoperability case",
abstract = "Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other Digital Humanities scientific domains. Effective retrieval of newspapers content based on metadata only is a task nearly impossible, making the retrieval based on (digitized) full-text particularly relevant. Europeana, Europe’s Digital Library, is in the position to provide access to large newspapers collections with full-text resources. Full-text corpora are also relevant for Europeana’s objective of promoting the usage of cultural heritage resources for use within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspapers full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices being promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a “full-text profile” for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.",
keywords = "Cultural heritage, Data aggregation, Full-text, Interoperability, Metadata, Research infrastructures",
author = "Nuno Freire and Antoine Isaac and Twan Goosen and Daan Broeder and Hugo Manguinhas and Valentine Charles",
year = "2019",
month = "5",
day = "1",
doi = "10.4230/OASIcs.LDK.2019.22",
language = "English",
series = "OpenAccess Series in Informatics",
publisher = "Schloss Dagstuhl - Leibniz-Zentrum f{\"u}r Informatik GmbH, Dagstuhl Publishing",
editor = "{de Melo}, Gerard and Bettina Klimek and Christian Fath and Paul Buitelaar and Milan Dojchinovski and Maria Eskevich and McCrae, {John P.} and Christian Chiarcos",
booktitle = "2nd Conference on Language, Data and Knowledge, LDK 2019",

}

Freire, N, Isaac, A, Goosen, T, Broeder, D, Manguinhas, H & Charles, V 2019, Opening digitized newspapers corpora: Europeana’s full-text data interoperability case. in G de Melo, B Klimek, C Fath, P Buitelaar, M Dojchinovski, M Eskevich, JP McCrae & C Chiarcos (eds), 2nd Conference on Language, Data and Knowledge, LDK 2019, 22, OpenAccess Series in Informatics, vol. 70, Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, 2nd Conference on Language, Data and Knowledge, LDK 2019, Leipzig, Germany, 20/05/19. https://doi.org/10.4230/OASIcs.LDK.2019.22

Opening digitized newspapers corpora: Europeana’s full-text data interoperability case. / Freire, Nuno; Isaac, Antoine; Goosen, Twan; Broeder, Daan; Manguinhas, Hugo; Charles, Valentine.

2nd Conference on Language, Data and Knowledge, LDK 2019. ed. / Gerard de Melo; Bettina Klimek; Christian Fath; Paul Buitelaar; Milan Dojchinovski; Maria Eskevich; John P. McCrae; Christian Chiarcos. Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, 2019. 22 (OpenAccess Series in Informatics; Vol. 70).

TY  - GEN
T1  - Opening digitized newspapers corpora
T2  - Europeana’s full-text data interoperability case
AU  - Freire, Nuno
AU  - Isaac, Antoine
AU  - Goosen, Twan
AU  - Broeder, Daan
AU  - Manguinhas, Hugo
AU  - Charles, Valentine
PY  - 2019/5/1
Y1  - 2019/5/1
N2  - Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other Digital Humanities scientific domains. Effective retrieval of newspapers content based on metadata only is a task nearly impossible, making the retrieval based on (digitized) full-text particularly relevant. Europeana, Europe’s Digital Library, is in the position to provide access to large newspapers collections with full-text resources. Full-text corpora are also relevant for Europeana’s objective of promoting the usage of cultural heritage resources for use within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspapers full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices being promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a “full-text profile” for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.
AB  - Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other Digital Humanities scientific domains. Effective retrieval of newspapers content based on metadata only is a task nearly impossible, making the retrieval based on (digitized) full-text particularly relevant. Europeana, Europe’s Digital Library, is in the position to provide access to large newspapers collections with full-text resources. Full-text corpora are also relevant for Europeana’s objective of promoting the usage of cultural heritage resources for use within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspapers full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices being promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a “full-text profile” for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.
KW  - Cultural heritage
KW  - Data aggregation
KW  - Full-text
KW  - Interoperability
KW  - Metadata
KW  - Research infrastructures
UR  - http://www.scopus.com/inward/record.url?scp=85068093480&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85068093480&partnerID=8YFLogxK
U2  - 10.4230/OASIcs.LDK.2019.22
DO  - 10.4230/OASIcs.LDK.2019.22
M3  - Conference contribution
T3  - OpenAccess Series in Informatics
BT  - 2nd Conference on Language, Data and Knowledge, LDK 2019
A2  - de Melo, Gerard
A2  - Klimek, Bettina
A2  - Fath, Christian
A2  - Buitelaar, Paul
A2  - Dojchinovski, Milan
A2  - Eskevich, Maria
A2  - McCrae, John P.
A2  - Chiarcos, Christian
PB  - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ER  -

Freire N, Isaac A, Goosen T, Broeder D, Manguinhas H, Charles V. Opening digitized newspapers corpora: Europeana’s full-text data interoperability case. In de Melo G, Klimek B, Fath C, Buitelaar P, Dojchinovski M, Eskevich M, McCrae JP, Chiarcos C, editors, 2nd Conference on Language, Data and Knowledge, LDK 2019. Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. 2019. 22. (OpenAccess Series in Informatics). https://doi.org/10.4230/OASIcs.LDK.2019.22