The Early Modern Dutch Mediascape. Detecting Media Mentions in Chronicles Using Word Embeddings and CRF

Alie Lassche, Roser Morante

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

While the production of information in the European early modern period is a well-researched topic, the question of how people engaged with the information explosion that occurred in early modern Europe is still underexplored. This paper presents the annotations and experiments aimed at exploring whether we can automatically extract media-related information (source, perception, and receiver) from a corpus of early modern Dutch chronicles, in order to gain insight into the mediascape of early modern middle-class people from a historical perspective. In a number of classification experiments with Conditional Random Fields, three categories of features are tested: (i) raw and binary word embedding features, (ii) lexicon features, and (iii) character features. Overall, the classifier that uses raw embeddings performs slightly better. However, given that the best F-scores are around 0.60, we conclude that the machine learning approach needs to be combined with a close reading approach for the results to be useful in answering historical research questions.

Original language: English
Title of host publication: Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCHCLfL 2021 - Co-located with the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Editors: Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Publisher: Association for Computational Linguistics (ACL)
Pages: 1-10
Number of pages: 10
ISBN (Electronic): 9781954085916
Publication status: Published - Nov 2021
Event: 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCHCLfL 2021 - Virtual, Punta Cana, Dominican Republic
Duration: 11 Nov 2021 → …

Conference

Conference: 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCHCLfL 2021
Country/Territory: Dominican Republic
City: Virtual, Punta Cana
Period: 11/11/21 → …

Bibliographical note

Funding Information:
Research for this paper was conducted at Leiden University and the Vrije Universiteit Amsterdam, and funded through the NWO VC project ‘Chronicling Novelty. New knowledge in the Netherlands, 1500-1850’. We are thankful to the anonymous reviewers for their insightful comments, and to the annotators.

Publisher Copyright:
© LaTeCHCLfL, EMNLP 2021 - Proceedings. All rights reserved.

Funding

Funder: Nederlandse Organisatie voor Wetenschappelijk Onderzoek
Funder number: 1500-1850
