Machine learning to geographically enrich understudied sources: A conceptual approach

Lorella Viola, Jaap Verheul

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

This paper discusses the added value of applying machine learning (ML) to contextually enrich digital collections. In this study, we employed ML as a method to geographically enrich historical datasets. Specifically, we used a sequence tagging tool (Riedl and Padó 2018), built on TensorFlow, to perform named entity recognition (NER) on a corpus of historical immigrant newspapers. The recognised entities were then extracted and geocoded. The aim was to prepare large quantities of unstructured data for a conceptual historical analysis of geographical references. The intention was to develop a method that would assist researchers working in spatial humanities, a recently emerged interdisciplinary field focused on geographic and conceptual space. Here we describe the ML methodology and the geocoding phase of the project, focussing on the advantages and challenges of this approach, particularly for humanities scholars. We also argue that, by choosing to use largely neglected sources such as immigrant newspapers (also known as ethnic newspapers), this study contributes to the debate about diversity representation and archival biases in digital practices.
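The pipeline described above (sequence tagging, entity extraction, geocoding) can be illustrated with a minimal sketch. It assumes the tagger's output has already been read into (token, tag) pairs using a BIO scheme with a `LOC` label; the real tool's output format and label set may differ. The `GAZETTEER` dictionary is a hypothetical stand-in for a geocoding service, and the sample sentence is invented for illustration, not drawn from the corpus.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a geocoding service: place name -> (lat, lon).
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "New York": (40.7128, -74.0060),
    "Napoli": (40.8518, 14.2681),
}

# Invented tagger output as (token, BIO tag) pairs; assumes a LOC label.
SAMPLE: List[Tuple[str, str]] = [
    ("Left", "O"), ("Napoli", "B-LOC"), ("for", "O"),
    ("New", "B-LOC"), ("York", "I-LOC"), (",", "O"), ("then", "O"),
    ("Boston", "B-LOC"), ("and", "O"),
    ("New", "B-LOC"), ("York", "I-LOC"), (".", "O"),
]


def extract_locations(tagged: List[Tuple[str, str]]) -> List[str]:
    """Join runs of B-LOC/I-LOC tokens into multi-word place names."""
    locations: List[str] = []
    current: List[str] = []
    for token, tag in tagged:
        # A B-LOC, or an orphan I-LOC with no open entity, starts a new name.
        if tag == "B-LOC" or (tag == "I-LOC" and not current):
            if current:
                locations.append(" ".join(current))
            current = [token]
        elif tag == "I-LOC":
            current.append(token)  # continue the open entity
        else:
            if current:
                locations.append(" ".join(current))
            current = []
    if current:  # flush an entity that ends the sequence
        locations.append(" ".join(current))
    return locations


def geocode(names: List[str]) -> List[Tuple[str, int, Optional[Tuple[float, float]]]]:
    """Count mentions per place and look up coordinates (None if unresolved)."""
    counts = Counter(names)
    return [(name, n, GAZETTEER.get(name)) for name, n in counts.most_common()]


if __name__ == "__main__":
    for name, n, coords in geocode(extract_locations(SAMPLE)):
        print(name, n, coords)
```

Names left as `None` (here "Boston") mark entities the lookup could not resolve; in a real workflow these would need a fuller gazetteer or manual disambiguation, one of the challenges of geocoding historical place names.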
Original language: English
Title of host publication: ICAART 2020 - Proceedings of the 12th International Conference on Agents and Artificial Intelligence
Editors: A. Rocha, L. Steels, J. van den Herik
Publisher: SciTePress
Pages: 469-475
ISBN (Electronic): 9789897583957
Publication status: Published - 2020
Externally published: Yes
Event: 12th International Conference on Agents and Artificial Intelligence, ICAART 2020 - Valletta, Malta
Duration: 22 Feb 2020 to 24 Feb 2020

Conference

Conference: 12th International Conference on Agents and Artificial Intelligence, ICAART 2020
Country/Territory: Malta
City: Valletta
Period: 22/02/20 to 24/02/20
