Detecting and correcting spelling errors in high-quality Dutch Wikipedia text

Merijn Beeksma, Maarten Van Gompel, Florian Kunneman, Louis Onrust, Bouke Regnerus, Dennis Vinke, Eduardo Brito, Christian Bauckhage, Rafet Sifa

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

For the CLIN28 shared task, we evaluated systems for spelling correction of high-quality text. The task focused on detecting and correcting spelling errors in Dutch Wikipedia pages. Three teams took part. We compared the performance of their systems to that of a baseline system, the Dutch spelling corrector Valkuil, and evaluated all systems in terms of F1 score. Although two of the three participating systems performed well at correcting spelling errors, error detection proved challenging and, without exception, resulted in a high false positive rate. As a result, none of the systems improved upon the F1 score of the baseline. This paper elaborates on each team’s approach to the task and discusses the overall challenges of correcting high-quality text.
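
As a rough illustration of why a high false positive rate weighs on F1, the sketch below computes F1 from counts of true positives, false positives, and false negatives. The function name `f1_score` and all counts are hypothetical; they are not taken from the shared task results or from any participating system.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if tp == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
# A cautious system flags few words; an eager system finds more real
# errors but also flags many correct words.
cautious = f1_score(tp=60, fp=10, fn=40)
eager = f1_score(tp=80, fp=120, fn=20)

print(f"cautious: {cautious:.3f}, eager: {eager:.3f}")
```

With these made-up counts, the eager system has the higher recall yet the lower F1, because its many false positives pull precision down.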

Original language: English
Pages (from-to): 122-137
Number of pages: 16
Journal: Computational Linguistics in the Netherlands Journal
Volume: 8
Publication status: Published - 1 Dec 2018
Externally published: Yes
Event: 28th Computational Linguistics in the Netherlands Conference, CLIN 2018 - Nijmegen, Netherlands
Duration: 26 Jan 2018 → 26 Jan 2018
