Benchmarking the Simplification of Dutch Municipal Text

Daniel Vlantis, Iva Gornishka, Shuai Wang

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

Text simplification (TS) makes written information more accessible to all people, especially those with cognitive or language impairments. Despite much progress in TS due to advances in NLP technology, the bottleneck issue of lack of data for low-resource languages persists. Dutch is one of these languages that lack a monolingual simplification corpus. In this paper, we use English as a pivot language for the simplification of Dutch medical and municipal text. We experiment with augmenting training data and corpus choice for this pivot-based approach. We compare the results to a baseline and an end-to-end LLM approach using the GPT 3.5 Turbo model. Our evaluation shows that, while we can substantially improve the results of the pivot pipeline, the zero-shot end-to-end GPT-based simplification performs better on all metrics. Our work shows how an existing pivot-based pipeline can be improved for simplifying Dutch medical text. Moreover, we provide baselines for the comparison in the domain of Dutch municipal text and make our corresponding evaluation dataset publicly available.

Original languageEnglish
Title of host publicationProceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
EditorsNicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
PublisherEuropean Language Resources Association (ELRA)
Pages2217-2226
Number of pages10
ISBN (Electronic)9782493814104
Publication statusPublished - 2024
EventJoint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 - Hybrid, Torino, Italy
Duration: 20 May 202425 May 2024

Conference

ConferenceJoint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Country/TerritoryItaly
CityHybrid, Torino
Period20/05/2425/05/24

Bibliographical note

Publisher Copyright:
© 2024 ELRA Language Resource Association: CC BY-NC 4.0.

Keywords

  • Dutch municipal text
  • GPT
  • Large language models
  • Pivot-based text simplification

Fingerprint

Dive into the research topics of 'Benchmarking the Simplification of Dutch Municipal Text'. Together they form a unique fingerprint.

Cite this