Towards a computational lexicon for Moroccan darija: Words, idioms, and constructions

Jamal Laoudi, Claire Bonial, Lucia Donatelli, Stephen Tratz, Clare Voss

Research output: Chapter in Book / Report / Conference proceeding > Conference contribution > Academic > peer-review

Abstract

Copyright © LAW-MWE-CxG 2018 - Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, Proceedings of the Workshop. All rights reserved. We explore the challenges of building a computational lexicon for Moroccan Darija (MD), an Arabic dialect spoken by over 32 million people worldwide that only recently has begun appearing frequently in written form. We raise the question of what belongs in such a lexicon and start by describing our work building traditional word-level lexicon entries with their English translations. We then discuss challenges in translating idiomatic MD phrases and the creation of multi-word expression (MWE) lexicon entries whose meanings could not be fully derived from the individual words. Finally, we describe our preliminary exploration of constructions for inclusion in an MD constructicon, initially eliciting translations of established English constructions, and then shifting to document, when spontaneously offered, variant renderings of native MD counterparts.
Original language: English
Title of host publication: LAW-MWE-CxG 2018 - Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 74-85
ISBN (Electronic): 9781948087513
Publication status: Published - 2018
Externally published: Yes
Event: Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, LAW-MWE-CxG 2018, in conjunction with the 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States
Duration: 25 Aug 2018 to 26 Aug 2018

Conference

Conference: Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, LAW-MWE-CxG 2018, in conjunction with the 27th International Conference on Computational Linguistics, COLING 2018
Country/Territory: United States
City: Santa Fe
Period: 25/08/18 to 26/08/18
