Abstract
Copyright © LAW-MWE-CxG 2018 - Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, Proceedings of the Workshop. All rights reserved.

We explore the challenges of building a computational lexicon for Moroccan Darija (MD), an Arabic dialect spoken by over 32 million people worldwide that has only recently begun to appear frequently in written form. We raise the question of what belongs in such a lexicon and start by describing our work building traditional word-level lexicon entries with their English translations. We then discuss challenges in translating idiomatic MD phrases and the creation of multi-word expression (MWE) lexicon entries whose meanings could not be fully derived from the individual words. Finally, we describe our preliminary exploration of constructions for inclusion in an MD constructicon, initially eliciting translations of established English constructions, and then shifting to documenting variant renderings of native MD counterparts when these were spontaneously offered.
Original language | English |
---|---|
Title of host publication | LAW-MWE-CxG 2018 - Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, Proceedings of the Workshop |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 74-85 |
ISBN (Electronic) | 9781948087513 |
Publication status | Published - 2018 |
Externally published | Yes |
Event | Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, LAW-MWE-CxG 2018, in conjunction with the 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States. Duration: 25 Aug 2018 → 26 Aug 2018 |
Conference
Conference | Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, LAW-MWE-CxG 2018, in conjunction with the 27th International Conference on Computational Linguistics, COLING 2018 |
---|---|
Country/Territory | United States |
City | Santa Fe |
Period | 25/08/18 → 26/08/18 |