CLIMB – Curriculum Learning for Infant-inspired Model Building

Richard Diehl Martinez, Zébulon Goriely, Hope McGovern, Christopher Davis, Andrew Caines, Paula Buttery, Lisa Beinborn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

We describe our team’s contribution to the STRICT-SMALL track of the BabyLM Challenge (Warstadt et al., 2023). The challenge requires training a language model from scratch using only a relatively small training dataset of ten million words. We experiment with three variants of cognitively-motivated curriculum learning and analyze their effect on the performance of the model on linguistic evaluation tasks. In the vocabulary curriculum, we analyze methods for constraining the vocabulary in the early stages of training to simulate cognitively more plausible learning curves. In the data curriculum experiments, we vary the order of the training instances based on i) infant-inspired expectations and ii) the learning behaviour of the model. In the objective curriculum, we explore different variations of combining the conventional masked language modelling task with a more coarse-grained word class prediction task to reinforce linguistic generalization capabilities. Our results did not yield consistent improvements over our own non-curriculum learning baseline across a range of linguistic benchmarks; however, we do find marginal gains on select tasks. Our analysis highlights key takeaways for specific combinations of tasks and settings which benefit from our proposed curricula. We moreover determine that careful selection of model architecture and training hyper-parameters yields substantial improvements over the default baselines provided by the BabyLM challenge. Our code is publicly available at https://github.com/codebyzeb/CLIMB.
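To illustrate the general idea behind a vocabulary curriculum of the kind described in the abstract, the following minimal Python sketch restricts the token types visible to the model early in training and gradually enlarges the allowed vocabulary. This is not the authors' implementation; the function names, the <unk> symbol, and the linear growth schedule are illustrative assumptions.

# Illustrative sketch of a vocabulary curriculum (not the CLIMB implementation).
# Early in training only the most frequent token types are kept; the rest are
# mapped to <unk>, and the allowed vocabulary grows as training progresses.
from collections import Counter

UNK = "<unk>"  # hypothetical unknown-token symbol

def build_frequency_ranking(token_stream):
    """Rank token types from most to least frequent."""
    counts = Counter(token_stream)
    return [tok for tok, _ in counts.most_common()]

def restrict_vocabulary(sentence, allowed_tokens):
    """Replace tokens outside the current curriculum vocabulary with <unk>."""
    allowed = set(allowed_tokens)
    return [tok if tok in allowed else UNK for tok in sentence]

def curriculum_vocab_size(step, total_steps, start_size, full_size):
    """Linearly grow the allowed vocabulary size from start_size to full_size."""
    fraction = min(step / max(total_steps, 1), 1.0)
    return int(start_size + fraction * (full_size - start_size))

if __name__ == "__main__":
    corpus = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "chased", "the", "cat"],
    ]
    ranking = build_frequency_ranking(tok for sent in corpus for tok in sent)
    # Early stage: only the 3 most frequent token types are visible to the model.
    size = curriculum_vocab_size(step=100, total_steps=1000,
                                 start_size=3, full_size=len(ranking))
    print(restrict_vocabulary(corpus[0], ranking[:size]))

In practice such a constraint would be applied at the tokenizer or training-loop level, and the schedule controlling how quickly the vocabulary expands is the main design choice; the paper evaluates concrete variants of this idea against a non-curriculum baseline.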

Original language: English
Title of host publication: Proceedings of the 27th Conference on Computational Natural Language Learning
Subtitle of host publication: Volume 2: The BabyLM Challenge
Editors: Alex Warstadt, Aaron Mueller, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
Publisher: Association for Computational Linguistics (ACL)
Pages: 112-127
Number of pages: 16
Volume: 2
ISBN (Electronic): 9781952148026
DOIs
Publication status: Published - 2023
Event: BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, CoNLL 2023 - Singapore, Singapore
Duration: 6 Dec 2023 – 7 Dec 2023

Conference

Conference: BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, CoNLL 2023
Country/Territory: Singapore
City: Singapore
Period: 6/12/23 – 7/12/23

Bibliographical note

Publisher Copyright:
© 2023 Association for Computational Linguistics.
