Language Models as Measurement Tools: Using Instruction-Based Models to Increase Validity, Robustness and Data Efficiency

Moritz Laurer

Research output: PhD-Thesis - Research and graduation internal


Abstract

From millions of social media posts to decades of legal text, more and more relevant information is hidden in digital text corpora that are too large for manual analysis. The key promise of machine learning is to automate parts of the manual analysis process. One popular method is supervised machine learning for text classification, where a model is trained on examples of manually categorized texts and learns to identify these categories in new texts. Computational social scientists have used this method to create measurements of concepts such as emotions, topics or stances at scale. While measurement with supervised machine learning is established in the social science literature, important limitations reduce the usefulness of established methods for many practical applications. First, these methods require large amounts of balanced training data to work well, yet researchers often have only limited resources for creating training data and need to tailor new data to each new research question. Second, older algorithms struggle with multilingual data, while researchers need measurements that are equally valid across different cultures and languages. Third, these methods are susceptible to learning shortcuts and biased patterns from their training data, reducing the validity of measurements across social groups. Fourth, they can be difficult to use, making them accessible only to specialised researchers. This thesis demonstrates how a recent innovation from the natural language processing literature can address these limitations: instruction-based language models. Chapter 2 shows how this type of model can reduce the required training data by a factor of ten compared to previous algorithms, while achieving the same level of performance across eight tasks. Chapter 3 demonstrates how these models require fewer than 2000 examples in two languages to create valid measurements across eight other languages and ten other countries. Chapter 4 shows how these models are more robust against group-specific biases: in experiments across nine groups from four datasets, their average test-set performance decreases only marginally when they are trained on biased data. Chapter 5 explains how these models can act as universal classifiers that learn any number of classification tasks simultaneously, in tests across 33 datasets with 389 classes.
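The supervised text-classification workflow that the abstract describes — training a model on manually categorized example texts so it can assign those categories to new texts — can be illustrated with a minimal toy sketch. This is not the thesis's method (the thesis uses instruction-based language models); it is a deliberately simple Naive Bayes baseline in pure Python, with hypothetical labels and example sentences, included only to show what "learning categories from labelled examples" means mechanically.

```python
from collections import Counter, defaultdict
import math

# Hypothetical hand-labelled training examples: (text, category).
train = [
    ("the match ended in a dramatic penalty shootout", "sports"),
    ("the striker scored twice in the second half", "sports"),
    ("parliament passed the new climate bill today", "politics"),
    ("the minister resigned after the budget vote", "politics"),
]

def tokenize(text):
    return text.lower().split()

# "Training": count word frequencies per class and class frequencies.
word_counts = defaultdict(Counter)
class_counts = Counter()
vocab = set()
for text, label in train:
    tokens = tokenize(text)
    word_counts[label].update(tokens)
    class_counts[label] += 1
    vocab.update(tokens)

def predict(text):
    """Return the class with the highest Naive Bayes log-probability."""
    scores = {}
    for label in class_counts:
        # Log prior: how common the class is in the training data.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for tok in tokenize(text):
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("the team scored a late goal"))       # → sports
print(predict("a vote on the bill in parliament"))  # → politics
```

The abstract's limitations show up even in this sketch: the classifier needs enough labelled examples per class, its word counts are language-specific, and it happily learns any spurious word-label correlation present in the training data — the shortcomings that the thesis argues instruction-based models mitigate.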
Original language: English
Qualification: PhD
Awarding Institution
  • Vrije Universiteit Amsterdam
Supervisors/Advisors
  • van Atteveldt, Wouter, Supervisor
  • Welbers, Kasper, Co-supervisor
  • Casas Salleras, A., Co-supervisor
Award date: 2 Oct 2024
DOIs
Publication status: Published - 2 Oct 2024

Keywords

  • transfer learning
  • computational social sciences
  • natural language processing
  • supervised machine learning
  • text classification
  • natural language inference
  • text as data
  • language models
  • validity
  • bias
