Abstract
In today's digital landscape, users face critical challenges related to information consumption. With the overwhelming availability of (mis)information online, it can be difficult for users to discern credible sources from unreliable ones. Moreover, while a wide range of opinions and perspectives are more accessible than ever before, the algorithms that drive content delivery often prioritize engagement over accuracy, leading to the reinforcement of existing biases. The rise of generative AI technologies further complicates and intensifies these issues.
Natural Language Understanding (NLU) technologies could play a crucial role in guiding users through this increasingly complex information landscape. NLU focuses on developing language models that can understand and extract meaning from human language. Such models can be leveraged in applications designed to help users navigate and interpret information more effectively. In this context, the Natural Language Inference (NLI) task is particularly relevant: it involves predicting whether one sentence (the hypothesis) is entailed by, contradicted by, or neutral with respect to another sentence (the premise). This process requires a sophisticated understanding of the logical relationships between propositions and the ability to evaluate how different perspectives on these propositions interact. Effective NLI models could therefore aid in the interpretation and evaluation of diverse and potentially conflicting sources.
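The NLI setup described above can be illustrated with a minimal sketch of its input/output format. The example sentences below are invented for illustration and are not drawn from any benchmark dataset; real NLI systems assign these labels with trained language models rather than by hand.

```python
# NLI maps a (premise, hypothesis) pair to one of three labels.
NLI_LABELS = ("entailment", "contradiction", "neutral")

# Illustrative examples (invented for this sketch).
examples = [
    # The premise guarantees the truth of the hypothesis.
    {"premise": "A man is playing a guitar.",
     "hypothesis": "A person is playing an instrument.",
     "label": "entailment"},
    # The premise rules the hypothesis out.
    {"premise": "A man is playing a guitar.",
     "hypothesis": "No one is playing any music.",
     "label": "contradiction"},
    # The premise neither confirms nor denies the hypothesis.
    {"premise": "A man is playing a guitar.",
     "hypothesis": "The man is performing on a stage.",
     "label": "neutral"},
]

for ex in examples:
    assert ex["label"] in NLI_LABELS
```

Note that the "neutral" case is where human interpretation diverges most: whether a hypothesis is merely plausible or actually entailed can be a matter of perspective, which is precisely the kind of nuance the thesis argues datasets should preserve.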
However, for language models to be truly effective, they must be trained and evaluated on high-quality language resources. While impressive advances are being made with increasingly proficient models, the focus on developing and refining them can overshadow the need to evaluate them on diverse, representative, and challenging datasets. This thesis evaluates the current state of language resources and identifies ways to enhance their effectiveness, especially in modeling natural online debates. It addresses various aspects of quality and representativeness, with a specific focus on capturing nuances in both textual meaning and human interpretation. The work emphasizes the need for perspective-aware resources that preserve the authenticity of language in datasets, ensuring that models are better equipped for real-world applications. The findings and recommendations presented aim to guide future advancements in NLU research, particularly in the construction of more representative datasets.
| Original language | English |
| --- | --- |
| Qualification | PhD |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 1 Nov 2024 |
| DOIs | |
| Publication status | Published - 1 Nov 2024 |
Keywords
- natural language understanding
- language models
- language resources
- natural language inference
- perspectives
- subjectivity
- negation
- interoperability
- human interpretation
- crowdtruth