CLAUSE-ATLAS: A Corpus of Narrative Information to Scale Up Computational Literary Analysis

Enrica Troiano, Piek Vossen

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

We introduce CLAUSE-ATLAS, a resource of 19th- and 20th-century English novels annotated automatically. This corpus, which contains 41,715 labeled clauses, allows stories to be studied as sequences of eventive, subjective and contextual information. We use it to investigate whether recent large language models, in particular gpt-3.5-turbo with 16k tokens of context, constitute promising tools for annotating large amounts of data for literary studies (we show that this is the case). Moreover, by analyzing the collected annotations, we find that our clause-based approach to literature captures structural patterns within books, as well as qualitative differences between them.

Original language: English
Title of host publication: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Publisher: ELRA and ICCL
Pages: 3283-3296
Number of pages: 14
ISBN (Electronic): 9782493814104
Publication status: Published - 2024
Event: Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 - Hybrid, Torino, Italy
Duration: 20 May 2024 → 25 May 2024

Conference

Conference: Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Country/Territory: Italy
City: Hybrid, Torino
Period: 20/05/24 → 25/05/24

Bibliographical note

Publisher Copyright:
© 2024 ELRA Language Resource Association: CC BY-NC 4.0.

Keywords

  • ChatGPT
  • events
  • literary resources
  • LLM-based annotation
  • narrative theory
  • subjectivity
