The Constant in HATE: Analyzing Toxicity in Reddit across Topics and Languages

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Toxic language remains an ongoing challenge on social media platforms, presenting significant issues for users and communities. This paper provides a cross-topic and cross-lingual analysis of toxicity in Reddit conversations. We collect 1.5 million comment threads from 481 communities in six languages: English, German, Spanish, Turkish, Arabic, and Dutch, covering 80 topics such as Culture, Politics, and News. We thoroughly analyze how toxicity spikes within different communities in relation to specific topics. We observe consistent patterns of increased toxicity across languages for certain topics, while also noting significant variations within specific language communities.

Original language: English
Title of host publication: Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024
Editors: Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Bharathi Raja Chakravarthi, Bornini Lahiri, Siddharth Singh, Shyam Ratan
Publisher: ACL Anthology
Pages: 1-11
Number of pages: 11
ISBN (Electronic): 9782493814470
Publication status: Published - 2024
Event: 4th Workshop on Threat, Aggression and Cyberbullying, TRAC 2024 - Torino, Italy
Duration: 20 May 2024 → …

Conference

Conference: 4th Workshop on Threat, Aggression and Cyberbullying, TRAC 2024
Country/Territory: Italy
City: Torino
Period: 20/05/24 → …

Bibliographical note

Publisher Copyright:
© 2024 ELRA Language Resource Association.

Keywords

  • Cross-Lingual Analysis
  • Cross-Topic Analysis
  • Reddit
  • Toxic Language
