Quantitative analysis of large amounts of journalistic texts using topic modelling

Research output: Contribution to journal › Article › Academic › peer-review


The huge collections of news content that have become available through digital technologies both enable and warrant scientific inquiry, challenging journalism scholars to analyse unprecedented amounts of text. We propose Latent Dirichlet Allocation (LDA) topic modelling as a tool to meet this challenge. LDA is a cutting-edge technique for content analysis, designed to automatically organize large archives of documents based on latent topics, measured as patterns of word (co-)occurrence. We explain how this technique works, how different choices by the researcher affect the results and how the results can be meaningfully interpreted. To demonstrate its usefulness for journalism research, we conducted a case study of the New York Times coverage of nuclear technology from 1945 to the present, partially replicating a study by Gamson and Modigliani. The case study shows that LDA is a useful tool for analysing trends and patterns in news content in large digital news archives relatively quickly.
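The abstract describes LDA as organizing documents by latent topics measured as patterns of word co-occurrence. As a minimal sketch of that idea, assuming pre-tokenized documents and collapsed Gibbs sampling (one common way of fitting LDA; the abstract does not say which implementation the authors used), the Python code below estimates per-document topic proportions (`theta`) and per-topic word distributions (`phi`). The function names `lda_gibbs` and `top_words`, the hyperparameter defaults and the toy nuclear-themed corpus are all hypothetical illustrations, not material from the study.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical sketch: fit LDA to tokenized docs by collapsed Gibbs sampling.

    docs: list of lists of word strings (assumed already tokenized/cleaned).
    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Count tables: document-topic, topic-word, and total tokens per topic.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * V for _ in range(n_topics)]
    nk = [0] * n_topics
    # Assign every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                # Take the current token out of the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                # Resample this token's topic and put it back into the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Point estimates from the final sample (each row sums to 1).
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
        for k in range(n_topics)
    ]
    return theta, phi, vocab


def top_words(phi, vocab, n=3):
    """Return the n most probable words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


if __name__ == "__main__":
    # Hypothetical toy corpus, not data from the New York Times case study.
    docs = [
        ["reactor", "power", "energy", "electricity", "reactor"],
        ["energy", "power", "plant", "reactor", "electricity"],
        ["plant", "energy", "reactor", "power"],
        ["bomb", "weapon", "test", "arms", "bomb"],
        ["arms", "treaty", "weapon", "test"],
        ["bomb", "test", "weapon", "treaty", "arms"],
    ]
    theta, phi, vocab = lda_gibbs(docs, n_topics=2)
    for k, words in enumerate(top_words(phi, vocab)):
        print("topic", k, words)
    print("doc 0 topic proportions:", [round(p, 2) for p in theta[0]])
```

The number of topics, `alpha` and `beta` are exactly the kind of researcher choices the abstract says affect the results: a smaller `alpha` pushes each document towards fewer topics, and a different `n_topics` changes how coarsely the archive is divided. Averaging `theta` rows by publication year would be one simple way to trace topic trends over time.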
Original language: English
Pages (from-to): 89-106
Number of pages: 18
Journal: Digital Journalism
Issue number: 1
Publication status: Published - 2016

