
The R package sentometrics to compute, aggregate, and predict with textual sentiment

  • David Ardia
  • Keven Bluteau
  • Samuel Borms*
  • Kris Boudt

  *Corresponding author for this work

Research output: Contribution to journal, article, academic, peer-reviewed

Abstract

We provide a hands-on introduction to optimized textual sentiment indexation using the R package sentometrics. Textual sentiment analysis is increasingly used to unlock the potential information value of textual data. The sentometrics package implements an intuitive framework to efficiently compute sentiment scores of numerous texts, to aggregate the scores into multiple time series, and to use these time series to predict other variables. The workflow of the package is illustrated with a built-in corpus of news articles from two major U.S. journals to forecast the CBOE Volatility Index.
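The three steps named in the abstract (score texts, aggregate scores into a time series, predict another variable) can be illustrated with a minimal sketch. This is not the sentometrics API, which is an R package; it is a plain-Python toy under stated assumptions. The lexicon, the example documents, and the function names `score_text`, `aggregate_by_date`, and `fit_ols` are all hypothetical, and ordinary least squares stands in for the penalized regression listed among the keywords.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> polarity); not a lexicon shipped with sentometrics.
LEXICON = {"gain": 1.0, "strong": 1.0, "loss": -1.0, "fear": -1.0}

# Hypothetical corpus of (ISO date, text) pairs, made up for illustration.
DOCS = [
    ("2021-01-01", "strong gain"),
    ("2021-01-01", "fear of loss"),
    ("2021-01-02", "loss"),
    ("2021-01-03", "gain today"),
]


def score_text(text):
    """Step 1: sum lexicon polarities over tokens, divided by the token count."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(LEXICON.get(tok, 0.0) for tok in tokens) / len(tokens)


def aggregate_by_date(docs):
    """Step 2: average document scores per date, returned in date order."""
    buckets = defaultdict(list)
    for day, text in docs:
        buckets[day].append(score_text(text))
    return [(day, sum(vals) / len(vals)) for day, vals in sorted(buckets.items())]


def fit_ols(x, y):
    """Step 3: one-regressor least squares; returns (intercept, slope).

    Plain OLS is used here only to keep the sketch short.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    index = aggregate_by_date(DOCS)
    sentiment = [value for _, value in index]
    target = [20.0, 25.0, 18.0]  # made-up target series, one value per date
    print(index)
    print(fit_ols(sentiment, target))
```

In practice the aggregation step is where most design choices sit (how to weight words within a text, texts within a day, and days within a window); the equal-weight averages above are only the simplest option.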

Original language: English
Pages (from-to): 1-40
Number of pages: 40
Journal: Journal of Statistical Software
Volume: 99
Early online date: 18 Aug 2021
Publication status: Published - 2021

Bibliographical note

Publisher Copyright:
© 2021, American Statistical Association. All rights reserved.

Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.

Funding

We thank the Associate Editors (Toby Hocking and Torsten Hothorn) and three anonymous referees, Andres Algaba (package contributor), Nabil Bouamara, Peter Carl, Leopoldo Catania, Thomas Chuffart, Dries Cornilly, Serge Darolles, William Doehler, Arnaud Dufays, Matteo Ghilotti, Kurt Hornik, Siem Jan Koopman, Julie Marquis, Linda Mhalla, Brian Peterson, Laura Rossetti, Tobias Setz, Majeed Siman, Stefan Theussl, Wouter Torsin, Jeroen Van Pelt (package contributor), Marieke Vantomme, and participants at the CFE (London, 2017), eRum (Budapest, 2018), R/Finance (Chicago, 2018), SwissText (Winterthur, 2018), SoFiE (Brussels, 2018), “Data Science in Finance with R” (Vienna, 2018), “New Challenges for Central Bank Communication” (Brussels, 2018), (EC)^2 (Roma, 2018), and useR! (Toulouse, 2019) conferences for helpful comments. We acknowledge Google Summer of Code 2017 and 2019 (https://summerofcode.withgoogle.com), Innoviris (https://innoviris.brussels), IVADO (https://ivado.ca), and the Swiss National Science Foundation (http://www.snf.ch, grants #179281 and #191730) for their financial support.

Funders (funder number)

  • Marieke Vantomme
  • European Commission
  • Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (179281, 191730)
  • Conseil Français de l'Énergie
  • Institut de Valorisation des Données

    UN SDGs

    This output contributes to the following UN Sustainable Development Goals (SDGs)

    1. SDG 4 - Quality Education

    Keywords

    • Aggregation
    • Penalized regression
    • Prediction
    • R
    • Sentometrics
    • Textual sentiment
    • Time series
