The R package sentometrics to compute, aggregate, and predict with textual sentiment

David Ardia, Keven Bluteau, Samuel Borms*, Kris Boudt

*Corresponding author for this work

Research output: Contribution to Journal, Article, Academic, peer-review

Abstract

We provide a hands-on introduction to optimized textual sentiment indexation using the R package sentometrics. Textual sentiment analysis is increasingly used to unlock the potential information value of textual data. The sentometrics package implements an intuitive framework to efficiently compute sentiment scores of numerous texts, to aggregate the scores into multiple time series, and to use these time series to predict other variables. The workflow of the package is illustrated with a built-in corpus of news articles from two major U.S. journals to forecast the CBOE Volatility Index.
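The three-step workflow named in the abstract (compute sentiment scores, aggregate them into time series, predict another variable) can be illustrated with a minimal, language-agnostic sketch. This is not the sentometrics API: the lexicon, documents, target values, and the function names `score_text`, `aggregate_by_date`, and `fit_ols` are all hypothetical, and plain one-regressor least squares stands in for the penalized regression listed among the keywords.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon: word -> polarity score (not a sentometrics lexicon).
LEXICON = {"gain": 1.0, "growth": 1.0, "strong": 1.0,
           "loss": -1.0, "crisis": -1.0, "weak": -1.0}


def score_text(text):
    """Step 1: lexicon score of one text (sum of word polarities / token count)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(LEXICON.get(tok, 0.0) for tok in tokens) / len(tokens)


def aggregate_by_date(docs):
    """Step 2: average the document scores per date to form a time series."""
    by_date = defaultdict(list)
    for date, text in docs:
        by_date[date].append(score_text(text))
    return {date: mean(scores) for date, scores in sorted(by_date.items())}


def fit_ols(x, y):
    """Step 3: one-regressor least squares, y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


# Made-up documents and target values, for illustration only.
docs = [
    ("2020-01", "strong growth"),
    ("2020-01", "weak gain reported"),
    ("2020-02", "crisis and loss"),
    ("2020-03", "growth after the crisis"),
]
index = aggregate_by_date(docs)   # one sentiment value per month
target = [12.0, 30.0, 18.0]       # invented values of the variable to predict
intercept, slope = fit_ols(list(index.values()), target)
```

In this toy data the month with the most negative sentiment has the highest target value, so the fitted slope is negative. The package itself builds many sentiment time series at once (across lexicons and aggregation schemes) and selects among them, which this sketch does not attempt.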

Original language: English
Pages (from-to): 1-40
Number of pages: 40
Journal: Journal of Statistical Software
Volume: 99
Publication status: Published - 2021

Bibliographical note

Funding Information:
We thank the Associate Editors (Toby Hocking and Torsten Hothorn) and three anonymous referees, Andres Algaba (package contributor), Nabil Bouamara, Peter Carl, Leopoldo Catania, Thomas Chuffart, Dries Cornilly, Serge Darolles, William Doehler, Arnaud Dufays, Matteo Ghilotti, Kurt Hornik, Siem Jan Koopman, Julie Marquis, Linda Mhalla, Brian Peterson, Laura Rossetti, Tobias Setz, Majeed Siman, Stefan Theussl, Wouter Torsin, Jeroen Van Pelt (package contributor), Marieke Vantomme, and participants at the CFE (London, 2017), eRum (Budapest, 2018), R/Finance (Chicago, 2018), SwissText (Winterthur, 2018), SoFiE (Brussels, 2018), “Data Science in Finance with R” (Vienna, 2018), “New Challenges for Central Bank Communication” (Brussels, 2018), (EC)^2 (Roma, 2018), and useR! (Toulouse, 2019) conferences for helpful comments. We acknowledge Google Summer of Code 2017 and 2019 (https://summerofcode.withgoogle.com), Innoviris (https://innoviris.brussels), IVADO (https://ivado.ca), and the Swiss National Science Foundation (http://www.snf.ch, grants #179281 and #191730) for their financial support.

Publisher Copyright:
© 2021, American Statistical Association. All rights reserved.

Keywords

  • Aggregation
  • Penalized regression
  • Prediction
  • R
  • Sentometrics
  • Textual sentiment
  • Time series
