Abstract
We provide a hands-on introduction to optimized textual sentiment indexation using the R package sentometrics. Textual sentiment analysis is increasingly used to unlock the potential information value of textual data. The sentometrics package implements an intuitive framework to efficiently compute sentiment scores of numerous texts, to aggregate the scores into multiple time series, and to use these time series to predict other variables. The workflow of the package is illustrated with a built-in corpus of news articles from two major U.S. journals to forecast the CBOE Volatility Index.
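The first two steps of the workflow described above (scoring texts, then aggregating the scores into a time series) can be illustrated with a minimal sketch. This is not the sentometrics API: the toy lexicon, the example documents, and the function names `score_text` and `monthly_index` are all hypothetical, and the equal-weight monthly average is only one of many possible aggregation schemes.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Toy lexicon (hypothetical words and polarity scores, for illustration only).
LEXICON = {"gain": 1.0, "strong": 1.0, "loss": -1.0, "weak": -1.0}


def score_text(text):
    """Sum of lexicon scores divided by the number of tokens in the text."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(LEXICON.get(tok, 0.0) for tok in tokens) / len(tokens)


def monthly_index(docs):
    """Average the document scores per (year, month), in time order."""
    buckets = defaultdict(list)
    for day, text in docs:
        buckets[(day.year, day.month)].append(score_text(text))
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


# Made-up dated documents standing in for a news corpus.
docs = [
    (date(2020, 1, 5), "strong gain"),
    (date(2020, 1, 20), "weak loss loss neutral"),
    (date(2020, 2, 3), "gain and loss"),
]
index = monthly_index(docs)
```

The resulting monthly series could then serve as a regressor in a forecasting model, which is the third step the abstract mentions and is not shown here.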
Original language | English |
---|---|
Pages (from-to) | 1-40 |
Number of pages | 40 |
Journal | Journal of Statistical Software |
Volume | 99 |
Publication status | Published - 2021 |
Bibliographical note
Funding Information: We thank the Associate Editors (Toby Hocking and Torsten Hothorn) and three anonymous referees, Andres Algaba (package contributor), Nabil Bouamara, Peter Carl, Leopoldo Catania, Thomas Chuffart, Dries Cornilly, Serge Darolles, William Doehler, Arnaud Dufays, Matteo Ghilotti, Kurt Hornik, Siem Jan Koopman, Julie Marquis, Linda Mhalla, Brian Peterson, Laura Rossetti, Tobias Setz, Majeed Siman, Stefan Theussl, Wouter Torsin, Jeroen Van Pelt (package contributor), Marieke Vantomme, and participants at the CFE (London, 2017), eRum (Budapest, 2018), R/Finance (Chicago, 2018), SwissText (Winterthur, 2018), SoFiE (Brussels, 2018), “Data Science in Finance with R” (Vienna, 2018), “New Challenges for Central Bank Communication” (Brussels, 2018), (EC)^2 (Roma, 2018), and useR! (Toulouse, 2019) conferences for helpful comments. We acknowledge Google Summer of Code 2017 and 2019 (https://summerofcode.withgoogle.com), Innoviris (https://innoviris.brussels), IVADO (https://ivado.ca), and the Swiss National Science Foundation (http://www.snf.ch, grants #179281 and #191730) for their financial support.
Publisher Copyright:
© 2021, American Statistical Association. All rights reserved.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
Keywords
- Aggregation
- Penalized regression
- Prediction
- R
- Sentometrics
- Textual sentiment
- Time series