Validating automated sentiment analysis of online cognitive behavioral therapy patient texts: An exploratory study

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Introduction: Sentiment analysis may be a useful technique to derive a user's emotional state from free-text input, allowing for more empathic automated feedback in online cognitive behavioral therapy (iCBT) interventions for psychological disorders such as depression. As guided iCBT is considered more effective than unguided iCBT, such automated feedback may help close the gap between the two. The accuracy of automated sentiment analysis is domain-dependent, and it is unclear how well the technology applies to iCBT. This paper presents an empirical study in which automated sentiment analysis by an algorithm for the Dutch language is validated against human judgment. Methods: A total of 493 iCBT user texts were evaluated on overall sentiment and the presence of five specific emotions by an algorithm, and by 52 psychology students who each evaluated 75 randomly selected texts, providing about eight human evaluations per text. Inter-rater reliability (IRR) between the algorithm and the humans, and among the humans themselves, was analyzed by calculating the intra-class correlation under a numerical interpretation of the data, and Cohen's kappa and Krippendorff's alpha under a categorical interpretation. Results: All analyses indicated moderate agreement between the algorithm and average human judgment on overall sentiment, and low agreement on the specific emotions. Somewhat surprisingly, the same was the case for the IRR among the human judges, which means that the algorithm performed about as well as a randomly selected human judge. Thus, taking average human judgment as a benchmark for the applicability of automated sentiment analysis, the technique can be considered for practical application. Discussion/Conclusion: The low human-human agreement on the presence of emotions may be due to the nature of the texts; it may simply be difficult for humans to agree on the presence of the selected emotions, or trained therapists might have reached more consensus. Future research may focus on validating the algorithm against a more solid benchmark, on applying the algorithm in an application in which empathic feedback is provided, for example by an embodied conversational agent, or on improving the algorithm for the iCBT domain with a bottom-up machine learning approach.
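
For readers who want to reproduce this kind of analysis, the sketch below shows how the three agreement statistics named above (intra-class correlation, Cohen's kappa, and Krippendorff's alpha) might be computed in Python on a small made-up set of ratings. The data, the 1-5 sentiment scale, and the one-way ICC variant are illustrative assumptions, not taken from the paper; the scikit-learn and krippendorff packages are assumed to be available.

```python
# Minimal sketch of the three agreement analyses named in the abstract,
# run on made-up ratings. The 1-5 scale, the raters, and the one-way
# ICC variant are illustrative assumptions, not taken from the study.
import numpy as np
from sklearn.metrics import cohen_kappa_score  # pip install scikit-learn
import krippendorff                            # pip install krippendorff

# Hypothetical data: six texts rated for overall sentiment (1 = very
# negative ... 5 = very positive) by an algorithm and two human judges
# (the study had roughly eight human evaluations per text).
algorithm = np.array([2, 4, 3, 5, 1, 3])
judge_a   = np.array([2, 5, 3, 4, 1, 2])
judge_b   = np.array([3, 4, 3, 5, 2, 3])
ratings   = np.column_stack([algorithm, judge_a, judge_b])  # texts x raters

def icc_oneway(x):
    """One-way random-effects ICC(1,1) for a texts-by-raters matrix,
    treating the ratings as numerical."""
    n, k = x.shape
    msb = k * np.sum((x.mean(axis=1) - x.mean()) ** 2) / (n - 1)            # between texts
    msw = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))  # within texts
    return (msb - msw) / (msb + (k - 1) * msw)

print("ICC(1,1):", icc_oneway(ratings))

# Categorical interpretation: pairwise Cohen's kappa for algorithm vs. one
# judge, and Krippendorff's alpha across all raters at once (alpha expects
# a raters-by-units matrix, hence the transpose).
print("Cohen's kappa (algorithm vs. judge A):",
      cohen_kappa_score(algorithm, judge_a))
print("Krippendorff's alpha:",
      krippendorff.alpha(reliability_data=ratings.T,
                         level_of_measurement="nominal"))
```

A two-way ICC, a weighted kappa, or an ordinal-level alpha would be equally defensible choices here; the abstract does not pin down the exact variants the authors used.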

Original language: English
Article number: 1065
Pages (from-to): 1-12
Number of pages: 12
Journal: Frontiers in Psychology
Volume: 10
Issue number: MAY
DOIs: 10.3389/fpsyg.2019.01065
Publication status: Published - 14 May 2019

Keywords

  • Automated support
  • Benchmarking and validation
  • Cognitive behavioral therapy (CBT)
  • Depression
  • E-mental health
  • Embodied conversational agent (ECA)
  • Internet interventions
  • Sentiment analysis and opinion mining

Cite this

@article{0add16272e1d4595a79c79759e3d38f9,
title = "Validating automated sentiment analysis of online cognitive behavioral therapy patient texts: An exploratory study",
abstract = "Introduction: Sentiment analysis may be a useful technique to derive a user's emotional state from free text input, allowing for more empathic automated feedback in online cognitive behavioral therapy (iCBT) interventions for psychological disorders such as depression. As guided iCBT is considered more effective than unguided iCBT, such automated feedback may help close the gap between the two. The accuracy of automated sentiment analysis is domain dependent, and it is unclear how well the technology is applicable to iCBT. This paper presents an empirical study in which automated sentiment analysis by an algorithm for the Dutch language is validated against human judgment. Methods: A total of 493 iCBT user texts were evaluated on overall sentiment and the presence of five specific emotions by an algorithm, and by 52 psychology students who evaluated 75 randomly selected texts each, providing about eight human evaluations per text. Inter-rater agreement (IRR) between algorithm and humans, and humans among each other, was analyzed by calculating the intra-class correlation under a numerical interpretation of the data, and Cohen's kappa, and Krippendorff's alpha under a categorical interpretation. Results: All analyses indicated moderate agreement between the algorithm and average human judgment with respect to evaluating overall sentiment, and low agreement for the specific emotions. Somewhat surprisingly, the same was the case for the IRR among human judges, which means that the algorithm performed about as well as a randomly selected human judge. Thus, considering average human judgment as a benchmark for the applicability of automated sentiment analysis, the technique can be considered for practical application. Discussion/Conclusion: The low human-human agreement on the presence of emotions may be due to the nature of the texts, it may simply be difficult for humans to agree on the presence of the selected emotions, or perhaps trained therapists would have reached more consensus. Future research may focus on validating the algorithm against a more solid benchmark, on applying the algorithm in an application in which empathic feedback is provided, for example, by an embodied conversational agent, or on improving the algorithm for the iCBT domain with a bottom-up machine learning approach.",
keywords = "Automated support, Benchmarking and validation, Cognitive behavioral therapy (CBT), Depression, E-mental health, Embodied conversational agent (ECA), Internet interventions, Sentiment analysis and opinion mining",
author = "Simon Provoost and Jeroen Ruwaard and {van Breda}, Ward and Heleen Riper and Tibor Bosse",
year = "2019",
month = "5",
day = "14",
doi = "10.3389/fpsyg.2019.01065",
language = "English",
volume = "10",
pages = "1--12",
journal = "Frontiers in Psychology",
issn = "1664-1078",
publisher = "Frontiers Media",
number = "MAY",

}

Validating automated sentiment analysis of online cognitive behavioral therapy patient texts: An exploratory study. / Provoost, Simon; Ruwaard, Jeroen; van Breda, Ward; Riper, Heleen; Bosse, Tibor.

In: Frontiers in Psychology, Vol. 10, No. MAY, 1065, 14.05.2019, p. 1-12.

Research output: Contribution to Journal › Article › Academic › peer-review

TY - JOUR

T1 - Validating automated sentiment analysis of online cognitive behavioral therapy patient texts

T2 - An exploratory study

AU - Provoost, Simon

AU - Ruwaard, Jeroen

AU - van Breda, Ward

AU - Riper, Heleen

AU - Bosse, Tibor

PY - 2019/5/14

Y1 - 2019/5/14

AB - Introduction: Sentiment analysis may be a useful technique to derive a user's emotional state from free-text input, allowing for more empathic automated feedback in online cognitive behavioral therapy (iCBT) interventions for psychological disorders such as depression. As guided iCBT is considered more effective than unguided iCBT, such automated feedback may help close the gap between the two. The accuracy of automated sentiment analysis is domain-dependent, and it is unclear how well the technology applies to iCBT. This paper presents an empirical study in which automated sentiment analysis by an algorithm for the Dutch language is validated against human judgment. Methods: A total of 493 iCBT user texts were evaluated on overall sentiment and the presence of five specific emotions by an algorithm, and by 52 psychology students who each evaluated 75 randomly selected texts, providing about eight human evaluations per text. Inter-rater reliability (IRR) between the algorithm and the humans, and among the humans themselves, was analyzed by calculating the intra-class correlation under a numerical interpretation of the data, and Cohen's kappa and Krippendorff's alpha under a categorical interpretation. Results: All analyses indicated moderate agreement between the algorithm and average human judgment on overall sentiment, and low agreement on the specific emotions. Somewhat surprisingly, the same was the case for the IRR among the human judges, which means that the algorithm performed about as well as a randomly selected human judge. Thus, taking average human judgment as a benchmark for the applicability of automated sentiment analysis, the technique can be considered for practical application. Discussion/Conclusion: The low human-human agreement on the presence of emotions may be due to the nature of the texts; it may simply be difficult for humans to agree on the presence of the selected emotions, or trained therapists might have reached more consensus. Future research may focus on validating the algorithm against a more solid benchmark, on applying the algorithm in an application in which empathic feedback is provided, for example by an embodied conversational agent, or on improving the algorithm for the iCBT domain with a bottom-up machine learning approach.

KW - Automated support

KW - Benchmarking and validation

KW - Cognitive behavioral therapy (CBT)

KW - Depression

KW - E-mental health

KW - Embodied conversational agent (ECA)

KW - Internet interventions

KW - Sentiment analysis and opinion mining

UR - http://www.scopus.com/inward/record.url?scp=85068465654&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068465654&partnerID=8YFLogxK

U2 - 10.3389/fpsyg.2019.01065

DO - 10.3389/fpsyg.2019.01065

M3 - Article

VL - 10

SP - 1

EP - 12

JO - Frontiers in Psychology

JF - Frontiers in Psychology

SN - 1664-1078

IS - MAY

M1 - 1065

ER -