Detecting changing emotions in human speech by machine and humans

C.N. van der Wal, W. Kowalczyk

Research output: Contribution to JournalArticleAcademicpeer-review

Abstract

The goals of this research were: (1) to develop a system that will automatically measure changes in the emotional state of a speaker by analyzing his/her voice, (2) to validate this system with a controlled experiment and (3) to visualize the results to the speaker in 2-d space. Natural (non-acted) human speech of 77 (Dutch) speakers was collected and manually divided into meaningful speech units. Three recordings per speaker were collected, in which he/she was in a positive, neutral and negative state. For each recording, the speakers rated 16 emotional states on a 10-point Likert Scale. The Random Forest algorithm was applied to 207 speech features that were extracted from recordings to qualify (classification) and quantify (regression) the changes in speaker's emotional state. Results showed that predicting the direction of change of emotions and predicting the change of intensity, measured by Mean Squared Error, can be done better than the baseline (the most frequent class label and the mean value of change, respectively). Moreover, it turned out that changes in negative emotions are more predictable than changes in positive emotions. A controlled experiment investigated the difference in human and machine performance on judging the emotional states in one's own voice and that of another. Results showed that humans performed worse than the algorithm in the detection and regression problems. Humans, just like the machine algorithm, were better in detecting changing negative emotions rather than positive ones. Finally, results of applying the Principal Component Analysis (PCA) to our data provided a validation of dimensional emotion theories and they suggest that PCA is a promising technique for visualizing user's emotional state in the envisioned application. © 2013 Springer Science+Business Media New York.
Original languageEnglish
JournalApplied Intelligence
DOIs
Publication statusPublished - 2013

Fingerprint

Principal component analysis
Labels
Experiments
Industry

Cite this

@article{bb0b0818199240e1bc78f6fc8f405a62,
title = "Detecting changing emotions in human speech by machine and humans",
abstract = "The goals of this research were: (1) to develop a system that will automatically measure changes in the emotional state of a speaker by analyzing his/her voice, (2) to validate this system with a controlled experiment and (3) to visualize the results to the speaker in 2-d space. Natural (non-acted) human speech of 77 (Dutch) speakers was collected and manually divided into meaningful speech units. Three recordings per speaker were collected, in which he/she was in a positive, neutral and negative state. For each recording, the speakers rated 16 emotional states on a 10-point Likert Scale. The Random Forest algorithm was applied to 207 speech features that were extracted from recordings to qualify (classification) and quantify (regression) the changes in speaker's emotional state. Results showed that predicting the direction of change of emotions and predicting the change of intensity, measured by Mean Squared Error, can be done better than the baseline (the most frequent class label and the mean value of change, respectively). Moreover, it turned out that changes in negative emotions are more predictable than changes in positive emotions. A controlled experiment investigated the difference in human and machine performance on judging the emotional states in one's own voice and that of another. Results showed that humans performed worse than the algorithm in the detection and regression problems. Humans, just like the machine algorithm, were better in detecting changing negative emotions rather than positive ones. Finally, results of applying the Principal Component Analysis (PCA) to our data provided a validation of dimensional emotion theories and they suggest that PCA is a promising technique for visualizing user's emotional state in the envisioned application. {\circledC} 2013 Springer Science+Business Media New York.",
author = "{van der Wal}, C.N. and W. Kowalczyk",
year = "2013",
doi = "10.1007/s10489-013-0449-1",
language = "English",
journal = "Applied Intelligence",
issn = "0924-669X",
publisher = "Springer Verlag",

}

Detecting changing emotions in human speech by machine and humans. / van der Wal, C.N.; Kowalczyk, W.

In: Applied Intelligence, 2013.

Research output: Contribution to JournalArticleAcademicpeer-review

TY - JOUR

T1 - Detecting changing emotions in human speech by machine and humans

AU - van der Wal, C.N.

AU - Kowalczyk, W.

PY - 2013

Y1 - 2013

N2 - The goals of this research were: (1) to develop a system that will automatically measure changes in the emotional state of a speaker by analyzing his/her voice, (2) to validate this system with a controlled experiment and (3) to visualize the results to the speaker in 2-d space. Natural (non-acted) human speech of 77 (Dutch) speakers was collected and manually divided into meaningful speech units. Three recordings per speaker were collected, in which he/she was in a positive, neutral and negative state. For each recording, the speakers rated 16 emotional states on a 10-point Likert Scale. The Random Forest algorithm was applied to 207 speech features that were extracted from recordings to qualify (classification) and quantify (regression) the changes in speaker's emotional state. Results showed that predicting the direction of change of emotions and predicting the change of intensity, measured by Mean Squared Error, can be done better than the baseline (the most frequent class label and the mean value of change, respectively). Moreover, it turned out that changes in negative emotions are more predictable than changes in positive emotions. A controlled experiment investigated the difference in human and machine performance on judging the emotional states in one's own voice and that of another. Results showed that humans performed worse than the algorithm in the detection and regression problems. Humans, just like the machine algorithm, were better in detecting changing negative emotions rather than positive ones. Finally, results of applying the Principal Component Analysis (PCA) to our data provided a validation of dimensional emotion theories and they suggest that PCA is a promising technique for visualizing user's emotional state in the envisioned application. © 2013 Springer Science+Business Media New York.

AB - The goals of this research were: (1) to develop a system that will automatically measure changes in the emotional state of a speaker by analyzing his/her voice, (2) to validate this system with a controlled experiment and (3) to visualize the results to the speaker in 2-d space. Natural (non-acted) human speech of 77 (Dutch) speakers was collected and manually divided into meaningful speech units. Three recordings per speaker were collected, in which he/she was in a positive, neutral and negative state. For each recording, the speakers rated 16 emotional states on a 10-point Likert Scale. The Random Forest algorithm was applied to 207 speech features that were extracted from recordings to qualify (classification) and quantify (regression) the changes in speaker's emotional state. Results showed that predicting the direction of change of emotions and predicting the change of intensity, measured by Mean Squared Error, can be done better than the baseline (the most frequent class label and the mean value of change, respectively). Moreover, it turned out that changes in negative emotions are more predictable than changes in positive emotions. A controlled experiment investigated the difference in human and machine performance on judging the emotional states in one's own voice and that of another. Results showed that humans performed worse than the algorithm in the detection and regression problems. Humans, just like the machine algorithm, were better in detecting changing negative emotions rather than positive ones. Finally, results of applying the Principal Component Analysis (PCA) to our data provided a validation of dimensional emotion theories and they suggest that PCA is a promising technique for visualizing user's emotional state in the envisioned application. © 2013 Springer Science+Business Media New York.

U2 - 10.1007/s10489-013-0449-1

DO - 10.1007/s10489-013-0449-1

M3 - Article

JO - Applied Intelligence

JF - Applied Intelligence

SN - 0924-669X

ER -