Automating Gaze Target Annotation in Human-Robot Interaction

Linlin Cheng*, Koen V. Hindriks, Artem V. Belopolsky

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Identifying gaze targets in videos of human-robot interaction is useful for measuring engagement. In practice, this requires manually annotating which objects, from a fixed set, a participant is looking at in a video, which is very time-consuming. To address this issue, we propose an annotation pipeline that automates this effort. In this work, we focus on videos in which the objects being looked at do not move. As input for the proposed pipeline, we therefore only need to annotate object bounding boxes in the first frame of each video. A further benefit of manually annotating these frames is that we can also draw bounding boxes for objects outside the frame, which enables estimating gaze targets in videos where not all objects are visible. A second issue that we address is that the models used to automate the pipeline annotate individual video frames, whereas in practice manual annotation is done at the event level, for video segments rather than single frames. We therefore also introduce and investigate several variants of algorithms for aggregating frame-level annotations into event-level annotations, which form the last step of our annotation pipeline. We compare two versions of our pipeline: one that uses a state-of-the-art gaze estimation model (GEM) and one that uses a state-of-the-art target detection model (TDM). Our results show that both versions successfully automate the annotation, but the GEM pipeline performs slightly (≈10%) better on videos where not all objects are visible. Analysis of our aggregation algorithm, moreover, shows that manual video segmentation is unnecessary, since segmentation with a fixed time interval yields very similar results. We conclude that the proposed pipeline can be used to automate almost all of the annotation effort.
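As a minimal sketch of the frame-level to event-level aggregation described in the abstract, the snippet below segments a sequence of per-frame gaze-target labels into fixed-length time windows and assigns each window its majority label. The function name, window length, and frame rate are illustrative assumptions and are not taken from the paper; the paper's own aggregation variants may differ.

# Hypothetical sketch (not the paper's implementation): aggregate per-frame
# gaze-target labels into event-level labels using fixed-length windows
# and majority voting within each window.
from collections import Counter
from typing import List, Tuple


def aggregate_frames_to_events(
    frame_labels: List[str],
    fps: float = 30.0,
    window_seconds: float = 1.0,
) -> List[Tuple[float, float, str]]:
    """Split per-frame labels into fixed-length segments and assign each
    segment the most frequent label. Returns (start_s, end_s, label) tuples."""
    window = max(1, int(round(fps * window_seconds)))
    events = []
    for start in range(0, len(frame_labels), window):
        chunk = frame_labels[start:start + window]
        label, _count = Counter(chunk).most_common(1)[0]
        events.append((start / fps, (start + len(chunk)) / fps, label))
    return events


if __name__ == "__main__":
    # Toy example: 0.2 s windows at 10 fps over a short label sequence.
    labels = ["robot"] * 7 + ["tablet"] * 5 + ["robot"] * 3
    for start_s, end_s, label in aggregate_frames_to_events(labels, fps=10.0, window_seconds=0.2):
        print(f"{start_s:.1f}-{end_s:.1f}s: {label}")

With a fixed window length, no manual segmentation of the video is needed, which mirrors the abstract's finding that a fixed time interval yields results very similar to manual segmentation.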

Original language: English
Title of host publication: 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN)
Subtitle of host publication: [Proceedings]
Publisher: IEEE Computer Society
Pages: 991-998
Number of pages: 8
ISBN (Electronic): 9798350375022
ISBN (Print): 9798350375039
DOIs
Publication status: Published - 2024
Event: 33rd IEEE International Conference on Robot and Human Interactive Communication, ROMAN 2024 - Pasadena, United States
Duration: 26 Aug 2024 - 30 Aug 2024

Publication series

Name: IEEE International Workshop on Robot and Human Communication (RO-MAN)
ISSN (Print): 1944-9445
ISSN (Electronic): 1944-9437

Conference

Conference: 33rd IEEE International Conference on Robot and Human Interactive Communication, ROMAN 2024
Country/Territory: United States
City: Pasadena
Period: 26/08/24 - 30/08/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

