Abstract
Identifying gaze targets in videos of human-robot interaction is useful for measuring engagement. In practice, this requires manually annotating a fixed set of objects that a participant looks at in a video, which is very time-consuming. To address this issue, we propose an annotation pipeline that automates this effort. In this work, we focus on videos in which the objects being looked at do not move. As input for the proposed pipeline, we therefore only need to annotate object bounding boxes for the first frame of each video. A further benefit of manually annotating these frames is that we can also draw bounding boxes for objects outside the frame, which enables estimating gaze targets in videos where not all objects are visible. A second issue that we address is that the models used for automating the pipeline annotate individual video frames, whereas in practice manual annotation is done at the event level for video segments rather than for single frames. We therefore also introduce and investigate several variants of algorithms for aggregating frame-level annotations to event-level annotations, which form the last step of our annotation pipeline. We compare two versions of our pipeline: one that uses a state-of-the-art gaze estimation model (GEM) and one that uses a state-of-the-art target detection model (TDM). Our results show that both versions successfully automate the annotation, but the GEM pipeline performs slightly (≈10%) better on videos where not all objects are visible. Analysis of our aggregation algorithms further shows that manual video segmentation is unnecessary, because a fixed time interval for segmentation yields very similar results. We conclude that the proposed pipeline can be used to automate almost all of the annotation effort.
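The two automated steps described in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes axis-aligned `(x0, y0, x1, y1)` boxes from the first-frame annotation, a per-frame gaze point produced by a model such as the GEM, and majority voting within fixed-length segments as one plausible variant of the frame-to-event aggregation; the names `frame_target` and `aggregate_events` are hypothetical.

```python
from collections import Counter

def frame_target(gaze_xy, boxes):
    """Map a per-frame gaze point to a target label.

    Returns the label of the first static bounding box containing the
    point, or None if it falls outside every box. Because the objects do
    not move, the first-frame boxes are valid for all frames; boxes drawn
    beyond the image border can cover off-frame targets.
    """
    x, y = gaze_xy
    for label, (x0, y0, x1, y1) in boxes.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return label
    return None

def aggregate_events(frame_labels, fps, interval_s=1.0):
    """Collapse frame-level labels into event-level labels.

    Segments the video at a fixed time interval (no manual segmentation)
    and takes a majority vote over the labelled frames in each segment.
    """
    seg_len = max(1, int(round(fps * interval_s)))
    events = []
    for start in range(0, len(frame_labels), seg_len):
        segment = [l for l in frame_labels[start:start + seg_len] if l is not None]
        events.append(Counter(segment).most_common(1)[0][0] if segment else None)
    return events
```

A typical call chain would be `events = aggregate_events([frame_target(p, boxes) for p in gaze_points], fps=25)`, i.e. per-frame labels first, then fixed-interval aggregation as the last pipeline step.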
Original language | English |
---|---|
Title of host publication | 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN) |
Subtitle of host publication | [Proceedings] |
Publisher | IEEE Computer Society |
Pages | 991-998 |
Number of pages | 8 |
ISBN (Electronic) | 9798350375022 |
ISBN (Print) | 9798350375039 |
DOIs | |
Publication status | Published - 2024 |
Event | 33rd IEEE International Conference on Robot and Human Interactive Communication, ROMAN 2024 - Pasadena, United States. Duration: 26 Aug 2024 → 30 Aug 2024 |
Publication series
Name | IEEE International Workshop on Robot and Human Communication (RO-MAN) |
---|---|
ISSN (Print) | 1944-9445 |
ISSN (Electronic) | 1944-9437 |
Conference
Conference | 33rd IEEE International Conference on Robot and Human Interactive Communication, ROMAN 2024 |
---|---|
Country/Territory | United States |
City | Pasadena |
Period | 26/08/24 → 30/08/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.