Predicting Gaze from Egocentric Social Interaction Videos and IMU Data

Gaze prediction in egocentric videos is a fairly new research topic, which might have several applications for assistive technology (e.g., supporting blind people in their daily interactions), security (e.g., attention tracking in risky work environments), education (e.g., augmented / mixed reality training simulators, immersive games) and so forth. Egocentric gaze is typically estimated from video while few works attempt to use inertial measurement unit (IMU) data, a sensor modality often available in wearable devices (e.g., augmented reality headsets). Instead, in this paper, we examine whether joint learning of egocentric video and corresponding IMU data can improve the first-person gaze prediction compared to using these modalities separately. In this respect, we propose a multimodal network and evaluate it on several unconstrained social interaction scenarios captured by a first-person perspective. The proposed multimodal network achieves better results compared to unimodal methods as well as several (multimodal) baselines, showing that using egocentric video together with IMU data can boost the first-person gaze estimation performance.

(2021). Predicting Gaze from Egocentric Social Interaction Videos and IMU Data . Retrieved from https://hdl.handle.net/10446/260628

Predicting Gaze from Egocentric Social Interaction Videos and IMU Data

Thakur, Sanket Kumar;Beyan, Cigdem;Morerio, Pietro;Del Bue, Alessio

2021-01-01

Abstract

Gaze prediction in egocentric videos is a fairly new research topic, which might have several applications for assistive technology (e.g., supporting blind people in their daily interactions), security (e.g., attention tracking in risky work environments), education (e.g., augmented / mixed reality training simulators, immersive games) and so forth. Egocentric gaze is typically estimated from video while few works attempt to use inertial measurement unit (IMU) data, a sensor modality often available in wearable devices (e.g., augmented reality headsets). Instead, in this paper, we examine whether joint learning of egocentric video and corresponding IMU data can improve the first-person gaze prediction compared to using these modalities separately. In this respect, we propose a multimodal network and evaluate it on several unconstrained social interaction scenarios captured by a first-person perspective. The proposed multimodal network achieves better results compared to unimodal methods as well as several (multimodal) baselines, showing that using egocentric video together with IMU data can boost the first-person gaze estimation performance.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data di pubblicazione
	
			2021
		
	Tutti gli autori
	
			Thakur, Sanket Kumar; Beyan, Cigdem; Morerio, Pietro; Del Bue, Alessio
		
	Nelle collezioni:
	
			1.4.01 Contributi in atti di convegno - Conference presentations

File allegato/i alla scheda:

File	Dimensione del file	Formato
IC21_Predicting Gaze from Egocentric Social Interaction.pdf Solo gestori di archivio Versione: publisher's version - versione editoriale Licenza: Licenza default Aisberg Dimensione del file 2.53 MB Formato Adobe PDF Visualizza/Apri	2.53 MB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

Aisberg ©2008 Servizi bibliotecari, Università degli studi di Bergamo | Terms of use/Condizioni di utilizzo

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10446/260628

Citazioni

8

ND

social impact