Multi-modal egocentric activity recognition using multi-kernel learning

2020-04-28
Arabaci, Mehmet Ali
Ozkan, Fatih
Sürer, Elif
Jancovic, Peter
Temizel, Alptekin
Existing methods for egocentric activity recognition are mostly based on extracting motion characteristics from videos. On the other hand, the ubiquity of wearable sensors allows the acquisition of information from different sources. Although this increase in sensor diversity calls for adaptive fusion, most studies use pre-determined weights for each source. In addition, only a limited number of studies make use of optical, audio and wearable sensors together. In this work, we propose a new framework that adaptively weights the visual, audio and sensor features in relation to their discriminative abilities. For that purpose, multi-kernel learning (MKL) is used to fuse the multi-modal features, where feature and kernel selection/weighting and the recognition task are performed concurrently. Audio-visual information is used in association with the data acquired from wearable sensors, since these modalities capture different aspects of activities and help build better models. The proposed framework can be used with different modalities to improve recognition accuracy and can easily be extended with additional sensors. The results show that using multi-modal features with MKL outperforms the existing methods.
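The fusion idea in the abstract — one base kernel per modality, combined with learned weights inside a kernel classifier — can be illustrated with a minimal sketch. This is not the paper's actual formulation: the feature dimensions, the RBF kernel choice, and the coarse grid over weights are placeholders, and true MKL would optimize the weights jointly with the SVM rather than by search.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for per-modality features (visual/audio/sensor);
# shapes and values are synthetic, not the paper's descriptors.
rng = np.random.default_rng(0)
n = 120
X_visual = rng.normal(size=(n, 16))
X_audio = rng.normal(size=(n, 8))
X_sensor = rng.normal(size=(n, 4))
y = rng.integers(0, 2, size=n)

def rbf(A, B, gamma=0.1):
    # RBF kernel matrix between row-feature sets A and B.
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.25, random_state=0)
modalities = [X_visual, X_audio, X_sensor]

# One base kernel per modality; the combined kernel is a convex
# combination K = sum_m w_m * K_m. Here the weights are chosen by a
# coarse grid search as a stand-in for MKL's joint optimization.
K_tr = [rbf(X[idx_tr], X[idx_tr]) for X in modalities]
K_te = [rbf(X[idx_te], X[idx_tr]) for X in modalities]

best = (0.0, None)
for w in [(1/3, 1/3, 1/3), (0.6, 0.2, 0.2), (0.2, 0.6, 0.2), (0.2, 0.2, 0.6)]:
    Ktr = sum(wi * Ki for wi, Ki in zip(w, K_tr))
    Kte = sum(wi * Ki for wi, Ki in zip(w, K_te))
    clf = SVC(kernel="precomputed").fit(Ktr, y[idx_tr])
    acc = clf.score(Kte, y[idx_te])
    if acc > best[0]:
        best = (acc, w)

print("selected modality weights:", best[1])
```

Adding another sensor in this scheme only means appending one more base kernel to the lists, which mirrors the abstract's claim that the framework extends easily to additional modalities.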
MULTIMEDIA TOOLS AND APPLICATIONS

Suggestions

Multi-modal Egocentric Activity Recognition using Audio-Visual Features
Arabacı, Mehmet Ali; Özkan, Fatih; Sürer, Elif; Jancovic, Peter; Temizel, Alptekin (2018-07-01)
Egocentric activity recognition in first-person videos is of increasing importance, with a variety of applications such as lifelogging, summarization, assisted living and activity tracking. Existing methods for this task are based on interpreting various sensor information using pre-determined weights for each feature. In this work, we propose a new framework for the egocentric activity recognition problem based on combining audio-visual features with multi-kernel learning (MKL) and multi-kernel boosting (...
Comparison of histograms of oriented optical flow based action recognition methods
Erciş, Fırat; Ulusoy, İlkay; Department of Electrical and Electronics Engineering (2012)
In the task of human action recognition in uncontrolled video, motion features are widely used in order to achieve subject and appearance invariance. We implemented three Histograms of Oriented Optical Flow (HOOF) based methods which share a common motion feature extraction phase. We compute an optical flow field over each frame of the video. Those flow vectors are then histogrammed according to their angle values to represent each frame with a histogram. In order to capture local motions, the bounding box of the subject is divided...
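The shared extraction phase described above — binning per-pixel flow vectors by angle, weighted by magnitude — can be sketched as follows. This is a simplified illustration, not any of the three compared methods (which differ in binning and normalization details), and the `flow` input is assumed to come from an external optical-flow estimator.

```python
import numpy as np

def hoof(flow, n_bins=8):
    """Histogram of Oriented Optical Flow for one frame.

    `flow` is an (H, W, 2) array of per-pixel (dx, dy) vectors; each
    vector votes into an angle bin, weighted by its magnitude, and the
    histogram is L1-normalized so frames of different sizes compare.
    """
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    ang = np.arctan2(dy, dx)                  # angle in [-pi, pi]
    mag = np.hypot(dx, dy)                    # vote weight
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, weights=mag, minlength=n_bins)
    s = hist.sum()
    return hist / s if s > 0 else hist

# Example: a flow field pointing uniformly to the right puts all of
# its mass into the single bin containing angle 0.
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
h = hoof(flow)
```

Local motion, as the snippet notes, is then captured by splitting the subject's bounding box into cells and concatenating one such histogram per cell.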
Multi-modal Egocentric Activity Recognition Through Decision Fusion
Arabacı, Mehmet Ali; Temizel, Alptekin; Sürer, Elif; Department of Information Systems (2023-1-18)
The usage of wearable devices has rapidly grown in daily life with the development of sensor technologies. The most prominent information for wearable devices is collected from optical sensors, which produce videos from an egocentric perspective, called First Person Vision (FPV). FPV has different characteristics from third-person videos because of the large amounts of ego-motion and rapid changes in scenes. Vision-based methods designed for third-person videos, where the camera is away from events and actors, canno...
Comparison of Cuboid and Tracklet Features for Action Recognition on Surveillance Videos
Bayram, Ulya; Ulusoy, İlkay; Cicekli, Nihan Kesim (2013-01-01)
For recognition of human actions in surveillance videos, action recognition methods in the literature are analyzed and coherent feature extraction methods that are promising for success in such videos are identified. Based on local methods, the two most popular feature extraction methods (Dollar's "cuboid" feature definition and Raptis and Soatto's "tracklet" feature definition) are tested and compared. Both methods were classified by different methods in their original applications. In order to obtain a more fair ...
Multi Camera Visual Surveillance for Motion Detection Occlusion Handling Tracking and Event Recognition
Akman, Oytun; Alatan, Abdullah Aydın; Çiloğlu, Tolga (null; 2008-10-05)
This paper presents novel approaches for background modeling, occlusion handling and event recognition by using multi-camera configurations that can overcome the limitations of single-camera configurations. The main novelty in the proposed background modeling approach is building a multivariate Gaussian background model for each pixel of the reference camera by utilizing homography-related positions. Also, occlusion handling is achieved by generation of the top view via trifocal tensors, as a resu...
Citation Formats
M. A. Arabaci, F. Ozkan, E. Sürer, P. Jancovic, and A. Temizel, “Multi-modal egocentric activity recognition using multi-kernel learning,” MULTIMEDIA TOOLS AND APPLICATIONS, pp. 0–0, 2020, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/31338.