The empirical moment matrix and its application in computer vision
Permanent URL:
http://hdl.handle.net/2047/D20291232
Dy, Jennifer G. (Committee member)
Radke, Richard J. (Committee member)
Sznaier, Mario (Committee member)
Person re-identification (re-ID) is the problem of matching images of a pedestrian across cameras with no overlapping fields of view. It is one of the key tasks in surveillance video processing. Yet, due to the extremely large intra-class variances across different cameras (e.g., poses, illumination, viewpoints), the performance of state-of-the-art person re-ID algorithms is still far from ideal. In this thesis, we propose a novel descriptor, based on the on-manifold mean of a moment matrix (moM) and horizontal mean pooling, which can be used to approximate complex, non-Gaussian distributions of the pixel features within a mid-sized local patch. To narrow the gap between academic research and real-world applications, we introduce two large-scale public re-ID datasets and establish a systematic benchmark evaluation on both. Extensive experiments on five widely used public re-ID datasets and the two newly collected datasets demonstrate that incorporating the proposed moM feature improves re-ID performance.
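As a rough illustration of the descriptor idea described above, the following sketch builds a second-order empirical moment matrix from the pixel features of each patch and averages the matrices of one horizontal strip. The function names, the regularizer `eps`, and the choice of the log-Euclidean mean as the on-manifold mean are assumptions made for illustration, not details taken from the thesis.

```python
import numpy as np

def moment_matrix(feats, eps=1e-6):
    """Empirical moment matrix of an (n, d) array of pixel features.

    Each feature x is lifted to [1, x] and the outer products are averaged.
    Adding eps * I keeps the result strictly positive definite
    (an assumption of this sketch).
    """
    n, d = feats.shape
    lifted = np.hstack([np.ones((n, 1)), feats])
    return lifted.T @ lifted / n + eps * np.eye(d + 1)

def spd_log(m):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T

def spd_exp(s):
    """Matrix exponential of a symmetric matrix."""
    w, v = np.linalg.eigh(s)
    return (v * np.exp(w)) @ v.T

def log_euclidean_mean(mats):
    """One possible on-manifold mean: average in the log domain, map back."""
    return spd_exp(np.mean([spd_log(m) for m in mats], axis=0))

def strip_descriptor(patches):
    """Mean moment matrix over the patches of one horizontal strip.

    patches: list of (n, d) arrays, one per patch (hypothetical layout).
    """
    return log_euclidean_mean([moment_matrix(p) for p in patches])

# Hypothetical data: 4 patches per strip, 50 pixels each, 5-dim features.
rng = np.random.default_rng(0)
patches = [rng.normal(size=(50, 5)) for _ in range(4)]
descriptor = strip_descriptor(patches)
```

The result is a single symmetric positive definite matrix per strip; how it is vectorized and compared between images is left open here.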
Unlike general object recognition tasks, fine-grained classification tries to distinguish objects at the sub-category level, such as different makes of cars or different species of birds. The main challenge of this task is the relatively small inter-class and relatively large intra-class variations. The most successful approaches to this problem use deep convolutional neural networks (CNNs), where the convolutional layers perform a local representation extraction step and the fully connected layers perform an encoding step. In the case of fine-grained classification, bilinear pooling and Gaussian embedding have been shown to be the best encoding options, but at the price of an enormous feature dimensionality. Approximate compact pooling methods have been explored to address this weakness. Additionally, recent results have shown that significant performance gains can be achieved by using matrix normalization to regularize the unstable higher-order information. However, combining compact pooling with matrix normalization has not been explored until now. In this thesis, we unify the bilinear pooling layer and the global Gaussian embedding layer through the empirical moment matrix in a novel deep architecture, the moment embedding network (MoNet). In addition, we propose a novel sub-matrix square-root layer, which can be used to normalize the output of the convolution layer directly and mitigate the dimensionality problem with off-the-shelf compact pooling methods. Our experiments on three widely used fine-grained classification datasets show that the proposed MoNet architecture achieves similar or better performance than state-of-the-art architectures. Furthermore, when combined with compact pooling techniques, it achieves comparable performance using encoded features with only 4% of the dimensions.
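To make the unification claim above concrete, the sketch below checks numerically that the empirical moment matrix of lifted features [1, x] contains both the bilinear-pooled matrix (its lower-right block) and the Gaussian parameters (mean and covariance). It also applies a plain eigendecomposition-based matrix square root as a generic example of matrix normalization. All function names are hypothetical, and the square root shown is the ordinary full-matrix version, not the sub-matrix square-root layer proposed in the thesis.

```python
import numpy as np

def bilinear_pool(x):
    """Average outer product of (n, d) local features: (1/n) * X^T X."""
    return x.T @ x / x.shape[0]

def moment_embed(x):
    """Empirical moment matrix of the lifted features [1, x]."""
    n = x.shape[0]
    lifted = np.hstack([np.ones((n, 1)), x])
    return lifted.T @ lifted / n

def matrix_sqrt(m):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

# Hypothetical local features: 200 spatial locations, 8 channels.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, size=(200, 8))

M = moment_embed(x)
mu = M[0, 1:]                            # first row holds the feature mean
second_moment = M[1:, 1:]                # identical to bilinear pooling
cov = second_moment - np.outer(mu, mu)   # biased covariance estimate
M_normalized = matrix_sqrt(M)
```

Because the lower-right block equals the bilinear feature and the remaining entries determine the Gaussian mean and covariance, a single matrix carries the information used by both encodings.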
empirical moment
feature encoding
feature extraction
fine-grained classification
person re-identification
Copyright restrictions may apply.