Contextual information for Person Re-Identification in outdoor environments.
Abstract
Person Re-Identification (ReID) is achieving good results and is getting closer to being ready for deployment in production scenarios. However, there is still room for improvement: when the task runs in outdoor environments, its performance is usually affected by illumination changes or by natural elements, such as fog or dust, that distort the images. In this work, we introduce a novel proposal for including contextual information in a ReID re-ranking approach, to help improve the effectiveness of this task in surveillance systems. Most previous research in this field uses only the visual data contained in the images processed by ReID. Even the approaches that do use some sort of context normally rely on context annotated within the scope of the image itself, or on the relationships between the different images where the IDs are found. We argue that these scenarios offer a great deal of contextual information that is not being exploited and that might help to reduce the impact of these conditions on the performance of the task. In the present document, we perform a complete analysis of the effect of combining this contextual information with the embeddings normally produced by several ReID models, processing them through an architecture inspired by Siamese
neural networks, but trained with a triplet loss. The network was trained on a novel dataset developed specifically for this task and annotated with this extra information. The dataset comprises 34,156 images of 501 labeled identities captured by 3 different cameras. Along with this data, each image includes 12 extra features carrying its specific contextual information.
These images were processed beforehand with three different ReID models, to ensure that the results obtained when the contextual information is included are independent of the ReID approach taken as the base: Triplet Network (TriNet), Multiple Granularity Network (MGN), and Multi-Level Factorization Net (MLFN). Each one produces 2048-dimensional embeddings. All of our proposed experiments improved on the original mAP of these three networks on our dataset, going from 86.53 to 94.9, from 84.94 to 93.11, and from 95.35 to 95.93, respectively.
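To make the fusion step concrete, the sketch below shows one plausible way to combine a visual ReID embedding with per-image contextual features and score a triplet with the standard triplet loss. The `fuse` helper, the toy vector sizes, and the 0.3 margin are illustrative assumptions, not the paper's actual implementation (the abstract only states that 2048-dimensional embeddings and 12 contextual features are processed through a Siamese-inspired network with triplet loss).

```python
import math

def fuse(embedding, context):
    """Concatenate a visual ReID embedding with its contextual feature vector.

    In the paper's setting the embedding would be 2048-d and the context 12-d;
    tiny vectors are used here for readability. Simple concatenation is an
    assumed fusion strategy, chosen for illustration.
    """
    return list(embedding) + list(context)

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss: the positive pair should be closer than the
    negative pair by at least `margin`, otherwise a penalty is incurred."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy example: same identity seen under similar context scores a low loss.
a = fuse([0.1, 0.2], [1.0])    # anchor: appearance + context
p = fuse([0.1, 0.25], [1.0])   # positive: same identity
n = fuse([0.9, 0.8], [0.0])    # negative: different identity, different context
loss = triplet_loss(a, p, n)
```

During training, the Siamese-style branches would share weights and produce the fused representations that this loss pulls together or pushes apart.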