Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/134068
Type: Journal article
Title: Referring expression comprehension: a survey of methods and datasets
Author: Qiao, Y.; Deng, C.; Wu, Q.
Citation: IEEE Transactions on Multimedia, 2020; 23:4426-4440
Publisher: Institute of Electrical and Electronics Engineers
Issue Date: 2020
ISSN: 1520-9210; 1941-0077
Statement of Responsibility: Yanyuan Qiao, Chaorui Deng, Qi Wu
Abstract: Referring expression comprehension (REC) aims to localize a target object in an image described by a referring expression phrased in natural language. Unlike object detection, where the queried object labels are pre-defined, REC can observe its queries only at test time, which makes it more challenging than a conventional computer vision problem. The task has attracted considerable attention from both the computer vision and natural language processing communities, and several lines of work have been proposed, from CNN-RNN models and modular networks to complex graph-based models. In this survey, we first examine the state of the art by comparing modern approaches to the problem. We classify methods by the mechanism they use to encode the visual and textual modalities. In particular, we examine the common approach of jointly embedding images and expressions into a common feature space. We also discuss modular architectures and graph-based models that interface with structured graph representations. In the second part of this survey, we review the datasets available for training and evaluating REC approaches. We then group results by dataset, backbone model, and setting so that they can be fairly compared. Finally, we discuss promising future directions for this field, in particular compositional referring expression comprehension, which requires more reasoning steps to address.
Keywords: Referring expression; vision and language; attention mechanism; survey
Rights: © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
DOI: 10.1109/TMM.2020.3042066
Grant ID: http://purl.org/au-research/grants/arc/DE190100539
Published version: http://dx.doi.org/10.1109/tmm.2020.3042066
Appears in Collections: Computer Science publications
