\justifying Referring expression grounding is a multimodal matching task between language and vision whose goal is to locate the object in an image that best matches a given referring expression (RE). The key to this task is not only to use the attributes of the subject described in the text, but also to fully exploit the complex location information (absolute and relative) in the image. Existing methods encode location features only from coarse information such as 5-dimensional coordinates and object area, ignoring finer-grained cues, such as the overlap between two objects, that can help distinguish candidates. This paper proposes a general structure modeling approach based on mask information that is applicable to both absolute and relative location. By modeling location at a fine-grained level, the approach uses the same structure for both types of location information, thereby improving the training efficiency of modular models. Specifically, for any two objects in an image, the model extracts small-scale binary features constructed from mask information, corresponding to the subject and object parts of the relationship, respectively. It then performs phrase-guided object attention on these features and updates the initial object representations through multi-layer message passing to obtain cross-feature information. Experiments on three of the most commonly used datasets show that, compared with previous methods, the approach improves the performance of modular referring expression grounding models in a generalizable manner and achieves superior performance.
Keywords: Multimodality; Referring expression grounding; Location matching
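As a rough illustration of the pairwise mask feature sketched in the abstract, the snippet below builds a small two-channel binary grid for an object pair. This is a minimal sketch under assumptions: the function name, grid size, and pooling choice are illustrative, not the paper's exact implementation.

\begin{verbatim}
import torch
import torch.nn.functional as F

def pairwise_mask_feature(subj_mask: torch.Tensor,
                          obj_mask: torch.Tensor,
                          grid: int = 8) -> torch.Tensor:
    """Hypothetical sketch: encode a (subject, object) pair as a
    small-scale binary feature built from segmentation masks.

    subj_mask, obj_mask: (H, W) binary masks over the full image, so
    the downsampled channels preserve absolute position, relative
    position, and overlap between the two objects.
    Returns a (2, grid, grid) binary tensor: channel 0 = subject,
    channel 1 = object.
    """
    pair = torch.stack([subj_mask, obj_mask]).float().unsqueeze(0)  # (1, 2, H, W)
    # Max pooling keeps any covered cell "on", so thin regions of
    # overlap survive the reduction to the coarse grid.
    small = F.adaptive_max_pool2d(pair, grid)                       # (1, 2, grid, grid)
    return (small > 0.5).float().squeeze(0)
\end{verbatim}

Because both channels live on the same image-aligned grid, one binary structure can carry absolute location (where each mask sits in the frame) and relative location (how the two channels relate), matching the unified treatment the abstract describes.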