
PSAIR: A Neuro-Symbolic Approach to Zero-Shot Visual Grounding

Permanent link
https://hdl.handle.net/10037/36786
DOI
https://doi.org/10.1109/IJCNN60899.2024.10649948
article.pdf (1.150Mb)
Accepted manuscript version (PDF)
Date
2024-09-09
Type
Book chapter

Author
Pan, Yi; Zhang, Yujia; Kampffmeyer, Michael Christian; Zhao, Xiaoguang
Abstract
Supervised methods for visual grounding often require costly annotations of paired sentences and images with ground-truth boxes. Recent zero-shot approaches such as ReCLIP and ChatRef aim to avoid this annotation cost. However, these approaches rely on an inflexible detect-then-reasoning paradigm, which leads to a notable loss of semantic information. They are also highly dependent on predefined keywords or on the potentially inconsistent reasoning capabilities of Large Language Models (LLMs). To address these limitations, we propose PSAIR, a neuro-symbolic visual grounding method that incorporates two novel mechanisms: Parallel Scoring and Active Information Retrieval. PSAIR equips an LLM with external, encapsulated reasoning functions that the LLM can invoke, ensuring a flexible and stable reasoning process. The Parallel Scoring mechanism reframes the sequential reasoning process found in prior approaches to make it robust to noise in the detection process. The Active Information Retrieval mechanism addresses the loss of semantic information by retrieving essential visual information, resulting in a new detect-reason-retrieve paradigm. These innovations yield superior performance and robustness on three public datasets compared to recent state-of-the-art zero-shot visual grounding methods.
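The contrast the abstract draws between sequential reasoning and parallel scoring can be illustrated with a toy sketch. This is not the authors' implementation: the `Box` fields, the soft category scores, the two scoring cues, and the function names `sequential_pick` and `parallel_pick` are all hypothetical, chosen only to show why a hard filter applied early can discard the correct box when detection is noisy, while scoring every candidate on every cue can still recover it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Box:
    name: str                       # identifier used only in this toy example
    label_scores: Dict[str, float]  # hypothetical soft category scores in [0, 1]
    x_center: float                 # horizontal centre, 0.0 = left edge, 1.0 = right edge


def sequential_pick(boxes: List[Box], category: str) -> Optional[Box]:
    """Hard pipeline: keep boxes whose top label matches, then take the left-most."""
    kept = [b for b in boxes if max(b.label_scores, key=b.label_scores.get) == category]
    return min(kept, key=lambda b: b.x_center) if kept else None


def parallel_pick(boxes: List[Box], scorers: List[Callable[[Box], float]]) -> Box:
    """Soft pipeline: every scorer rates every box; the summed score decides."""
    return max(boxes, key=lambda b: sum(score(b) for score in scorers))


# Toy scene for the query "the dog on the left" (all numbers are made up).
boxes = [
    Box("A", {"dog": 0.45, "cat": 0.55}, x_center=0.1),  # intended target, narrowly mislabelled
    Box("B", {"dog": 0.90, "cat": 0.10}, x_center=0.8),  # a dog, but on the right
    Box("C", {"dog": 0.05, "cat": 0.95}, x_center=0.2),  # a cat on the left
]
scorers = [
    lambda b: b.label_scores.get("dog", 0.0),  # category cue
    lambda b: 1.0 - b.x_center,                # "on the left" cue
]

hard = sequential_pick(boxes, "dog")
soft = parallel_pick(boxes, scorers)
print(hard.name if hard else None, soft.name)
```

In this toy scene the hard filter drops box A because its top label is "cat", so the sequential pipeline returns B, whereas summing the soft cues selects A. The Active Information Retrieval mechanism named in the abstract is not modelled here.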
Publisher
IEEE
Citation
Pan, Zhang, Kampffmeyer, Zhao: PSAIR: A Neuro-Symbolic Approach to Zero-Shot Visual Grounding. In: Hirose A, Ishibuchi H. 2024 International Joint Conference on Neural Networks (IJCNN), 2024. IEEE conference proceedings
Collections
  • Artikler, rapporter og annet (fysikk og teknologi) [Articles, reports and other (physics and technology)]
Copyright 2024 The Author(s)

UiT The Arctic University of Norway
The University Library
uit.no/ub - munin@ub.uit.no
