
PSAIR: A Neuro-Symbolic Approach to Zero-Shot Visual Grounding

Permanent link
https://hdl.handle.net/10037/36786
DOI
https://doi.org/10.1109/IJCNN60899.2024.10649948
article.pdf (1.150Mb)
Accepted manuscript version (PDF)
Date
2024-09-09
Type
Chapter

Authors
Pan, Yi; Zhang, Yujia; Kampffmeyer, Michael Christian; Zhao, Xiaoguang
Abstract
Supervised methods for visual grounding often require costly annotations of paired sentences and images with ground-truth boxes. Recent zero-shot approaches such as ReCLIP and ChatRef aim to avoid this annotation cost. However, these approaches rely on an inflexible detect-then-reason paradigm, which leads to a notable loss of semantic information. They also depend heavily on predefined keywords or on the potentially inconsistent reasoning capabilities of Large Language Models (LLMs). To address these limitations, we propose PSAIR, a neuro-symbolic visual grounding method that incorporates two novel mechanisms: Parallel Scoring and Active Information Retrieval. PSAIR equips an LLM with external encapsulated reasoning functions that the LLM can invoke, ensuring a flexible and stable reasoning process. The Parallel Scoring mechanism reframes the sequential reasoning process of prior approaches to improve robustness to noise in the detection process. The Active Information Retrieval mechanism addresses the loss of semantic information by retrieving essential visual information, resulting in a new detect-reason-retrieve paradigm. Together, these innovations yield superior performance and robustness on three public datasets compared with recent state-of-the-art zero-shot visual grounding methods.
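The contrast the abstract draws between sequential reasoning and Parallel Scoring can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how scores are computed or combined, so the score table, the step names, the 0.5 threshold, and the product rule below are all assumptions chosen only to show why scoring every candidate on every step may tolerate a noisy detection better than hard step-by-step filtering.

```python
from math import prod

# Hypothetical scores in [0, 1]: one row per reasoning step,
# one column per candidate box. Box 0 is the intended target,
# but a noisy detector under-scores it on the first step.
SCORES = {
    "category": [0.45, 0.90, 0.80],
    "relation": [0.95, 0.10, 0.20],
    "attribute": [0.90, 0.30, 0.40],
}


def sequential_filter(scores, threshold=0.5):
    """Hard-filter candidates one step at a time (assumed baseline)."""
    n = len(next(iter(scores.values())))
    alive = list(range(n))
    for step_scores in scores.values():
        # A candidate dropped here can never be recovered later.
        alive = [i for i in alive if step_scores[i] >= threshold]
    return alive


def parallel_score(scores):
    """Score every candidate on every step, then combine by product
    (the product rule is an assumption, not taken from the paper)."""
    n = len(next(iter(scores.values())))
    combined = [prod(s[i] for s in scores.values()) for i in range(n)]
    best = max(range(n), key=combined.__getitem__)
    return best, combined


survivors = sequential_filter(SCORES)
best, combined = parallel_score(SCORES)
print("sequential survivors:", survivors)
print("parallel best box:", best)
```

In this toy table, the sequential filter discards box 0 at the first step and ends with no candidates, while the combined score still ranks box 0 highest because its strong relation and attribute scores outweigh the one noisy step.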
Publisher
IEEE
Citation
Pan, Zhang, Kampffmeyer, Zhao: PSAIR: A Neuro-Symbolic Approach to Zero-Shot Visual Grounding. In: Hirose A, Ishibuchi H. 2024 International Joint Conference on Neural Networks (IJCNN), 2024. IEEE conference proceedings
Collections
  • Artikler, rapporter og annet (fysikk og teknologi) [1057]
Copyright 2024 The Author(s)
