dc.contributor.author: Pan, Yi
dc.contributor.author: Zhang, Yujia
dc.contributor.author: Kampffmeyer, Michael Christian
dc.contributor.author: Zhao, Xiaoguang
dc.date.accessioned: 2025-03-26T14:02:13Z
dc.date.available: 2025-03-26T14:02:13Z
dc.date.issued: 2024-09-09
dc.description.abstract: Supervised methods for visual grounding often require costly annotation of paired sentences and images with ground-truth boxes. Recent zero-shot approaches such as ReCLIP and ChatRef avoid this annotation cost. However, they rely on an inflexible detect-then-reasoning paradigm that causes a notable loss of semantic information, and they depend heavily on predefined keywords or on the potentially inconsistent reasoning capabilities of Large Language Models (LLMs). To address these limitations, we propose PSAIR, a neuro-symbolic visual grounding method that incorporates two novel mechanisms: Parallel Scoring and Active Information Retrieval. PSAIR equips an LLM with encapsulated external reasoning functions that the LLM can invoke, ensuring a flexible and stable reasoning process. The Parallel Scoring mechanism reframes the sequential reasoning process of prior approaches to improve robustness to noise in the detection process. The Active Information Retrieval mechanism addresses the loss of semantic information by retrieving essential visual information, resulting in a new detect-reason-retrieve paradigm. Across three public datasets, these innovations yield superior performance and robustness compared to recent state-of-the-art zero-shot visual grounding methods. [en_US]
dc.identifier.citation: Pan, Zhang, Kampffmeyer, Zhao: PSAIR: A Neuro-Symbolic Approach to Zero-Shot Visual Grounding. In: Hirose A, Ishibuchi H. 2024 International Joint Conference on Neural Networks (IJCNN), 2024. IEEE conference proceedings [en_US]
dc.identifier.cristinID: FRIDAID 2363557
dc.identifier.doi: 10.1109/IJCNN60899.2024.10649948
dc.identifier.isbn: 979-8-3503-5931-2
dc.identifier.issn: 2161-4393
dc.identifier.issn: 2161-4407
dc.identifier.uri: https://hdl.handle.net/10037/36786
dc.language.iso: eng [en_US]
dc.publisher: IEEE [en_US]
dc.relation.projectID: Research Council of Norway (Norges forskningsråd): 309439 [en_US]
dc.rights.accessRights: openAccess [en_US]
dc.rights.holder: Copyright 2024 The Author(s) [en_US]
dc.title: PSAIR: A Neuro-Symbolic Approach to Zero-Shot Visual Grounding [en_US]
dc.type.version: acceptedVersion [en_US]
dc.type: Chapter [en_US]
dc.type: Book chapter (Bokkapittel) [en_US]

