• ConnNet: A Long-Range Relation-Aware Pixel-Connectivity Network for Salient Segmentation 

      Kampffmeyer, Michael C.; Dong, Nanqing; Liang, Xiaodan; Zhang, Yujia; Xing, Eric P. (Journal article; Peer reviewed, 2018-12-14)
      Salient segmentation aims to segment out attention-grabbing regions, a critical yet challenging task and the foundation of many high-level computer vision applications. It requires semantic-aware grouping of pixels into salient regions and benefits from global multi-scale context to achieve good local reasoning. Previous works often address it as a two-class segmentation problem ...
    • Deep Reinforcement Learning for Query-Conditioned Video Summarization 

      Zhang, Yujia; Kampffmeyer, Michael C.; Zhao, Xiaoguang; Tan, Min (Journal article; Peer reviewed, 2019-02-21)
      Query-conditioned video summarization requires (1) finding a diverse set of video shots/frames that are representative of the whole video and (2) ensuring that the selected shots/frames are related to a given query. It can thus be tailored to different user interests, leading to a better personalized summary, and differs from generic video summarization, which focuses only on video content. Our work targets ...
    • Dilated temporal relational adversarial network for generic video summarization 

      Zhang, Yujia; Kampffmeyer, Michael C.; Liang, Xiaodan; Zhang, Dingwen; Tan, Min; Xing, Eric P. (Journal article; Peer reviewed, 2019-10-12)
      The large number of videos appearing every day makes it increasingly critical that key information within videos can be extracted and understood in a very short time. Video summarization, the task of finding the smallest subset of frames that still conveys the whole story of a given video, is thus of great significance for improving the efficiency of video understanding. We propose a novel Dilated ...
    • PSAIR: A Neuro-Symbolic Approach to Zero-Shot Visual Grounding 

      Pan, Yi; Zhang, Yujia; Kampffmeyer, Michael Christian; Zhao, Xiaoguang (Chapter, 2024-09-09)
      Supervised methods for visual grounding often require costly annotations of paired sentences and images with ground truth boxes. Recent zero-shot approaches to visual grounding, such as ReCLIP and ChatRef, aim to avoid this annotation cost. However, these approaches leverage an inflexible detect-then-reasoning paradigm, which leads to a ...
    • Rethinking knowledge graph propagation for zero-shot learning 

      Kampffmeyer, Michael C.; Chen, Yinbo; Liang, Xiaodan; Wang, Hao; Zhang, Yujia; Xing, Eric P. (Journal article; Peer reviewed, 2019)
      Graph convolutional neural networks have recently shown great potential for the task of zero-shot learning. These models are highly sample efficient, as related concepts in the graph structure share statistical strength, allowing generalization to new classes when faced with a lack of data. However, multi-layer architectures, which are required to propagate knowledge to distant nodes in the graph, ...