Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos

Permanent link
https://hdl.handle.net/10037/32953
DOI
https://doi.org/10.1109/ICCV51070.2023.00803
View/Open
article.pdf (28.08 MB)
Accepted manuscript version (PDF)
Date
2024-01-15
Type
Journal article
Peer reviewed

Author
Li, Haoyuan; Dong, Haoye; Jia, Hanchao; Huang, Dong; Kampffmeyer, Michael Christian; Lin, Liang; Liang, Xiaodan
Abstract
Multi-person 3D mesh recovery from videos is a critical first step towards automatic perception of group behavior in virtual reality, physical therapy and beyond. However, existing approaches rely on multi-stage paradigms, where the person detection and tracking stages are performed in a multi-person setting, while temporal dynamics are only modeled for one person at a time. Consequently, their performance is severely limited by the lack of inter-person interactions in the spatial-temporal mesh recovery, as well as by detection and tracking defects. To address these challenges, we propose the Coordinate transFormer (CoordFormer) that directly models multi-person spatial-temporal relations and simultaneously performs multi-mesh recovery in an end-to-end manner. Instead of partitioning the feature map into coarse-scale patch-wise tokens, CoordFormer leverages a novel Coordinate-Aware Attention to preserve pixel-level spatial-temporal coordinate information. Additionally, we propose a simple, yet effective Body Center Attention mechanism to fuse position information. Extensive experiments on the 3DPW dataset demonstrate that CoordFormer significantly improves the state-of-the-art, outperforming the previously best results by 4.2%, 8.8% and 4.7% according to the MPJPE, PAMPJPE, and PVE metrics, respectively, while being 40% faster than recent video-based approaches. The released code can be found at https://github.com/Li-Hao-yuan/CoordFormer.
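The following is a minimal, illustrative PyTorch sketch of the two attention mechanisms named in the abstract. It is not the authors' implementation: all module names, tensor shapes, and design choices below are assumptions made for illustration only; the actual released code is at https://github.com/Li-Hao-yuan/CoordFormer.

    # Hypothetical sketch only -- not the released CoordFormer code.
    import torch
    import torch.nn as nn

    class CoordinateAwareAttention(nn.Module):
        """Self-attention over per-pixel video tokens whose features carry an
        embedding of their normalized (t, y, x) coordinates, rather than
        attention over coarse patch tokens (an assumed reading of the abstract)."""
        def __init__(self, dim: int, num_heads: int = 4):
            super().__init__()
            self.coord_embed = nn.Linear(3, dim)  # embeds (t, y, x) per pixel
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            # feats: (B, T, H, W, C) spatial-temporal feature map of a video clip
            B, T, H, W, C = feats.shape
            axes = [torch.linspace(0.0, 1.0, n) for n in (T, H, W)]
            grid = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1)  # (T, H, W, 3)
            # Add a coordinate embedding to every pixel token, then attend globally.
            tokens = (feats + self.coord_embed(grid)).reshape(B, T * H * W, C)
            out, _ = self.attn(tokens, tokens, tokens)
            return out  # (B, T*H*W, C) coordinate-aware pixel tokens

    class BodyCenterAttention(nn.Module):
        """Cross-attention in which one query per candidate body center pools
        position-aware features for that person (again, an assumed design)."""
        def __init__(self, dim: int, num_heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, center_queries: torch.Tensor, pixel_tokens: torch.Tensor) -> torch.Tensor:
            # center_queries: (B, P, C), one query per detected body center
            # pixel_tokens:   (B, T*H*W, C), output of CoordinateAwareAttention
            fused, _ = self.attn(center_queries, pixel_tokens, pixel_tokens)
            return fused  # (B, P, C), per-person features for mesh regression

A toy forward pass under these assumptions: with feats of shape (2, 4, 8, 8, 64) and three body-center queries of shape (2, 3, 64), CoordinateAwareAttention produces (2, 256, 64) coordinate-aware tokens, and BodyCenterAttention fuses them into (2, 3, 64) per-person features that a mesh regressor could consume.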
Publisher
IEEE
Citation
Li H, Dong H, Jia H, Huang D, Kampffmeyer MC, Lin L, Liang X. Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos. IEEE International Conference on Computer Vision (ICCV). 2023
Collections
  • Artikler, rapporter og annet (fysikk og teknologi) [1057]
Copyright 2023 The Author(s)
