Show simple item record

dc.contributor.author        Lin, Luoyang
dc.contributor.author        Jiang, Zutao
dc.contributor.author        Liang, Xiaodan
dc.contributor.author        Ma, Liqian
dc.contributor.author        Kampffmeyer, Michael Christian
dc.contributor.author        Cao, Xiaochun
dc.date.accessioned          2025-03-17T10:00:41Z
dc.date.available            2025-03-17T10:00:41Z
dc.date.issued               2024-03-24
dc.description.abstract      Talking upper-body synthesis is a promising task due to its versatile potential for video creation; it consists of animating the body and face from a source image with the motion from a given driving video. However, prior synthesis approaches fall short on this task: they have either been limited to animating only the head of a target person, or have animated the upper body while neglecting the synthesis of precise facial details. To tackle this task, we propose a Photo-realistic Talking Upper-body Synthesis method via 3D-aware motion decomposition warping, named PTUS, which both precisely synthesizes the upper body and recovers facial details such as blinking and lip synchronization. In particular, the motion decomposition mechanism consists of a face-body motion decomposition, which decouples the 3D motion estimation of the face and body, and a local-global motion decomposition, which decomposes the 3D face motion into global and local motions, enabling the transfer of facial expressions. The 3D-aware warping module transfers the large-scale and subtle 3D motions to the extracted 3D depth-aware features in a coarse-to-fine manner. Moreover, we present a new dataset, Talking-UB, which includes upper-body images with high-resolution faces, addressing the limitations of prior datasets that either consist of only facial images or upper-body images with blurry faces. Experimental results demonstrate that our proposed method can synthesize high-quality videos that preserve facial details and achieves superior results compared to state-of-the-art cross-person motion transfer approaches. Code and the collected dataset are released at https://github.com/cooluoluo/PTUS.   en_US
dc.identifier.citation       Lin, Jiang, Liang, Ma, Kampffmeyer, Cao. PTUS: Photo-Realistic Talking Upper-Body Synthesis via 3D-Aware Motion Decomposition Warping. Proceedings of the AAAI Conference on Artificial Intelligence. 2024;38(4)   en_US
dc.identifier.cristinID      FRIDAID 2296322
dc.identifier.doi            10.1609/aaai.v38i4.28131
dc.identifier.issn           2159-5399
dc.identifier.issn           2374-3468
dc.identifier.uri            https://hdl.handle.net/10037/36705
dc.language.iso              eng   en_US
dc.publisher                 Association for the Advancement of Artificial Intelligence   en_US
dc.relation.journal          Proceedings of the AAAI Conference on Artificial Intelligence
dc.relation.projectID        Norges forskningsråd: 309439   en_US
dc.rights.accessRights       openAccess   en_US
dc.rights.holder             Copyright 2024 The Author(s)   en_US
dc.title                     PTUS: Photo-Realistic Talking Upper-Body Synthesis via 3D-Aware Motion Decomposition Warping   en_US
dc.type.version              acceptedVersion   en_US
dc.type                      Journal article   en_US
dc.type                      Tidsskriftartikkel   en_US
dc.type                      Peer reviewed   en_US
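
The abstract above outlines a two-level motion decomposition (face vs. body, then global vs. local face motion) followed by flow-based warping of source features in a coarse-to-fine order. As a purely illustrative aid, the sketch below renders that idea in minimal PyTorch; every module name, layer choice, and tensor shape is an assumption made for illustration and does not reflect the authors' released implementation, which is available at https://github.com/cooluoluo/PTUS.

```python
# Hypothetical sketch of face-body / local-global motion decomposition and
# flow-based warping, loosely following the description in the abstract.
# All names and shapes are illustrative assumptions, not the PTUS code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionDecomposition(nn.Module):
    """Splits driving motion into a body flow and a face flow, and further
    splits the face flow into a global (head pose) and a local (expression)
    component, mirroring the decomposition outlined in the abstract."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Separate encoders decouple motion estimation of face and body.
        self.body_encoder = nn.Conv2d(3, feat_dim, 3, padding=1)
        self.face_encoder = nn.Conv2d(3, feat_dim, 3, padding=1)
        # Heads predicting 2-channel offset fields used for warping.
        self.body_head = nn.Conv2d(feat_dim, 2, 3, padding=1)
        self.global_head = nn.Conv2d(feat_dim, 2, 3, padding=1)
        self.local_head = nn.Conv2d(feat_dim, 2, 3, padding=1)

    def forward(self, driving: torch.Tensor):
        body_feat = F.relu(self.body_encoder(driving))
        face_feat = F.relu(self.face_encoder(driving))
        body_flow = self.body_head(body_feat)      # large-scale body motion
        global_flow = self.global_head(face_feat)  # global head motion
        local_flow = self.local_head(face_feat)    # local motion (blinks, lips)
        return body_flow, global_flow, local_flow


def warp(feature: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp source features with a dense flow field; a coarse-to-fine scheme
    would apply this at several resolutions."""
    n, _, h, w = feature.shape
    # Identity sampling grid normalized to [-1, 1], as grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    # Add the predicted offsets and sample the features.
    offset = flow.permute(0, 2, 3, 1)
    return F.grid_sample(feature, grid + offset, align_corners=True)


if __name__ == "__main__":
    model = MotionDecomposition()
    driving_frame = torch.randn(1, 3, 64, 64)     # a driving-video frame
    source_feature = torch.randn(1, 64, 64, 64)   # features of the source image
    body_flow, global_flow, local_flow = model(driving_frame)
    # Coarse body/global motion first, then the subtle local face motion.
    warped = warp(source_feature, body_flow + global_flow)
    warped = warp(warped, local_flow)
    print(warped.shape)  # torch.Size([1, 64, 64, 64])
```

In this toy version, the coarse body and global face flows are applied before the fine local flow, echoing the coarse-to-fine warping order described in the abstract; the paper's actual 3D-aware, depth-conditioned warping is considerably more involved.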

