Show simple item record

dc.contributor.author: Zhang, Yujia
dc.contributor.author: Kampffmeyer, Michael C.
dc.contributor.author: Liang, Xiaodan
dc.contributor.author: Zhang, Dingwen
dc.contributor.author: Tan, Min
dc.contributor.author: Xing, Eric P.
dc.date.accessioned: 2020-03-05T06:44:05Z
dc.date.available: 2020-03-05T06:44:05Z
dc.date.issued: 2019-10-12
dc.description.abstract: The large number of videos appearing every day makes it increasingly critical that key information within videos can be extracted and understood in a very short time. Video summarization, the task of finding the smallest subset of frames that still conveys the whole story of a given video, is thus of great significance for improving the efficiency of video understanding. We propose a novel Dilated Temporal Relational Generative Adversarial Network (DTR-GAN) to achieve frame-level video summarization. Given a video, it selects the set of key frames that contain the most meaningful and compact information. Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with a three-player loss in an adversarial manner. A new dilated temporal relation (DTR) unit is introduced to enhance the capture of temporal representations. The generator uses this unit to effectively exploit global multi-scale temporal context to select key frames and to complement the commonly used Bi-LSTM. To ensure that summaries capture enough key video representation from a global perspective, rather than being a trivial, randomly shortened sequence, we present a discriminator that learns to enforce both the information completeness and the compactness of summaries via a three-player loss. The loss comprises the generated-summary loss, the random-summary loss, and the real-summary (ground-truth) loss, which play important roles in better regularizing the learned model to obtain useful summaries. Comprehensive experiments on three public datasets show the effectiveness of the proposed approach. (en_US)
dc.description: This is a post-peer-review, pre-copyedit version of an article published in Multimedia Tools and Applications. The final authenticated version is available online at: https://doi.org/10.1007/s11042-019-08175-y. (en_US)
dc.identifier.citation: Zhang Y, Kampffmeyer MC, Liang X, Zhang D, Tan M, Xing EP. Dilated temporal relational adversarial network for generic video summarization. Multimedia Tools and Applications. 2019;78(24):35237-35261. (en_US)
dc.identifier.cristinID: FRIDAID 1749907
dc.identifier.doi: 10.1007/s11042-019-08175-y
dc.identifier.issn: 1380-7501
dc.identifier.issn: 1573-7721
dc.identifier.uri: https://hdl.handle.net/10037/17624
dc.language.iso: eng (en_US)
dc.publisher: Springer Nature (en_US)
dc.relation.journal: Multimedia Tools and Applications
dc.relation.projectID: Norges forskningsråd: 239844 (en_US)
dc.relation.projectID: info:eu-repo/grantAgreement/RCN/IKTPLUSS/239844/Norway/Next Generation Kernel-Based Machine Learning for Big Missing Data Applied to Earth Observation// (en_US)
dc.rights.accessRights: openAccess (en_US)
dc.rights.holder: Copyright © 2019, Springer Nature (en_US)
dc.subject: VDP::Technology: 500 (en_US)
dc.subject: VDP::Teknologi: 500 (en_US)
dc.title: Dilated temporal relational adversarial network for generic video summarization (en_US)
dc.type.version: acceptedVersion (en_US)
dc.type: Journal article (en_US)
dc.type: Tidsskriftartikkel (en_US)
dc.type: Peer reviewed (en_US)
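
Note: the abstract above names the three terms of the adversarial objective (the generated-summary, random-summary, and real-summary losses). As a purely illustrative aid, the minimal sketch below shows one plausible way such a three-player loss could be composed; it assumes a PyTorch-style binary discriminator, and every name in it (discriminator, summary_real, summary_gen, summary_rand) is hypothetical rather than taken from the paper.

    import torch
    import torch.nn.functional as F

    def three_player_d_loss(discriminator, summary_real, summary_gen, summary_rand):
        # Hypothetical sketch: the discriminator is trained to label the
        # ground-truth summary as real (1) and both the generated and the
        # randomly shortened summaries as fake (0).
        s_real = discriminator(summary_real)
        s_gen = discriminator(summary_gen.detach())  # no generator gradients here
        s_rand = discriminator(summary_rand)
        loss_real = F.binary_cross_entropy_with_logits(s_real, torch.ones_like(s_real))
        loss_gen = F.binary_cross_entropy_with_logits(s_gen, torch.zeros_like(s_gen))
        loss_rand = F.binary_cross_entropy_with_logits(s_rand, torch.zeros_like(s_rand))
        return loss_real + loss_gen + loss_rand

    def generator_adv_loss(discriminator, summary_gen):
        # The generator is rewarded when its summary fools the discriminator.
        s_gen = discriminator(summary_gen)
        return F.binary_cross_entropy_with_logits(s_gen, torch.ones_like(s_gen))

Scoring the randomly shortened summary as fake is, per the abstract, what prevents a trivial random shortening of the video from passing as a valid summary.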

