Semi-CNN architecture for effective spatio-temporal learning in action recognition

Leong, Mei Chee; Prasad, Dilip K.; Lee, Yong Tsui; Lin, Feng

dc.contributor.author	Leong, Mei Chee
dc.contributor.author	Prasad, Dilip K.
dc.contributor.author	Lee, Yong Tsui
dc.contributor.author	Lin, Feng
dc.date.accessioned	2020-06-26T10:43:37Z
dc.date.available	2020-06-26T10:43:37Z
dc.date.issued	2020-01-12
dc.description.abstract	This paper introduces a fusion convolutional architecture for efficient learning of spatio-temporal features in video action recognition. Unlike 2D convolutional neural networks (CNNs), 3D CNNs can be applied directly to consecutive frames to extract spatio-temporal features. The aim of this work is to fuse the convolution layers of 2D and 3D CNNs to allow temporal encoding with fewer parameters than 3D CNNs. We adopt transfer learning from pre-trained 2D CNNs for spatial feature extraction, followed by temporal encoding, before connecting to 3D convolution layers at the top of the architecture. We construct our fusion architecture, semi-CNN, based on three popular models (VGG-16, ResNets and DenseNets) and compare its performance with that of the corresponding 3D models. Our empirical results on the action recognition dataset UCF-101 demonstrate that our fusion of 1D, 2D and 3D convolutions outperforms the 3D model of the same depth while using fewer parameters and reducing overfitting. Our semi-CNN architecture achieved an average boost of 16–30% in top-1 accuracy when evaluated on an input video of 16 frames.	en_US
dc.identifier.citation	Leong MC, Prasad DK, Lee YT, Lin F. Semi-CNN architecture for effective spatio-temporal learning in action recognition. Applied Sciences. 2020;10(2):557	en_US
dc.identifier.cristinID	FRIDAID 1815560
dc.identifier.doi	10.3390/app10020557
dc.identifier.issn	2076-3417
dc.identifier.uri	https://hdl.handle.net/10037/18670
dc.language.iso	eng	en_US
dc.publisher	MDPI	en_US
dc.relation.journal	Applied Sciences
dc.rights.accessRights	openAccess	en_US
dc.rights.holder	Copyright 2020 The Author(s)	en_US
dc.subject	VDP::Technology: 500::Information and communication technology: 550	en_US
dc.title	Semi-CNN architecture for effective spatio-temporal learning in action recognition	en_US
dc.type.version	publishedVersion	en_US
dc.type	Journal article	en_US
dc.type	Peer reviewed	en_US

File(s) in this item

Name: article.pdf
Size: 4.232Mb
Format: PDF

This item appears in the following collection(s)

Artikler, rapporter og annet (informatikk) [389]
