Semi-CNN architecture for effective spatio-temporal learning in action recognition

Permanent link
https://hdl.handle.net/10037/18670
DOI
https://doi.org/10.3390/app10020557
Full text
article.pdf (4.232 MB), published version (PDF)
Date
2020-01-12
Type
Journal article
Peer reviewed

Authors
Leong, Mei Chee; Prasad, Dilip K.; Lee, Yong Tsui; Lin, Feng
Abstract
This paper introduces a fusion convolutional architecture for efficient learning of spatio-temporal features in video action recognition. Unlike 2D convolutional neural networks (CNNs), 3D CNNs can be applied directly to consecutive frames to extract spatio-temporal features. The aim of this work is to fuse the convolution layers of 2D and 3D CNNs so that temporal information can be encoded with fewer parameters than a 3D CNN requires. We adopt transfer learning from pre-trained 2D CNNs for spatial feature extraction, followed by temporal encoding, before connecting to 3D convolution layers at the top of the architecture. We construct our fusion architecture, semi-CNN, on three popular models (VGG-16, ResNets and DenseNets) and compare its performance with that of the corresponding 3D models. Our empirical results on the action recognition dataset UCF-101 show that our fusion of 1D, 2D and 3D convolutions outperforms the 3D model of the same depth while using fewer parameters and reducing overfitting. Our semi-CNN architecture achieved an average boost of 16–30% in top-1 accuracy when evaluated on 16-frame input videos.
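The parameter argument in the abstract can be made concrete with a rough count. The sketch below is a hypothetical illustration, not the authors' configuration: it assumes VGG-16 convolution channel widths, 3×3 spatial kernels, 3-frame temporal kernels, a single channel-preserving 1D temporal convolution at the junction between the 2D and 3D parts, and an arbitrary split point `n_2d`. The function names are invented for this example, and only convolution weights and biases are counted (pooling, fully connected layers and normalization are ignored).

```python
"""Rough parameter-count comparison: 2D-then-3D fusion stack vs. a pure 3D stack.

Hypothetical assumptions (not taken from the paper): 3x3 spatial kernels,
3-frame temporal kernels, VGG-16 conv channel widths, and a split after the
first `n_2d` conv layers with one 1D temporal conv inserted at the junction.
"""

# Output channels of the 13 VGG-16 conv layers (the input has 3 RGB channels).
VGG16_CHANNELS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]


def conv_params(c_in: int, c_out: int, kernel_volume: int) -> int:
    """Weights plus biases of one convolution layer."""
    return c_in * c_out * kernel_volume + c_out


def pure_3d_params(channels, k: int = 3, t: int = 3) -> int:
    """Every layer uses a t x k x k kernel."""
    total, c_in = 0, 3
    for c_out in channels:
        total += conv_params(c_in, c_out, t * k * k)
        c_in = c_out
    return total


def fusion_params(channels, n_2d: int, k: int = 3, t: int = 3) -> int:
    """First n_2d layers are 2D (k x k); then one 1D temporal conv of length t
    that keeps the channel count; the remaining layers are 3D (t x k x k)."""
    total, c_in = 0, 3
    for i, c_out in enumerate(channels):
        if i < n_2d:
            total += conv_params(c_in, c_out, k * k)
        else:
            if i == n_2d:
                # Hypothetical temporal-encoding layer between the 2D and 3D parts.
                total += conv_params(c_in, c_in, t)
            total += conv_params(c_in, c_out, t * k * k)
        c_in = c_out
    return total


if __name__ == "__main__":
    full = pure_3d_params(VGG16_CHANNELS)
    fused = fusion_params(VGG16_CHANNELS, n_2d=10)
    print(f"pure 3D: {full:,} params; fusion (n_2d=10): {fused:,} params")
```

With `n_2d=13` every layer is 2D and the count reduces to VGG-16's 14,714,688 convolution parameters, which is a useful sanity check. Each layer kept below the split uses 1/t of the weights of its 3D counterpart (biases unchanged), at the cost of the single temporal layer at the junction.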
Publisher
MDPI
Citation
Leong MC, Prasad DK, Lee YT, Lin F. Semi-CNN architecture for effective spatio-temporal learning in action recognition. Applied Sciences. 2020;10(2):557
Collections
  • Artikler, rapporter og annet (informatikk) (Articles, reports and other works (computer science))
Copyright 2020 The Author(s)

UiT The Arctic University of Norway
The University Library
uit.no/ub - munin@ub.uit.no
