Spatio-Temporal FAST 3D Convolutions for Human Action Recognition

A.G. Stergiou, R.W. Poppe

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

Effective processing of video input is essential for the recognition of temporally varying events such as human actions. Motivated by the often distinctive temporal characteristics of actions in either the horizontal or the vertical direction, we introduce a novel convolution block for CNN architectures with video input. Our proposed Fractioned Adjacent Spatial and Temporal (FAST) 3D convolutions are a natural decomposition of a regular 3D convolution. Each convolution block consists of three sequential convolution operations: a 2D spatial convolution followed by spatio-temporal convolutions in the horizontal and vertical directions, respectively. Additionally, we introduce a FAST variant that treats horizontal and vertical motion in parallel. Experiments with ResNet architectures on the action recognition benchmarks UCF-101 and HMDB-51 demonstrate consistently higher performance of FAST 3D convolution blocks compared to traditional 3D convolutions. The lower validation loss indicates better generalization, especially for deeper networks. We also evaluate CNN architectures with similar memory requirements, based either on two-stream networks or on 3D convolution blocks. DenseNet-121 with FAST 3D convolutions performed best, giving further evidence of the merits of decoupled spatio-temporal convolutions.
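To make the decomposition concrete, the sketch below describes the kernels of a regular 3D convolution and of the two FAST variants and counts their weights. It is an illustration under stated assumptions, not the authors' implementation: the (time, height, width) kernel order, the 3×3 spatial and 3-frame temporal kernel sizes, the channel widths of the intermediate layers, and the half-width concatenated branches of the parallel variant are all assumptions, and the names `Conv3d`, `fast_sequential_block`, `fast_parallel_block` and `chain_receptive_field` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Conv3d:
    """Hypothetical description of one bias-free 3D convolution layer."""
    c_in: int
    c_out: int
    kernel: Tuple[int, int, int]  # assumed (time, height, width) order

    def params(self) -> int:
        t, h, w = self.kernel
        return self.c_in * self.c_out * t * h * w


def regular_block(c_in: int, c_out: int, k: int = 3, t: int = 3) -> List[Conv3d]:
    # A single full t x k x k spatio-temporal kernel.
    return [Conv3d(c_in, c_out, (t, k, k))]


def fast_sequential_block(c_in: int, c_out: int, k: int = 3, t: int = 3) -> List[Conv3d]:
    # Assumption: every stage outputs c_out channels.
    return [
        Conv3d(c_in, c_out, (1, k, k)),   # 2D spatial convolution
        Conv3d(c_out, c_out, (t, 1, k)),  # horizontal: time x width
        Conv3d(c_out, c_out, (t, k, 1)),  # vertical: time x height
    ]


def fast_parallel_block(c_in: int, c_out: int, k: int = 3, t: int = 3) -> List[Conv3d]:
    # Assumption: both branches read the spatial output and each emits
    # c_out // 2 channels, which are then concatenated.
    return [
        Conv3d(c_in, c_out, (1, k, k)),        # 2D spatial convolution
        Conv3d(c_out, c_out // 2, (t, 1, k)),  # horizontal branch
        Conv3d(c_out, c_out // 2, (t, k, 1)),  # vertical branch
    ]


def total_params(block: List[Conv3d]) -> int:
    return sum(layer.params() for layer in block)


def chain_receptive_field(block: List[Conv3d]) -> Tuple[int, int, int]:
    # Valid only for layers applied one after another with stride 1 and no
    # dilation: each layer widens the field by (kernel - 1) along each axis.
    return tuple(1 + sum(layer.kernel[a] - 1 for layer in block) for a in range(3))


if __name__ == "__main__":
    for name, block in [
        ("regular 3D", regular_block(64, 64)),
        ("FAST sequential", fast_sequential_block(64, 64)),
        ("FAST parallel", fast_parallel_block(64, 64)),
    ]:
        print(f"{name}: {total_params(block)} weights")
    print(chain_receptive_field(regular_block(64, 64)))          # (3, 3, 3)
    print(chain_receptive_field(fast_sequential_block(64, 64)))  # (5, 5, 5)
```

Under these assumptions and with 64 input and output channels, the sequential FAST block has exactly as many weights as one 3×3×3 convolution (110,592), while its three stacked stride-1 convolutions cover a 5×5×5 neighborhood instead of 3×3×3. With half-width branches, the parallel variant needs 73,728 weights. Nonlinearities, normalization and bias terms are ignored.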
Original language: English
Title of host publication: Proceedings of the International Conference On Machine Learning And Applications (ICMLA)
Publisher: IEEE
Pages: 183-190
ISBN (Electronic): 978-1-7281-4550-1
ISBN (Print): 978-1-7281-4551-8
Publication status: Published - 2019
Event: IEEE International Conference on Machine Learning and Applications 2019 - Boca Raton, United States
Duration: 16 Dec 2019 to 19 Dec 2019
https://www.icmla-conference.org/icmla19/

Conference

Conference: IEEE International Conference on Machine Learning and Applications 2019
Abbreviated title: ICMLA
Country/Territory: United States
City: Boca Raton
Period: 16/12/19 to 19/12/19

Keywords

  • 3D-Convolutions
  • action recognition
  • spatio-temporal convolutions
  • Three-dimensional display
  • Kernel
  • Convolutional codes
  • Two dimensional displays
  • Solid modeling
  • Benchmark testing
  • Optical imaging

