Learn to cycle: Time-consistent feature discovery for action recognition

A.G. Stergiou, R.W. Poppe

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Generalizing over temporal variations is a prerequisite for effective action recognition in videos. Despite significant advances in deep neural networks, it remains a challenge to focus on short-term discriminative motions in relation to the overall performance of an action. We address this challenge by allowing some flexibility in discovering relevant spatio-temporal features. We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors inputs with similar activations with potential temporal variations. We implement this idea with a novel CNN block that uses an LSTM to encapsulate feature dynamics, in conjunction with a temporal gate that is responsible for evaluating the consistency of the discovered dynamics and the modeled features. We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs. On Kinetics-700, we perform on par with current state-of-the-art models, and outperform these on HACS, Moments in Time, UCF-101 and HMDB-51.
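
The block described in the abstract can be illustrated with a toy sketch. This is one possible reading of the idea, not the authors' implementation: the LSTM is replaced by a simple exponential moving average, consistency is checked with a hard nearest-neighbour round trip, and the sigmoid rescaling used when the gate opens is an assumption. All function names (`squeeze`, `recur`, `cycle_consistent`, `srtg_like_block`) are hypothetical.

```python
import math

def squeeze(clip):
    """Spatially average each channel of each frame.
    clip[t][c] is a flat list of spatial activations."""
    return [[sum(ch) / len(ch) for ch in frame] for frame in clip]

def recur(seq, decay=0.2):
    """Stand-in for the LSTM (assumption): exponential moving average over time."""
    out = [list(seq[0])]
    for x in seq[1:]:
        out.append([decay * s + (1 - decay) * v for s, v in zip(out[-1], x)])
    return out

def nearest(query, keys):
    """Index of the key with the smallest squared Euclidean distance to query."""
    dists = [sum((q - k) ** 2 for q, k in zip(query, key)) for key in keys]
    return dists.index(min(dists))

def cycle_consistent(a, b):
    """True if every frame of a maps to its nearest frame in b and back to itself."""
    return all(nearest(b[nearest(x, b)], a) == t for t, x in enumerate(a))

def srtg_like_block(clip):
    """Return (features, gate_open). The input passes through unchanged
    when the pooled features and the recurrent summary are not consistent."""
    pooled = squeeze(clip)
    dynamics = recur(pooled)
    if not cycle_consistent(pooled, dynamics):
        return clip, False
    # Gate open: rescale each channel by a sigmoid of its recurrent summary
    # (an assumed fusion rule, chosen only for illustration).
    fused = [
        [[v / (1.0 + math.exp(-d)) for v in ch] for ch, d in zip(frame, dyn)]
        for frame, dyn in zip(clip, dynamics)
    ]
    return fused, True

# Three frames, one channel, two spatial positions per frame.
rising = [[[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]]]
repeated = [[[1.0, 1.0]], [[3.0, 3.0]], [[1.0, 1.0]]]
print(srtg_like_block(rising)[1])    # gate opens for the steadily changing clip
print(srtg_like_block(repeated)[1])  # gate stays closed when frames repeat
```

In this sketch the gate only lets the recurrent summary modify the features when the two representations agree frame by frame, which mirrors the abstract's description of a gate that evaluates consistency between the discovered dynamics and the modeled features.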

Original language: English
Pages (from-to): 1-7
Number of pages: 7
Journal: Pattern Recognition Letters
Volume: 141
Publication status: Published - Jan 2021

Bibliographical note

Funding Information:
This publication is supported by the Netherlands Organization for Scientific Research (NWO) with a TOP-C2 grant for “Automatic recognition of bodily interactions” (ARBITER).

Publisher Copyright:
© 2020

Keywords

  • 3D-CNNs
  • Action recognition
  • Spatio-temporal CNNs
  • Squeeze and recursion
  • Temporal cyclic error
  • Temporal gates
