Class Feature Pyramids for Video Explanation

A.G. Stergiou, G. Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, R.W. Poppe, R.C. Veltkamp

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Deep convolutional networks are widely used in video action recognition. 3D convolutions are one prominent approach to handling the additional time dimension. While 3D convolutions typically lead to higher accuracies, the inner workings of the trained models are more difficult to interpret. We focus on creating human-understandable visual explanations that represent the hierarchical parts of spatio-temporal networks. We introduce Class Feature Pyramids, a method that traverses the entire network structure and incrementally discovers kernels at different network depths that are informative for a specific class. Our method does not depend on the network's architecture or the type of 3D convolutions, supporting grouped and depth-wise convolutions, convolutions in fibers, and convolutions in branches. We demonstrate the method on six state-of-the-art 3D convolutional neural networks (CNNs) on three action recognition datasets (Kinetics-400, UCF-101, and HMDB-51) and two egocentric action recognition datasets (EPIC-Kitchens and EGTEA Gaze+).
Original language: English
Title of host publication: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
Publisher: IEEE
Pages: 4255-4264
ISBN (Electronic): 978-1-7281-5023-9
ISBN (Print): 978-1-7281-5024-6
Publication status: Published - 2019
Event: IEEE International Conference on Computer Vision Workshops 2019 - Seoul, Korea, Republic of
Duration: 27 Oct 2019 - 2 Nov 2019

Workshop

Workshop: IEEE International Conference on Computer Vision Workshops 2019
Country/Territory: Korea, Republic of
City: Seoul
Period: 27/10/19 - 2/11/19

Keywords

  • Visual Explanations
  • Explainable Convolutions
  • Spatio-temporal feature representation
  • Feature extraction
  • Kernel
  • Visualization
  • Convolutional codes
  • Three-dimensional displays
  • Biological neural networks
  • Complexity theory
