Abstract
In social settings, people interact in close proximity. When analyzing such encounters from video, we are typically interested in distinguishing between a large number of different interactions. Here, we address training deformable part models (DPMs) for the detection of such interactions from video, in both space and time. When we consider a large number of interaction classes, we face two challenges. First, the interactions to be distinguished become visually more similar. Second, it becomes more difficult to obtain sufficient class-specific training examples for each interaction. In this paper, we address both challenges, with a focus on the latter. Specifically, we introduce a method to train body part detectors from pose-annotated images that are not specific to any interaction class. Such resources are widely available. We introduce a training scheme and an adapted DPM formulation that allow for the inclusion of this auxiliary data. We perform cross-dataset experiments to evaluate the generalization performance of our method. We demonstrate that our method achieves decent performance with as few as five training examples.
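For context (the abstract does not reproduce the adapted formulation itself), the standard DPM score that this line of work builds on (Felzenszwalb et al., TPAMI 2010) evaluates a root placement $p_0$ and part placements $p_1, \dots, p_n$ in a feature pyramid $H$ as

$$
\text{score}(p_0, \dots, p_n) = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i) \;-\; \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) \;+\; b,
$$

where $F_i$ are the root and part filters, $\phi(H, p_i)$ the appearance features (e.g., HOG) at placement $p_i$, $d_i$ the learned deformation weights with $\phi_d(dx, dy) = (dx, dy, dx^2, dy^2)$, and $b$ a bias. Read against this template, the contribution described in the abstract is a training scheme in which the part filters $F_i$ can be learned from auxiliary pose-annotated images rather than solely from interaction-specific examples.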
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017) |
| Subtitle of host publication | 30 May - 3 June 2017, Washington, DC, USA |
| Editors | Randall Bilof |
| Publisher | IEEE |
| Pages | 538-543 |
| ISBN (Electronic) | 978-1-5090-4023-0 |
| DOIs | |
| Publication status | Published - 2017 |
Keywords
- auxiliary image data
- interaction detection
- social setting
- deformable part models
- DPM
- pose information