Learning maximum margin temporal warping for action recognition

Jiang Wang, Ying Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

53 Scopus citations

Abstract

Temporal misalignment and duration variation in video actions strongly affect the performance of action recognition, but it is very difficult to specify an effective temporal alignment of action sequences. To address this challenge, this paper proposes a novel discriminative learning-based temporal alignment method, called maximum margin temporal warping (MMTW), to align two action sequences and measure their matching score. Based on the latent structure SVM formulation, the proposed MMTW method learns a phantom action template that represents an action class for maximum discrimination against other classes. Recognition of this action class is based on the associated learned alignment of the input action. Extensive experiments on five benchmark datasets demonstrate that the MMTW model significantly improves the accuracy and robustness of action recognition under temporal misalignment and variations.
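
The idea of aligning an input sequence to a class template and reading off a matching score can be illustrated with a minimal sketch. This is plain dynamic-programming temporal warping, not the MMTW method itself: the templates here are fixed by hand rather than learned with a latent structural SVM, the frame similarity (negative squared Euclidean distance) is an assumption, and the names `warping_score` and `classify` are hypothetical.

```python
import numpy as np


def warping_score(template, sequence):
    """Best monotonic alignment score between a template and a sequence.

    Both inputs are arrays of shape (frames, features). The frame-to-frame
    similarity is the negative squared Euclidean distance (an assumption made
    for this sketch), so a perfect alignment scores 0 and worse ones score lower.
    """
    template = np.asarray(template, dtype=float)
    sequence = np.asarray(sequence, dtype=float)
    n, m = len(template), len(sequence)

    # sim[i, j]: similarity of template frame i and sequence frame j.
    diff = template[:, None, :] - sequence[None, :, :]
    sim = -np.sum(diff ** 2, axis=2)

    # best[i, j]: best score of any warping path ending at the pair (i, j).
    best = np.full((n, m), -np.inf)
    best[0, 0] = sim[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:
                candidates.append(best[i - 1, j])      # advance the template only
            if j > 0:
                candidates.append(best[i, j - 1])      # advance the sequence only
            if i > 0 and j > 0:
                candidates.append(best[i - 1, j - 1])  # advance both
            best[i, j] = max(candidates) + sim[i, j]
    return float(best[-1, -1])


def classify(templates, sequence):
    """Return the label whose template aligns best with the sequence."""
    return max(templates, key=lambda label: warping_score(templates[label], sequence))


# Toy one-dimensional "actions": the input is a time-stretched copy of "raise".
templates = {
    "raise": [[0.0], [1.0], [2.0]],
    "lower": [[2.0], [1.0], [0.0]],
}
stretched = [[0.0], [0.0], [1.0], [1.0], [2.0]]
print(classify(templates, stretched))
```

Because the warping path may repeat frames, the stretched input still aligns perfectly with the shorter template, which is the kind of tolerance to duration variation the abstract refers to.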

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE International Conference on Computer Vision, ICCV 2013
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2688-2695
Number of pages: 8
ISBN (Print): 9781479928392
DOIs
State: Published - Jan 1 2013
Event: 2013 14th IEEE International Conference on Computer Vision, ICCV 2013 - Sydney, NSW, Australia
Duration: Dec 1 2013 to Dec 8 2013

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision

Other

Other: 2013 14th IEEE International Conference on Computer Vision, ICCV 2013
Country: Australia
City: Sydney, NSW
Period: 12/1/13 to 12/8/13

Keywords

  • Action Recognition
  • Depth Camera
  • Dynamic Temporal Warping
  • Temporal Model

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
