Learning actionlet ensemble for 3D human action recognition

Jiang Wang, Zicheng Liu, Ying Wu, Junsong Yuan

Research output: Contribution to journal › Article › peer-review

258 Scopus citations


Human action recognition is an important yet challenging task. Human actions usually involve human-object interactions, highly articulated motions, high intra-class variations, and complicated temporal structures. Recently developed commodity depth sensors open up new possibilities for dealing with this problem by providing 3D depth data of the scene. This information not only facilitates a powerful human motion capture technique, but also makes it possible to efficiently model human-object interactions and intra-class variations. In this paper, we propose to characterize human actions with a novel actionlet ensemble model, which represents the interaction of a subset of human joints. The proposed model is robust to noise, invariant to translational and temporal misalignment, and capable of characterizing both human motion and human-object interactions. We evaluate the proposed approach on three challenging action recognition datasets captured by Kinect devices, a multiview action recognition dataset captured with a Kinect device, and a dataset captured by a motion capture system. The experimental evaluations show that the proposed approach outperforms state-of-the-art algorithms.

Original language: English (US)
Article number: 6626306
Pages (from-to): 914-927
Number of pages: 14
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Issue number: 5
State: Published - May 2014


Keywords

  • Computer vision
  • Gesture
  • Video analysis

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
