TY - GEN
T1 - Mining actionlet ensemble for action recognition with depth cameras
AU - Wang, Jiang
AU - Liu, Zicheng
AU - Wu, Ying
AU - Yuan, Junsong
PY - 2012
Y1 - 2012
AB - Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities for dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy, and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learnt to represent each action and to capture the intra-class variance. In addition, novel features that are suitable for depth data are proposed. They are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both the human motion and the human-object interactions. The proposed approach is evaluated on two challenging action recognition datasets captured by commodity depth cameras, and another dataset captured by a MoCap system. The experimental evaluations show that the proposed approach achieves superior performance to the state-of-the-art algorithms.
UR - http://www.scopus.com/inward/record.url?scp=84866672692&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866672692&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2012.6247813
DO - 10.1109/CVPR.2012.6247813
M3 - Conference contribution
AN - SCOPUS:84866672692
SN - 9781467312264
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 1290
EP - 1297
BT - 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012
T2 - 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012
Y2 - 16 June 2012 through 21 June 2012
ER -