The introduction of depth cameras has opened up a new way of classifying spatio-temporal patterns by providing the depth map of a scene, but the unique characteristics of depth maps also call for novel spatio-temporal representations. Depth maps have much less texture than conventional RGB images, and they are much noisier. When depth maps are captured from a single view, occlusion is another serious problem. To deal with these issues, we develop a semi-local feature called the random occupancy pattern (ROP), which employs a novel progressive rejection sampling scheme to effectively explore an extremely large sampling space. We also use a sparse coding approach to encode these features robustly. The proposed approach does not require careful parameter tuning. Its training is very fast due to the use of a high-dimensional integral image and the efficient sampling scheme, and it is robust to occlusions. Our technique is evaluated on two datasets captured by commodity depth cameras: an action dataset and a hand gesture dataset. Our classification results are comparable or superior to those obtained by state-of-the-art approaches on both datasets. The experiments also demonstrate the robustness of the proposed method to noise and occlusions.
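The fast training is attributed in part to a high-dimensional integral image. The sketch below illustrates that idea in general form: a summed-area table over an N-dimensional occupancy grid (here 4D: x, y, depth bin, frame) lets the occupied-cell count of any axis-aligned box be read with 2^N lookups (16 in 4D) regardless of the box size. The dictionary-based grid, the half-open box convention, and the names `integral_image` and `box_sum` are hypothetical choices for illustration; this is not the authors' implementation, and it omits any normalization the full ROP feature may apply.

```python
from itertools import product


def integral_image(volume, shape):
    """Build an N-dimensional integral image (summed-area table).

    `volume` maps index tuples to values (missing cells count as 0) and
    `shape` gives the size of each dimension. The result is indexed by
    padded coordinates 0..shape[d], where entry p holds the sum of all
    volume cells q with q[d] < p[d] in every dimension d.
    """
    padded = [range(s + 1) for s in shape]
    integ = {}
    for p in product(*padded):
        # Index 0 along any axis is zero padding; otherwise copy the cell.
        integ[p] = 0 if 0 in p else volume.get(tuple(c - 1 for c in p), 0)
    # One in-place prefix-sum pass per axis. product() yields indices in
    # lexicographic order, so the predecessor along `axis` is already updated.
    for axis in range(len(shape)):
        for p in product(*padded):
            if p[axis] > 0:
                prev = p[:axis] + (p[axis] - 1,) + p[axis + 1:]
                integ[p] += integ[prev]
    return integ


def box_sum(integ, lo, hi):
    """Sum of cells in the half-open box [lo, hi) via inclusion-exclusion.

    Uses 2**N table lookups, independent of the box size.
    """
    n = len(lo)
    total = 0
    for bits in product((0, 1), repeat=n):
        corner = tuple(hi[d] if b else lo[d] for d, b in enumerate(bits))
        # Each corner taken from `lo` flips the sign once.
        sign = -1 if (n - sum(bits)) % 2 else 1
        total += sign * integ[corner]
    return total


# Hypothetical 4D occupancy grid: (x, y, depth bin, frame), 1 = occupied.
shape = (4, 4, 3, 2)
occupied = {(1, 1, 0, 0), (1, 2, 0, 0), (2, 2, 1, 1), (3, 3, 2, 1)}
volume = {idx: 1 for idx in occupied}
integ = integral_image(volume, shape)
print(box_sum(integ, (1, 1, 0, 0), (3, 3, 2, 2)))  # 3 occupied cells in the box
```

Building the table costs one pass per dimension over the grid, after which every randomly sampled subvolume can be scored in constant time; this is what makes it practical to evaluate a very large number of candidate boxes during sampling.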