Temporal capsule networks for video motion estimation and error concealment

Arun Sankisa*, Arjun Punjabi, Aggelos K. Katsaggelos

*Corresponding author for this work

Research output: Contribution to journal (Article)


In this paper, we present a temporal capsule network architecture to encode motion in videos as an instantiation parameter. The extracted motion is used to perform motion-compensated error concealment. We modify the original architecture and use a carefully curated dataset to enable the training of capsules spatially and temporally. First, we add the temporal dimension by taking co-located “patches” from three consecutive frames obtained from standard video sequences to form input data “cubes.” Second, the network is designed with an initial feature extraction layer that operates on all three dimensions to generate spatiotemporal features. Additionally, we implement the PrimaryCaps module with a recurrent layer, instead of a conventional convolutional layer, to extract short-term motion-related temporal dependencies and encode them as activation vectors in the capsule output. Finally, the capsule output is combined with the most recent past frame and passed through a fully connected reconstruction network to perform motion-compensated error concealment. We study the effectiveness of temporal capsules by comparing the proposed model with architectures that do not include capsules. Although the quality of the reconstruction shows room for improvement, we successfully demonstrate that capsule-based architectures can be designed to operate in the temporal dimension to encode motion-related attributes as instantiation parameters. The accuracy of motion estimation is evaluated by comparing both the reconstructed frame outputs and the corresponding optical flow estimates with ground truth data.
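The input construction described in the abstract, co-located patches from three consecutive frames stacked into a "cube," can be illustrated with a minimal, dependency-free Python sketch. The function names, the patch size, and the stride are assumptions made for illustration only; the abstract does not state the values or the sampling scheme the authors used.

```python
from typing import List, Sequence

# One grayscale frame, stored as rows of pixel values (an assumed representation).
Frame = Sequence[Sequence[float]]
Cube = List[List[List[float]]]


def extract_cube(frames: Sequence[Frame], top: int, left: int, size: int) -> Cube:
    """Cut the same size x size window out of three consecutive frames."""
    if len(frames) != 3:
        raise ValueError("expected exactly three consecutive frames")
    height, width = len(frames[0]), len(frames[0][0])
    if top < 0 or left < 0 or top + size > height or left + size > width:
        raise ValueError("patch falls outside the frame")
    # Using the same (top, left) in every frame is what makes the patches co-located.
    return [
        [list(row[left:left + size]) for row in frame[top:top + size]]
        for frame in frames
    ]


def cubes_from_sequence(video: Sequence[Frame], size: int, stride: int) -> List[Cube]:
    """Slide over time (one frame per step) and over space (by `stride` pixels)."""
    cubes = []
    for t in range(len(video) - 2):
        triple = video[t:t + 3]
        height, width = len(triple[0]), len(triple[0][0])
        for top in range(0, height - size + 1, stride):
            for left in range(0, width - size + 1, stride):
                cubes.append(extract_cube(triple, top, left, size))
    return cubes
```

Each returned cube is indexed as `[frame][row][column]`, so it has shape 3 x size x size and could be fed to a feature extraction layer that operates on all three dimensions.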

Original language: English (US)
Pages (from-to): 1369-1377
Number of pages: 9
Journal: Signal, Image and Video Processing
Issue number: 7
State: Published - Oct 1 2020


  • Capsule networks
  • Conv3D
  • ConvLSTM
  • Error concealment
  • Motion estimation

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
