Variational capsule encoder

Harish RaviPrakash, Syed Muhammad Anwar, Ulas Bagci

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


We propose a novel capsule-network-based variational encoder architecture, called Bayesian capsules (B-Caps), to modulate the mean and standard deviation of the sampling distribution in the latent space. We hypothesized that this approach can learn a better representation of features in the latent space than traditional approaches. We tested this hypothesis by using the learned latent variables for an image reconstruction task; for the MNIST and Fashion-MNIST datasets, the proposed model successfully separated the different classes in the latent space. Our experimental results showed improved reconstruction and classification performance on both datasets, adding credence to our hypothesis. We also showed that, with an increased latent space dimension, the proposed B-Caps learned a better representation than traditional variational auto-encoders (VAEs). Hence, our results indicate the strength of capsule networks in representation learning, which has not been examined in the VAE setting before.
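For readers unfamiliar with the sampling step the abstract refers to, the following is a minimal sketch of how a standard VAE draws a latent sample from a per-dimension mean and standard deviation, together with the usual KL term against a standard normal prior. It is not the B-Caps architecture: the values of `mu` and `log_var` are hypothetical stand-ins for whatever an encoder (capsule-based or otherwise) would output, and the function names are chosen here for illustration only.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)  # sigma = exp(log_var / 2)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


# Hypothetical encoder outputs for a 2-dimensional latent space (assumed values).
rng = random.Random(0)
mu = [0.5, -1.0]
log_var = [0.0, -2.0]

z = reparameterize(mu, log_var, rng)     # one stochastic latent sample
kl = kl_to_standard_normal(mu, log_var)  # regularization term of the VAE objective
```

In this formulation the encoder controls the sampling distribution only through `mu` and `log_var`, which is the quantity the abstract describes B-Caps as modulating.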

Original language: English (US)
Title of host publication: Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 8
ISBN (Electronic): 9781728188089
State: Published - 2020
Externally published: Yes
Event: 25th International Conference on Pattern Recognition, ICPR 2020 - Virtual, Milan, Italy
Duration: Jan 10 2021 to Jan 15 2021

Publication series

Name: Proceedings - International Conference on Pattern Recognition
ISSN (Print): 1051-4651


Conference: 25th International Conference on Pattern Recognition, ICPR 2020
City: Virtual, Milan


  • Capsule network
  • Data-driven sampling
  • Deep learning
  • VAE

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

