Towards a Unified Compositional Model for Visual Pattern Modeling

Wei Tang, Pei Yu, Jiahuan Zhou, Ying Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations


Compositional models represent visual patterns as hierarchies of meaningful and reusable parts. They are attractive for vision modeling due to their ability to decompose complex patterns into simpler ones and to resolve low-level ambiguities in high-level image interpretations. However, current compositional models separate structure and part discovery from parameter estimation, which generally leads to suboptimal learning and fitting of the model. Moreover, the commonly adopted latent structural learning is not scalable for deep architectures. To address these difficult issues for compositional models, this paper pursues a unified framework for compositional pattern modeling, inference and learning. Represented by And-Or graphs (AOGs), it jointly models the compositional structure, parts, features, and composition/sub-configuration relationships. We show that the inference algorithm of the proposed framework is equivalent to a feed-forward network. Thus, all the parameters can be learned efficiently via the highly scalable back-propagation (BP) in an end-to-end fashion. We validate the model via the task of handwritten digit recognition. By visualizing the processes of bottom-up composition and top-down parsing, we show that our model is fully interpretable, being able to learn the hierarchical compositions from visual primitives to visual patterns at increasingly higher levels. We apply this new compositional model to natural scene character recognition and generic object detection. Experimental results have demonstrated its effectiveness.
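The abstract's central observation is that AOG inference can be carried out as a feed-forward pass: AND nodes compose part scores bottom-up, OR nodes select among alternative sub-configurations, and a top-down traversal recovers the chosen parse. The following is a minimal, illustrative sketch of that idea; the node names, scoring functions, and values are assumptions for exposition, not the paper's actual model.

```python
# Sketch of And-Or graph (AOG) inference as a feed-forward pass.
# Assumed scoring rules: AND = sum of children, OR = max over children.

def and_score(child_scores):
    """AND node (composition): combine part scores, here by summation."""
    return sum(child_scores)

def or_score(child_scores):
    """OR node (sub-configuration choice): keep the best alternative."""
    return max(child_scores)

# Terminal scores, e.g. responses of low-level primitive detectors
# (hypothetical values).
stroke_a, stroke_b, stroke_c = 1.0, 0.5, 0.8

# Bottom-up composition: two alternative AND nodes forming the same pattern.
config_1 = and_score([stroke_a, stroke_b])
config_2 = and_score([stroke_a, stroke_c])

# OR node selects the better sub-configuration.
configs = [config_1, config_2]
pattern_score = or_score(configs)

# Top-down parsing: trace back which sub-configuration the OR node chose.
chosen = configs.index(pattern_score)
print(pattern_score, chosen)
```

Because sum and max both admit (sub)gradients, the whole bottom-up pass behaves like a feed-forward network, which is what makes end-to-end training with back-propagation possible, as the abstract notes.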
Original language: English (US)
Title of host publication: Proceedings of the IEEE International Conference on Computer Vision
Number of pages: 10
ISBN (Print): 978-1538610329
State: Published - 2017


