TY - CONF
T1 - Discovery of collocation patterns
T2 - 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07
AU - Yuan, Junsong
AU - Wu, Ying
AU - Yang, Ming
PY - 2007
Y1 - 2007
AB - A visual word lexicon can be constructed by clustering primitive visual features, and a visual object can be described by a set of visual words. Such a "bag-of-words" representation has led to many significant results in various vision tasks, including object recognition and categorization. In practice, however, clustering primitive visual features tends to produce synonymous visual words that over-represent visual patterns, as well as polysemous visual words that introduce large uncertainties and ambiguities into the representation. This paper aims to generate a higher-level lexicon, i.e., a visual phrase lexicon, where a visual phrase is a meaningful spatially co-occurrent pattern of visual words. This higher-level lexicon is much less ambiguous than the lower-level one. The contributions of this paper include: (1) a fast and principled solution to the discovery of significant spatial co-occurrent patterns using frequent itemset mining; (2) a pattern summarization method that deals with the compositional uncertainties in visual phrases; and (3) a top-down refinement scheme for the visual word lexicon that feeds back discovered phrases to tune the similarity measure through metric learning.
UR - http://www.scopus.com/inward/record.url?scp=34948876367&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34948876367&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2007.383222
DO - 10.1109/CVPR.2007.383222
M3 - Conference contribution
AN - SCOPUS:34948876367
SN - 1424411807
SN - 9781424411801
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
BT - 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07
Y2 - 17 June 2007 through 22 June 2007
ER -