TY - CONF
T1 - IBM Research and Columbia University TRECVID-2013 Multimedia Event Detection (MED), Multimedia Event Recounting (MER), Surveillance Event Detection (SED), and Semantic Indexing (SIN) systems
AU - Brown, Lisa
AU - Cao, Liangliang
AU - Chang, Shih-Fu
AU - Cheng, Yu
AU - Choudhary, Alok
AU - Codella, Noel
AU - Cotton, Courtenay
AU - Ellis, Dan
AU - Fan, Quanfu
AU - Feris, Rogerio
AU - Gong, Leiguang
AU - Hill, Matthew
AU - Hua, Gang
AU - Kender, John
AU - Merler, Michele
AU - Mu, Yadong
AU - Pankanti, Sharath
AU - Smith, John R.
AU - Yu, Felix X.
N1 - Funding Information:
Columbia University, Dept. of Electrical Engineering; IBM T. J. Watson Research Center; Columbia University, Dept. of Computer Science. Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20070. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.
Publisher Copyright:
© 2013 TREC Video Retrieval Evaluation, TRECVID 2013. All rights reserved.
PY - 2013
Y1 - 2013
N2 - For this year’s TRECVID Multimedia Event Detection task [11], our team studied a semantic approach to video retrieval. We constructed a faceted taxonomy of 1313 visual concepts (including attributes and dynamic action concepts) and 85 audio concepts. Event search was performed via keyword search with a human user in the loop. Our submitted runs included Pre-Specified and Ad-Hoc event collections. For each collection, we submitted 3 exemplar conditions: 0, 10, and 100 exemplars. For each exemplar condition, we also submitted 3 types of semantic modality retrieval results: visual only, audio only, and combined. The current IBM-Columbia MER system exploits nine observations about human cognition, language, and visual perception to produce an effective video recounting of an event. We designed and tuned algorithms that both locate a minimal persuasive video segment and script a minimal verbal collection of concepts, in order to convince an analyst that the MED decision was correct. With little loss of descriptive clarity, the system achieved the highest speed-up ratio among the ten teams competing in the NIST MER evaluation. For SED, we explored temporal dependencies between events to enhance both evaluation tasks, i.e., automatic event detection (retrospective) and interactive event detection with a human in the loop (interactive). Our retrospective system is based on a joint segmentation-detection framework integrated with temporal event modeling, while the interactive system performs risk analysis to guide the end user toward effective verification. We achieved better results than last year on both the retrospective and interactive tasks. For SIN, we submitted 4 full concept detection runs and 2 concept pair runs. In the first 3 concept detection runs, we varied our data sampling strategy among balanced bags via majority undersampling for ensemble fusion learning, balanced bags via minority oversampling, and unbalanced bags. For the 4th run, we used a rank-normalized fusion of the first 3 runs. Concept pair runs consisted of the sum of individual concept classifiers with and without sigmoid normalization of the dataset.
AB - For this year’s TRECVID Multimedia Event Detection task [11], our team studied a semantic approach to video retrieval. We constructed a faceted taxonomy of 1313 visual concepts (including attributes and dynamic action concepts) and 85 audio concepts. Event search was performed via keyword search with a human user in the loop. Our submitted runs included Pre-Specified and Ad-Hoc event collections. For each collection, we submitted 3 exemplar conditions: 0, 10, and 100 exemplars. For each exemplar condition, we also submitted 3 types of semantic modality retrieval results: visual only, audio only, and combined. The current IBM-Columbia MER system exploits nine observations about human cognition, language, and visual perception to produce an effective video recounting of an event. We designed and tuned algorithms that both locate a minimal persuasive video segment and script a minimal verbal collection of concepts, in order to convince an analyst that the MED decision was correct. With little loss of descriptive clarity, the system achieved the highest speed-up ratio among the ten teams competing in the NIST MER evaluation. For SED, we explored temporal dependencies between events to enhance both evaluation tasks, i.e., automatic event detection (retrospective) and interactive event detection with a human in the loop (interactive). Our retrospective system is based on a joint segmentation-detection framework integrated with temporal event modeling, while the interactive system performs risk analysis to guide the end user toward effective verification. We achieved better results than last year on both the retrospective and interactive tasks. For SIN, we submitted 4 full concept detection runs and 2 concept pair runs. In the first 3 concept detection runs, we varied our data sampling strategy among balanced bags via majority undersampling for ensemble fusion learning, balanced bags via minority oversampling, and unbalanced bags. For the 4th run, we used a rank-normalized fusion of the first 3 runs. Concept pair runs consisted of the sum of individual concept classifiers with and without sigmoid normalization of the dataset.
UR - http://www.scopus.com/inward/record.url?scp=85085787325&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085787325&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85085787325
T2 - 2013 TREC Video Retrieval Evaluation, TRECVID 2013
Y2 - 20 November 2013 through 22 November 2013
ER -