IBM research and Columbia University TRECVID-2013 Multimedia Event Detection (MED), Multimedia Event Recounting (MER), Surveillance event detection (SED), and semantic indexing (SIN) systems

Lisa Brown, Liangliang Cao, Shih Fu Chang, Yu Cheng, Alok Choudhary, Noel Codella, Courtenay Cotton, Dan Ellis, Quanfu Fan, Rogerio Feris, Leiguang Gong, Matthew Hill, Gang Hua, John Kender, Michele Merler, Yadong Mu, Sharath Pankanti, John R. Smith, Felix X. Yu

Research output: Contribution to conference › Paper › peer-review

9 Scopus citations

Abstract

For this year’s TRECVID Multimedia Event Detection (MED) task [11], our team studied a semantic approach to video retrieval. We constructed a faceted taxonomy of 1313 visual concepts (including attributes and dynamic action concepts) and 85 audio concepts. Event search was performed via keyword search with a human user in the loop. Our submitted runs covered the Pre-Specified and Ad-Hoc event collections. For each collection, we submitted 3 exemplar conditions: 0, 10, and 100 exemplars. For each exemplar condition, we also submitted 3 types of semantic modality retrieval results: visual only, audio only, and combined. The current IBM-Columbia MER system exploits nine observations about human cognition, language, and visual perception in order to produce an effective video recounting of an event. We designed and tuned algorithms that both locate a minimal persuasive video segment and script a minimal verbal collection of concepts, in order to convince an analyst that the MED decision was correct. With little loss of descriptive clarity, the system achieved the highest speed-up ratio among the ten teams competing in the NIST MER evaluation. For SED, we explore temporal dependencies between events to improve both evaluation tasks, i.e., automatic event detection (retrospective) and interactive event detection with a human in the loop (interactive). Our retrospective system is based on a joint-segmentation-detection framework integrated with temporal event modeling, while the interactive system performs risk analysis to guide the end user toward effective verification. We achieve better results on the retrospective and interactive tasks than last year. For SIN, we submitted 4 full concept detection runs and 2 concept pair runs. In the first 3 concept detection runs, we varied our data sampling strategy among balanced bags via majority undersampling for ensemble fusion learning, balanced bags via minority oversampling, and unbalanced bags. For the 4th run we used a rank-normalized fusion of the first 3 runs. Concept pair runs consisted of the sum of individual concept classifiers with and without sigmoid normalization of the dataset.
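
The abstract does not spell out how the rank-normalized fusion or the concept pair scores were computed, so the following is only a minimal sketch of one plausible reading. It assumes equal-weight averaging of rank-normalized scores across runs, average ranks for ties, and a sigmoid applied to each classifier score before summing. The function names `rank_normalize`, `rank_fusion`, and `pair_score` are hypothetical and not taken from the paper.

```python
import math


def rank_normalize(scores):
    """Map raw scores to (0, 1] by rank; the highest score maps to 1.0.

    Tied scores share the average of their ranks (an assumption).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the group of items tied with position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def rank_fusion(runs):
    """Average rank-normalized scores across runs (equal weights assumed)."""
    normalized = [rank_normalize(run) for run in runs]
    return [sum(column) / len(runs) for column in zip(*normalized)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pair_score(score_a, score_b, normalize=True):
    """Sum two concept classifier scores, optionally after a sigmoid.

    Applying the sigmoid per score is an assumption about what the
    abstract calls sigmoid normalization.
    """
    if normalize:
        return sigmoid(score_a) + sigmoid(score_b)
    return score_a + score_b


# Two hypothetical runs scoring the same three video shots.
fused = rank_fusion([[0.2, 0.9, 0.5], [10.0, 30.0, 20.0]])
```

Rank normalization discards the raw score scale, which is why two runs with very different score ranges can be averaged directly in this sketch.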

Original language: English (US)
State: Published - 2013
Event: 2013 TREC Video Retrieval Evaluation, TRECVID 2013 - Gaithersburg, United States
Duration: Nov 20 2013 → Nov 22 2013

Conference

Conference: 2013 TREC Video Retrieval Evaluation, TRECVID 2013
Country: United States
City: Gaithersburg
Period: 11/20/13 → 11/22/13

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Electrical and Electronic Engineering
