BBN VISER TRECVID 2012 Multimedia Event Detection and Multimedia Event Recounting Systems
| Content Provider | CiteSeerX |
|---|---|
| Author | Natarajan, Pradeep; Natarajan, Prem; Wu, Shuang; Zhuang, Xiaodan; Vazquez-Reina, Amelio; Vitaladevuni, Shiv N.; Andersen, Carl; Prasad, Rohit; Ye, Guangnan; Liu, Dong; Chang, Shih-Fu; Saleemi, Imran; Shah, Mubarak; Ng, Yue; White, Yn; Davis, Larry; Gupta, Abhinav; Haritaoglu, Ismail |
| Abstract | We describe the Raytheon BBN Technologies (BBN) led VISER system for the TRECVID 2012 Multimedia Event Detection (MED) and Recounting (MER) tasks. We present a comprehensive analysis of the different modules in our evaluation system, which includes: (1) a large suite of visual, audio and multimodal low-level features, (2) modules to detect semantic scene/action/object concepts over the entire video and within short temporal spans, (3) automatic speech recognition (ASR), and (4) videotext detection and recognition (OCR). For the low-level features we used multiple static, motion, color, and audio features previously considered in the literature, as well as a set of novel, fast kernel-based feature descriptors developed recently by BBN. For the semantic concept detection systems, we leveraged BBN's natural language processing (NLP) technologies to automatically analyze and identify salient concepts from short textual descriptions of videos and frames. Then, we trained detectors for these concepts using visual and audio features. The semantic concept-based systems enable rich description of video content for event recounting (MER). The video-level concepts have the most coverage and can provide robust concept detections on most videos. Segment-level concepts are less robust, but can provide sequence information that enriches recounting. Object detection, ASR and OCR are sporadic in occurrence but have high precision and improve the quality of the recounting. For the MED task, we combined these different streams using multiple early (feature-level) and late (score-level) fusion strategies. We present a rigorous |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Multimedia Event Detection; Multimedia Event; BBN VISER TRECVID; Audio Feature; Semantic Concept; Semantic Scene Action Object Concept; Multimodal Low-level Feature; Comprehensive Analysis; MED Task; Video Content; Automatic Speech Recognition; System Enable Rich Description; Raytheon BBN Technology; Natural Language Processing; Evaluation System; Robust Concept Detection; Low-level Feature; VISER System; Video Level Concept; Salient Concept; Different Stream; Large Suite; Feature Descriptor; Sequence Information; High Precision; Segment Level Concept; Event Recounting; Different Module; Multiple Early Feature Level; Late Score Level Fusion Strategy; Videotext Detection; Short Temporal Span; Short Textual Description; Semantic Concept Detection System; Object Detection; Entire Video |
| Content Type | Text |
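
The abstract mentions combining streams with early (feature-level) and late (score-level) fusion but does not say how. The following is only a minimal sketch of the two general ideas, not the VISER system's method. The function names, the stream names, and the choice of a weighted average for score fusion are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def early_fusion(features: Dict[str, List[float]]) -> List[float]:
    """Feature-level fusion (illustrative): concatenate per-stream vectors.

    Streams are visited in sorted name order so the layout is deterministic.
    """
    fused: List[float] = []
    for name in sorted(features):
        fused.extend(features[name])
    return fused


def late_fusion(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Score-level fusion (illustrative): weighted average of stream scores.

    A weighted average is an assumed rule, chosen here for simplicity.
    """
    if not scores:
        raise ValueError("at least one stream score is required")
    if weights is None:
        # Default to equal weighting across all streams.
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in scores) / total


if __name__ == "__main__":
    # Hypothetical per-stream inputs for a single video.
    print(early_fusion({"visual": [0.1, 0.2], "audio": [0.3]}))
    print(late_fusion({"visual": 0.2, "asr": 0.8}))
```

In early fusion a single detector would be trained on the concatenated vector, whereas in late fusion each stream has its own detector and only the output scores are combined.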