Optimal multimodal fusion for multimedia data analysis (2004)
| Content Provider | CiteSeerX |
|---|---|
| Author | Wu, Yi; Chang, Edward Y.; Chang, Kevin Chen-Chuan; Smith, John R. |
| Description | Considerable research has been devoted to utilizing multimodal features for better understanding multimedia data. However, two core research issues have not yet been adequately addressed. First, given a set of features extracted from multiple media sources (e.g., extracted from the visual, audio, and caption track of videos), how do we determine the best modalities? Second, once a set of modalities has been identified, how do we best fuse them to map to semantics? In this paper, we propose a two-step approach. The first step finds statistically independent modalities from raw features. In the second step, we use super-kernel fusion to determine the optimal combination of individual modalities. We carefully analyze the tradeoffs between three design factors that affect fusion performance: modality independence, curse of dimensionality, and fusion-model complexity. Through analytical and empirical studies, we demonstrate that our two-step approach, which achieves a careful balance of the three design factors, can improve class-prediction accuracy over traditional techniques. In ACM Multimedia. |
| File Format | |
| Language | English |
| Publisher Date | 2004-01-01 |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
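
The abstract describes a two-step pipeline: derive statistically independent representations from each modality's raw features, then learn how best to combine per-modality classifiers. The sketch below illustrates that general idea only; it is not the paper's implementation. It assumes scikit-learn, uses FastICA as a stand-in for the paper's independent modality analysis, and uses a second-level SVM stacked on per-modality SVM outputs as a stand-in for super-kernel fusion. The function name and parameters are the sketch's own.

```python
# Minimal sketch of the two-step idea from the abstract.
# Assumptions (not from the paper): scikit-learn; FastICA stands in for
# independent modality analysis; a stacked SVM stands in for super-kernel fusion.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

def two_step_fusion(modalities, y):
    """modalities: list of (n_samples, n_features) arrays, one per raw modality.
    y: (n_samples,) class labels. Returns fitted base models and the fuser."""
    # Step 1: map each modality's raw features to statistically
    # independent components (stand-in for independent modality analysis).
    independent = [
        FastICA(n_components=min(10, X.shape[1]), random_state=0).fit_transform(X)
        for X in modalities
    ]

    # Step 2: one kernel classifier per modality, fused by a second-level
    # SVM trained on their outputs (stacking-style stand-in for super-kernel fusion).
    base_models = [SVC(kernel="rbf", probability=True) for _ in independent]

    # Out-of-fold probabilities keep the fuser from seeing predictions the
    # base models made on their own training data.
    meta_features = np.hstack([
        cross_val_predict(m, X, y, cv=5, method="predict_proba")
        for m, X in zip(base_models, independent)
    ])
    for m, X in zip(base_models, independent):
        m.fit(X, y)
    fuser = SVC(kernel="rbf").fit(meta_features, y)
    return base_models, fuser
```

Training the fuser on out-of-fold predictions rather than in-sample outputs is one simple way to keep the fusion model's effective complexity in check, which is in the spirit of the tradeoff the abstract names between modality independence, dimensionality, and fusion-model complexity.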