Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence
| Field | Value |
|---|---|
| Content Provider | MDPI |
| Author | Nguyen, Huy; Miyazaki, Tomo; Sugaya, Yoshihiro; Omachi, Shinichiro |
| Copyright Year | 2021 |
| Description | Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods place instances in a single embedding space. However, they struggle to embed instances due to the difficulty of matching visual dynamics in videos to textual features in sentences; a single space is not enough to accommodate various videos and sentences. In this paper, we propose a novel framework that maps instances into multiple individual embedding spaces so that we can capture multiple relationships between instances, leading to compelling video retrieval. We produce a final similarity between instances by fusing the similarities measured in each embedding space using a weighted sum, where the weights are determined according to the sentence; we can therefore flexibly emphasize an embedding space (see the sketch after this table). We conducted sentence-to-video retrieval experiments on a benchmark dataset. The proposed method achieved superior performance, with results competitive with state-of-the-art methods, demonstrating the effectiveness of the proposed multiple embedding approach over existing methods. |
| Starting Page | 3214 |
| e-ISSN | 2076-3417 |
| DOI | 10.3390/app11073214 |
| Journal | Applied Sciences |
| Issue Number | 7 |
| Volume Number | 11 |
| Language | English |
| Publisher | MDPI |
| Publisher Date | 2021-04-03 |
| Access Restriction | Open |
| Subject Keyword | Applied Sciences; Artificial Intelligence; Video Retrieval; Visual-Semantic Embedding; Multiple Embedding Spaces |
| Content Type | Text |
| Resource Type | Article |
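
The weighted-sum fusion described in the abstract can be illustrated with a minimal sketch. The snippet below is a hypothetical PyTorch rendering, not the authors' implementation: the number of spaces `K`, the feature dimensions, and the names `video_proj`, `sent_proj`, and `weight_net` are all assumptions. It shows similarities computed independently in each embedding space and fused by sentence-conditioned weights, matching the strategy the paper describes.

```python
# Hypothetical sketch of multi-space similarity fusion; K, dimensions,
# and all module names are assumptions, not the paper's actual design.
import torch
import torch.nn.functional as F

K = 3                    # assumed number of embedding spaces
D_IN, D_EMB = 512, 256   # assumed input / embedding dimensions

# One projection per embedding space, for videos and for sentences.
video_proj = torch.nn.ModuleList(torch.nn.Linear(D_IN, D_EMB) for _ in range(K))
sent_proj = torch.nn.ModuleList(torch.nn.Linear(D_IN, D_EMB) for _ in range(K))

# The fusion weights are predicted from the sentence feature, so the
# model can emphasize a different space depending on the query.
weight_net = torch.nn.Linear(D_IN, K)

def fused_similarity(video_feat: torch.Tensor, sent_feat: torch.Tensor) -> torch.Tensor:
    """Cosine similarity per space, fused by sentence-conditioned weights."""
    weights = F.softmax(weight_net(sent_feat), dim=-1)  # (B, K), sums to 1
    sims = torch.stack(
        [F.cosine_similarity(v(video_feat), s(sent_feat), dim=-1)
         for v, s in zip(video_proj, sent_proj)],
        dim=-1,
    )                                                    # (B, K)
    return (weights * sims).sum(dim=-1)                  # (B,)

# Toy usage: a batch of 4 pooled video / sentence features.
video_feat = torch.randn(4, D_IN)
sent_feat = torch.randn(4, D_IN)
print(fused_similarity(video_feat, sent_feat).shape)     # torch.Size([4])
```

Using a softmax keeps the per-space weights positive and summing to one, so the fused score remains a convex combination of the individual cosine similarities and stays in the same range as each of them.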