Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval
Content Provider | MDPI |
---|---|
Author | Wu, Xiaoyu; Wang, Tiantian; Wang, Sheng Jin |
Copyright Year | 2020 |
Description | Text-video retrieval faces a major challenge: the semantic gap between cross-modal information. Some existing methods project text and video into the same subspace to measure their similarity. However, such methods do not add a semantic consistency constraint when associating the semantic encodings of the two modalities, so the resulting association is poor. In this paper, we propose a multi-modal retrieval algorithm based on semantic association and multi-task learning. First, multi-level features of video and text are extracted with multiple deep learning networks, so that the information in both modalities is fully encoded. Then, in the shared feature space into which both modalities are mapped, we build a multi-task learning framework that combines a semantic similarity measurement task and a semantic consistency classification task on text-video features. The semantic consistency classification task constrains the learning of the semantic association task, so multi-task learning guides a better feature mapping of the two modalities and optimizes the construction of a unified feature subspace. Finally, experiments on the Microsoft Video Description dataset (MSVD) and MSR-Video to Text (MSR-VTT) show that the proposed algorithm outperforms existing methods, demonstrating that it improves cross-modal retrieval performance. |
Starting Page | 2125 |
e-ISSN | 2079-9292 |
DOI | 10.3390/electronics9122125 |
Journal | Electronics |
Issue Number | 12 |
Volume Number | 9 |
Language | English |
Publisher | MDPI |
Publisher Date | 2020-12-11 |
Access Restriction | Open |
Subject Keyword | Electronics; Artificial Intelligence; Cross-modal Learning; Text-video Retrieval; Semantic Correlation; Multi-task Learning |
Content Type | Text |
Resource Type | Article |
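The description above outlines a shared-space, two-task objective but gives no loss formulas. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes a linear projection into the shared space. For the similarity task, it swaps in a bidirectional hinge-based triplet ranking loss on cosine similarity, a common choice in text-video retrieval. For the consistency task, it uses a logistic classifier over the element-wise product of a text and a video embedding. All function names, the margin of 0.2 and the task weight `lam` are hypothetical.

```python
import math
import random

# Hypothetical sketch: linear map into a shared space, a bidirectional hinge
# triplet ranking loss (similarity task) and a logistic consistency
# classifier (classification task). Not the authors' code or hyperparameters.


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def project(x, W):
    """Map a modality-specific feature x into the shared space (W: one row per shared dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ranking_loss(text_emb, video_emb, margin=0.2):
    """Bidirectional hinge loss: pair (i, i) is matched, every (i, j != i) is a negative."""
    n = len(text_emb)
    loss = 0.0
    for i in range(n):
        pos = cosine(text_emb[i], video_emb[i])
        for j in range(n):
            if j == i:
                continue
            # Text i as query, video j as negative.
            loss += max(0.0, margin - pos + cosine(text_emb[i], video_emb[j]))
            # Video i as query, text j as negative.
            loss += max(0.0, margin - pos + cosine(text_emb[j], video_emb[i]))
    return loss / n


def consistency_loss(text_emb, video_emb, w_cls, b_cls=0.0):
    """Mean binary cross-entropy for 'is (text i, video j) consistent?' (label 1 iff i == j)."""
    n = len(text_emb)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Fuse the pair by element-wise product in the shared space.
            fused = [a * b for a, b in zip(text_emb[i], video_emb[j])]
            p = sigmoid(sum(w * f for w, f in zip(w_cls, fused)) + b_cls)
            p = min(max(p, 1e-7), 1.0 - 1e-7)  # avoid log(0)
            label = 1.0 if i == j else 0.0
            total -= label * math.log(p) + (1.0 - label) * math.log(1.0 - p)
    return total / (n * n)


def multitask_loss(texts, videos, W_t, W_v, w_cls, lam=1.0, margin=0.2):
    """Joint objective: ranking (similarity) loss + lam * consistency classification loss."""
    t = [project(x, W_t) for x in texts]
    v = [project(x, W_v) for x in videos]
    return ranking_loss(t, v, margin) + lam * consistency_loss(t, v, w_cls)


def random_matrix(rng, rows, cols):
    """Small Gaussian matrix used only to build toy inputs."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_text, d_video, d_shared, batch = 6, 8, 4, 3  # toy sizes, not from the paper
    texts = random_matrix(rng, batch, d_text)
    videos = random_matrix(rng, batch, d_video)
    W_t = random_matrix(rng, d_shared, d_text)
    W_v = random_matrix(rng, d_shared, d_video)
    w_cls = [rng.gauss(0.0, 0.5) for _ in range(d_shared)]
    print(multitask_loss(texts, videos, W_t, W_v, w_cls))
```

The point of the joint objective is that the classification term penalizes mismatched pairs directly, on top of the relative ordering enforced by the ranking term. This is one plausible reading of "the semantic consistency classification task constrains the learning of the semantic association task." In practice the projections and classifier would be trained with a deep learning framework rather than plain Python lists.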