Local spatiotemporal descriptors for visual recognition of spoken phrases
| Content Provider | CiteSeerX |
|---|---|
| Author | Zhao, Guoying; Pietikäinen, Matti; Hadid, Abdenour |
| Description | Proc. 2nd International Workshop on Human-Centered Multimedia (HCM2007), 2007. Visual speech information plays an important role in speech recognition under noisy conditions or for listeners with hearing impairment. In this paper, we propose local spatiotemporal descriptors to represent and recognize isolated spoken phrases based solely on visual input. Eye positions determined by a robust face and eye detector are used to localize the mouth regions in face images. Spatiotemporal local binary patterns extracted from these regions are used to describe phrase sequences. In our experiments with 817 sequences from ten phrases and 20 speakers, promising accuracies of 62% and 70% were obtained in speaker-independent and speaker-dependent recognition, respectively. On the Tulips1 audio-visual database, our method's accuracy of 92.7% clearly outperforms the other methods compared. Advantages of our approach include local processing and robustness to monotonic gray-scale changes. Moreover, no error-prone segmentation of moving lips is needed. |
| File Format | |
| Language | English |
| Access Restriction | Open |
| Subject Keyword | Speech Recognition; Eye Detector; Local Spatiotemporal Descriptor; Face Image; Important Role; Ten Phrase; Phrase Sequence; Tulips1 Audio-visual Database; Gray-scale Change; Visual Input; Speaker-dependent Recognition; Mouth Region; Robust Face; Visual Speech Information; Local Processing; Noisy Condition; Spatiotemporal Local Binary Pattern; Spoken Phrase; Visual Recognition; Error Prone Segmentation |
| Content Type | Text |
| Resource Type | Article |
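
The description mentions spatiotemporal local binary patterns and robustness to monotonic gray-scale changes. The sketch below illustrates the general idea only and is not the authors' implementation. It assumes a basic 8-neighbour, radius-1 LBP operator and a simplified three-plane scheme that pools histograms over every XY, XT and YT slice of a small gray-level volume. The function names, the neighbour ordering and the pooling over all slices are illustrative assumptions; the record does not specify the exact operator, the radii or any block division.

```python
from typing import Iterable, List

Plane = List[List[int]]          # 2-D grid of gray values
Volume = List[List[List[int]]]   # indexed as volume[t][y][x]

# Offsets (row, col) of the 8 neighbours, clockwise from top-left.
# The ordering is an arbitrary choice made for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(plane: Plane, r: int, c: int) -> int:
    """Basic 8-neighbour LBP code of the interior pixel at (r, c)."""
    center = plane[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbour contributes a 1-bit when it is at least as bright
        # as the center; only the ordering of gray values matters.
        if plane[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(plane: Plane) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(plane) - 1):
        for c in range(1, len(plane[0]) - 1):
            hist[lbp_code(plane, r, c)] += 1
    return hist


def _accumulate(planes: Iterable[Plane]) -> List[int]:
    """Sum the LBP histograms of several planes bin by bin."""
    total = [0] * 256
    for plane in planes:
        for i, count in enumerate(lbp_histogram(plane)):
            total[i] += count
    return total


def spatiotemporal_descriptor(volume: Volume) -> List[int]:
    """Concatenate LBP histograms pooled over XY, XT and YT slices.

    Simplified assumption: every slice of the volume is used, rather
    than only neighbourhoods centered on fully interior voxels.
    """
    n_t, n_y, n_x = len(volume), len(volume[0]), len(volume[0][0])
    xy = _accumulate(volume[t] for t in range(n_t))
    xt = _accumulate(
        [[volume[t][y][x] for x in range(n_x)] for t in range(n_t)]
        for y in range(n_y)
    )
    yt = _accumulate(
        [[volume[t][y][x] for y in range(n_y)] for t in range(n_t)]
        for x in range(n_x)
    )
    return xy + xt + yt
```

Because each code depends only on whether a neighbour is at least as bright as the center pixel, applying any strictly increasing gray-level mapping to the whole volume leaves the descriptor unchanged, which is the robustness property the description refers to. A practical system would additionally crop the mouth region and typically normalize the histograms before classification; those steps are outside this sketch.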