A deep architecture with bilinear modeling of hidden representations: applications to phonetic recognition
| Content Provider | Microsoft Research |
|---|---|
| Author | Hutchinson, Brian; Deng, Li; Yu, Dong |
| Copyright Year | 2012 |
| Abstract | We develop and describe a novel deep architecture, the Tensor Deep Stacking Network (T-DSN), where multiple blocks are stacked one on top of another and where a bilinear mapping from hidden representations to the output in each block is used to incorporate higher-order statistics of the input features. A learning algorithm for the T-DSN is presented, in which the main parameter estimation burden is shifted to a convex sub-problem with a closed-form solution. Using an efficient and scalable parallel implementation, we train a T-DSN to discriminate standard three-state monophones in the TIMIT database. The T-DSN outperforms an alternative pretrained Deep Neural Network (DNN) architecture in frame-level classification (both state and phone) and in the cross-entropy measure. For continuous phonetic recognition, T-DSN performs equivalently to a DNN but without the need for a hard-to-scale, sequential fine-tuning step. |
| Language | English |
| Publisher | ICASSP 2012 IEEE SPS |
| Publisher Date | 2012-03-01 |
| Access Restriction | Open |
| Rights Holder | Microsoft Corporation |
| Subject Keyword | Human-computer interaction |
| Content Type | Text |
| Resource Type | Proceeding |
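The abstract's two central ideas, a bilinear mapping from two hidden representations to each block's output and a closed-form convex solve for the upper-layer weights, can be illustrated with a minimal NumPy sketch of a single block. Several details here are assumptions, not taken from this record: sigmoid hidden units, lower-layer weights `W1`/`W2` held fixed at random values (a full training procedure would also tune them), a ridge penalty `lam` on the upper weights, and the stacking rule of concatenating the raw input with the previous block's output. All function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def tdsn_block_features(X, W1, W2):
    """Bilinear hidden features for one block (hypothetical helper).

    X: (d, n) inputs as columns; W1: (d, L1); W2: (d, L2).
    Returns (L1*L2, n): for each sample, the flattened outer product h1 h2^T.
    """
    H1 = sigmoid(W1.T @ X)  # first hidden representation, (L1, n)
    H2 = sigmoid(W2.T @ X)  # second hidden representation, (L2, n)
    n = X.shape[1]
    # Column-wise outer products, then flatten each (L1, L2) slice row-major.
    return np.einsum("in,jn->ijn", H1, H2).reshape(-1, n)


def solve_upper_weights(H, T, lam=1e-3):
    """Closed-form ridge solution U = (H H^T + lam I)^{-1} H T^T.

    Minimizes ||U^T H - T||^2 + lam ||U||^2 for fixed hidden features H.
    T: (c, n) targets. Returns U: (L1*L2, c).
    """
    k = H.shape[0]
    A = H @ H.T + lam * np.eye(k)
    return np.linalg.solve(A, H @ T.T)


def tdsn_block_predict(X, W1, W2, U):
    """Block output Y = U^T vec(h1 h2^T), shape (c, n)."""
    return U.T @ tdsn_block_features(X, W1, W2)


def stack_input(X_raw, Y_prev):
    """Assumed stacking rule: next block sees raw input plus previous output."""
    return np.vstack([X_raw, Y_prev])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, L1, L2, c = 5, 200, 4, 4, 3
    X = rng.standard_normal((d, n))
    T = np.eye(c)[:, rng.integers(0, c, n)]  # one-hot targets, (c, n)
    W1 = 0.5 * rng.standard_normal((d, L1))
    W2 = 0.5 * rng.standard_normal((d, L2))
    U = solve_upper_weights(tdsn_block_features(X, W1, W2), T)
    Y = tdsn_block_predict(X, W1, W2, U)
    X_next = stack_input(X, Y)  # input for a second block, (d + c, n)
    print(Y.shape, X_next.shape)
```

Because the upper weights enter the output linearly once the hidden features are fixed, solving for them is an ordinary regularized least-squares problem; this is one way to read the abstract's point that the main estimation burden falls on a convex sub-problem with a closed-form solution.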