1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs
| Content Provider | Microsoft Research |
|---|---|
| Author | Seide, Frank; Fu, Hao; Droppo, Jasha; Li, Gang; Yu, Dong |
| Copyright Year | 2014 |
| Abstract | We show empirically that in SGD training of deep neural networks, one can, at no or nearly no loss of accuracy, quantize the gradients aggressively, to but one bit per value, if the quantization error is carried forward across minibatches (error feedback). This size reduction makes it feasible to parallelize SGD through data-parallelism with fast processors like recent GPUs. |
| Language | English |
| Publisher | Interspeech 2014 |
| Publisher Date | 2014-09-01 |
| Access Restriction | Open |
| Rights Holder | Microsoft Corporation |
| Subject Keyword | Speech recognition; Synthesis; Dialogue systems |
| Content Type | Text |
| Resource Type | Proceeding |
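
The abstract describes quantizing each gradient value to a single bit while carrying the quantization error forward into the next minibatch (error feedback). Below is a minimal NumPy sketch of that idea, not the paper's actual implementation: the function name, the sign-based 1-bit code, and the per-column mean-absolute-value reconstruction scale are illustrative assumptions.

```python
import numpy as np

def one_bit_quantize_with_error_feedback(gradient, residual):
    """Quantize a gradient tensor to 1 bit per value (its sign), carrying the
    quantization error forward to the next minibatch (error feedback)."""
    # Compensate with the residual error left over from the previous minibatch.
    compensated = gradient + residual

    # 1-bit quantization: keep only the sign, reconstructed with a per-column
    # scale (here the mean absolute value of the compensated gradient).
    scale = np.mean(np.abs(compensated), axis=0, keepdims=True)
    quantized = np.sign(compensated) * scale

    # Error feedback: whatever the 1-bit code failed to represent is kept
    # and added to the next minibatch's gradient.
    new_residual = compensated - quantized
    return quantized, new_residual


# Toy usage: simulate a few minibatches for one weight matrix.
rng = np.random.default_rng(0)
residual = np.zeros((4, 3))
for step in range(3):
    grad = rng.normal(size=(4, 3))
    q_grad, residual = one_bit_quantize_with_error_feedback(grad, residual)
    # q_grad is what each worker would exchange in data-parallel SGD.
```

In a data-parallel setup, only the sign bits (plus a small scale per column) would need to be exchanged between workers, which is the size reduction the abstract refers to.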