Similar Documents
Prosodic and Other Long-term Features for Speaker Diarization (2009)
| Content Provider | CiteSeerX |
|---|---|
| Author | Friedland, Gerald; Vinyals, Oriol; Huang, Yan; Müller, Christian |
| Abstract | Speaker diarization is defined as the task of determining “who spoke when” given an audio track and no other prior knowledge of any kind. This article shows how a state-of-the-art speaker diarization system can be improved by combining traditional short-term features (MFCCs) with prosodic and other long-term features. First, we present a framework to study the speaker discriminability of 70 different long-term features. Then, we show how the top-ranked long-term features can be combined with short-term features to increase the accuracy of speaker diarization. The results were measured on standardized datasets (NIST RT) and show a consistent relative improvement of about 30% in diarization error rate compared to the best system presented at the NIST evaluation in 2007. Index Terms: long-term features, prosody, speaker diarization. |
| File Format | |
| Journal | IEEE Transactions on Audio, Speech, and Language Processing |
| Language | English |
| Publisher Date | 2009-01-01 |
| Access Restriction | Open |
| Subject Keyword | Speaker Diarization; Long-term Feature; Top-ranked Long-term Feature; Standardized Datasets; Different Long-term Feature; Traditional Short-term Feature; Short-term Feature; Abstract Speaker Diarization; State-of-the-art Speaker Diarization System; Prior Knowledge; Audio Track; Consistent Improvement; Speaker Discriminability; Diarization Error Rate; NIST Evaluation; Index Term Long-term Feature; NIST RT; Following Article |
| Content Type | Text |
| Resource Type | Article |
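The abstract describes two technical steps. The first ranks 70 long-term features by how well they separate speakers. The second combines the top-ranked features with MFCCs. This record does not say which discriminability measure or combination rule the paper uses, so the Python sketch below makes two labeled assumptions:

* Each scalar feature is scored with a Fisher-style ratio: the variance of the per-speaker means divided by the mean within-speaker variance.
* The two feature streams are fused by a weighted sum of per-stream log-likelihoods, a common choice in multi-stream GMM-based diarization.

All feature names, speaker labels, weights, and error rates in the sketch are hypothetical. The last helper only shows how a "relative" DER improvement is computed.

```python
import math
import statistics
from typing import Dict, List, Sequence

# Hypothetical layout: feature name -> speaker label -> per-segment feature values.
FeatureData = Dict[str, Dict[str, Sequence[float]]]


def fisher_ratio(values_by_speaker: Dict[str, Sequence[float]]) -> float:
    """Assumed discriminability score for one scalar long-term feature:
    variance of the per-speaker means over the mean within-speaker variance."""
    means = [statistics.fmean(v) for v in values_by_speaker.values()]
    within = statistics.fmean([statistics.pvariance(v) for v in values_by_speaker.values()])
    between = statistics.pvariance(means)
    if within == 0.0:
        # No within-speaker spread: perfectly separating if the means differ, useless otherwise.
        return math.inf if between > 0.0 else 0.0
    return between / within


def rank_features(data: FeatureData) -> List[str]:
    """Return feature names sorted from most to least speaker-discriminative."""
    scores = {name: fisher_ratio(per_speaker) for name, per_speaker in data.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def fused_log_likelihood(ll_mfcc: float, ll_long_term: float, weight_long_term: float) -> float:
    """Assumed fusion rule: weighted sum of the per-stream GMM log-likelihoods."""
    if not 0.0 <= weight_long_term <= 1.0:
        raise ValueError("weight_long_term must lie in [0, 1]")
    return (1.0 - weight_long_term) * ll_mfcc + weight_long_term * ll_long_term


def relative_der_improvement(baseline_der: float, new_der: float) -> float:
    """Relative reduction in diarization error rate, e.g. 0.30 for '30% relative'."""
    return (baseline_der - new_der) / baseline_der


if __name__ == "__main__":
    # Toy, made-up values: two speakers, two candidate long-term features.
    toy: FeatureData = {
        "f0_mean": {"spk_a": [110.0, 112.0, 108.0], "spk_b": [210.0, 205.0, 215.0]},
        "energy_std": {"spk_a": [1.0, 3.0, 2.0], "spk_b": [1.5, 2.5, 2.0]},
    }
    print(rank_features(toy))  # ['f0_mean', 'energy_std']
    print(fused_log_likelihood(-10.0, -20.0, 0.1))
    print(relative_der_improvement(20.0, 14.0))
```

With these toy numbers, the hypothetical pitch feature separates the two speakers and ranks first. The energy feature has identical speaker means, so it scores zero. A DER dropping from a hypothetical 20.0% to 14.0% is a 30% relative improvement (6 points absolute), which is the sense of "relative" used in the abstract.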