MULTI-MODAL SPEAKER DIARIZATION OF REAL-WORLD MEETINGS USING COMPRESSED-DOMAIN VIDEO FEATURES
| Content Provider | CiteSeerX |
|---|---|
| Author | Hung, Hayley; Friedland, Gerald; Yeo, Chuohao |
| Abstract | Speaker diarization is originally defined as the task of determining "who spoke when" given an audio track and no other prior knowledge of any kind. The following article shows a multi-modal approach where we improve a state-of-the-art speaker diarization system by combining standard acoustic features (MFCCs) with compressed-domain video features. The approach is evaluated on over 4.5 hours of the publicly available AMI meetings dataset, which contains challenges such as people standing up and walking out of the room. We show a consistent improvement of about 34% relative in speaker error rate (21% DER) compared to a state-of-the-art audio-only baseline. Index Terms — Speaker extraction, multi-modal, compressed-domain features |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | State-of-the-art Audio-only Baseline, Standard Acoustic Feature, Available AMI Meeting, Multi-modal Approach, State-of-the-art Speaker Diarization System, Speaker Error Rate, Consistent Improvement, Compressed Domain Feature, Multi-modal Speaker Diarization, Following Article, Speaker Diarization, Index Term, Speaker Extraction, Prior Knowledge, Domain Video Feature, Audio Track |
| Content Type | Text |
| Resource Type | Article |
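The abstract reports results as *relative* reductions in error rate, a convention common in diarization evaluations. As a quick illustration of how such a figure is computed (the baseline and improved error values below are hypothetical, not taken from the paper):

```python
def relative_improvement(baseline_error: float, improved_error: float) -> float:
    """Relative error reduction: the fraction of the baseline error removed."""
    return (baseline_error - improved_error) / baseline_error

# Hypothetical values: a baseline speaker error of 30% reduced to 19.8%
# corresponds to a 34% relative improvement, matching how the abstract
# phrases its result.
print(round(relative_improvement(0.30, 0.198), 2))  # -> 0.34
```

Note that a relative improvement is always larger than the corresponding absolute drop in error, which is why diarization papers typically state which convention they use.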