Estimation of Speech Intelligibility Using Perceptual Speech Quality Scores
| Content Provider | Semantic Scholar |
|---|---|
| Author | Kondo, Kazuhiro |
| Copyright Year | 2018 |
| Abstract | Recent advances in mobile wireless communication devices have made speech communication possible in a variety of noise environments where it was not possible before. In addition, sophisticated speech encoders, echo control devices, and noise-canceling devices introduce artificial synthetic noise, e.g. musical noise, that was not seen with analog or simple PCM speech communication. Thus, comprehensive speech communication quality measures and frequent evaluation efforts have become a necessity. Speech quality is generally measured in one of two ways. The overall listening quality, such as the “naturalness” of the test speech, is typically measured as the Mean Opinion Score (MOS) (ITU-T, 1996). The other criterion is speech intelligibility, which measures the accuracy with which the test speech material carries its spoken content. We will deal mainly with the latter measure in this chapter. Conventional speech communication systems showed few types of degradation. The common ones were simple, such as band limitation and additive noise, so evaluation procedures were fairly simple. Traditionally, Japanese intelligibility tests often used stimuli of randomly selected one-mora, two-mora, or three-mora speech (Iida, 1987). The subjects were free to choose from any combination of valid Japanese syllables. This quickly became a strenuous task as the channel distortion increased. Thus, intelligibility tests of this kind are known to be unstable and often do not reflect the physically evident distortion, giving surprising results (Nishimura et al., 1996). English intelligibility tests are also reported to show similar trends. Accordingly, the Diagnostic Rhyme Test (DRT) (Voiers, 1977; 1983), a closed-set selection test that restricts the reply to two words, was proposed. This test is said to be effective in controlling various factors, including the amount of training and phonetic context, and is known to give stable intelligibility scores. The DRT has now become an ANSI standard (ANSI, 1989). In this chapter, we will briefly describe a DRT-type closed-set selection test in Japanese (Kondo et al., 2007; 2001). We categorized Japanese consonants into the same taxonomy used for the English tests, and proposed a minimal-pair word list accordingly, in which the two words of each pair differ only in the initial consonant and by a single phonetic feature. Subjective test results are also shown for various noise types at various SNRs. Then, we will investigate methods to estimate intelligibility through objective measures. If this is possible with reasonable accuracy, we should be able to “screen” the intelligibility in many of the conditions, and limit the need for full-scale subjective tests to a minimum subset. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://cdn.intechopen.com/pdfs/15024.pdf |
| Alternate Webpage(s) | https://api.intechopen.com/chapter/pdf-download/15024 |
| Language | English |
| Access Restriction | Open |
| Subject Keyword | Additive white Gaussian noise; Analog; Bandlimiting; Categorization; Control theory; Distortion; Encoder; Full scale; Intelligibility (philosophy); Mobile phone; Noise-induced hearing loss; Protein-Energy Malnutrition; Randomness; Signal-to-noise ratio; Subgroup; Syllable; Synthetic intelligence; Taxonomy; Unstable Medical Device Problem; Utility functions on indivisible goods; quality measures |
| Content Type | Text |
| Resource Type | Article |