Automatic Building of Synthetic Voices from Audio Books
| Content Provider | Semantic Scholar |
|---|---|
| Author | Prahallad, Kishore |
| Copyright Year | 2010 |
| Abstract | Current state-of-the-art text-to-speech systems produce intelligible speech but lack the prosody of natural utterances. Building better models of prosody requires prosodically rich speech databases, but developing such databases takes a large amount of effort and time. An alternative is to exploit story-style monologues (long speech files) in audio books. These monologues already encapsulate rich prosody, including varied intonation contours, pitch accents and phrasing patterns. Thus, audio books are excellent candidates for building prosodic models and natural-sounding synthetic voices. Processing such audio books poses several challenges, including segmentation of long speech files, detection of mispronunciations, and extraction and evaluation of representations of prosody. In this thesis, we address the issues of segmentation of long speech files, capturing prosodic phrasing patterns of a speaker, and conversion of speaker characteristics. The techniques developed to address these issues include text-driven and speech-driven methods for segmentation of long speech files, an unsupervised algorithm for learning speaker-specific phrasing patterns, and a voice conversion method based on modeling target speaker characteristics. The major conclusions of this thesis are: (1) Audio books can be used for building synthetic voices, and segmentation of such long speech files can be accomplished without the need for a speech recognition system. (2) Prosodic phrasing patterns are specific to a speaker; they can be learnt and incorporated to improve the quality of synthetic voices. (3) Conversion of speaker characteristics can be achieved by modeling speaker-specific features of a target speaker. Finally, the techniques developed in this thesis enable prosody research by leveraging the large number of audio books available in the public domain. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://www.lti.cs.cmu.edu/sites/default/files/research/thesis/2010/kishore_prahallad.pdf |
| Alternate Webpage(s) | http://lti.cs.cmu.edu/sites/default/files/research/thesis/2010/kishore_prahallad_automatic_building_of_synthetic_voices_from_audio_books.pdf |
| Alternate Webpage(s) | http://www.lti.cs.cmu.edu/research/thesis/2010/kishore_prahallad.pdf |
| Alternate Webpage(s) | http://www.lti.cs.cmu.edu/Research/Thesis/sunkeswari,%20kishore.pdf |
| Alternate Webpage(s) | http://www.cs.cmu.edu/~srallaba/pdfs/ksp_phd.pdf |
| Alternate Webpage(s) | http://www.cs.cmu.edu/~skishore/ksp_phdthesis.pdf |
| Alternate Webpage(s) | http://www.lti.cs.cmu.edu/research/thesis/2011/kishore_prahallad.pdf |
| Alternate Webpage(s) | http://www.lti.cs.cmu.edu/sites/default/files/research/thesis/2010/kishore_prahallad_automatic_building_of_synthetic_voices_from_audio_books.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |