| Content Provider | Springer Nature Link |
|---|---|
| Author | Rojc, Matej; Kačič, Zdravko |
| Copyright Year | 2003 |
| Abstract | Statistical approaches in speech technology, whether used for statistical language models, trees, hidden Markov models or neural networks, are the driving force behind the creation of language resources (LR), e.g., text corpora, pronunciation and morphology lexicons, and speech databases. This paper presents a system architecture for the rapid construction of morphologic and phonetic lexicons, two of the most important written language resources for the development of ASR (automatic speech recognition) and TTS (text-to-speech) systems. The presented architecture is modular and is particularly suitable for the development of written language resources for inflectional languages. In this paper an implementation is presented for the Slovenian language. The integrated graphical user interface focuses on the morphological and phonetic aspects of language and allows experts to perform analysis efficiently. In multilingual TTS systems, many extensive external written language resources are used, especially in the text processing part. It is therefore very important that the representation of these resources is time- and space-efficient. It is also very important that language resources for new languages can be easily incorporated into the system without modifying the common algorithms developed for multiple languages. In this regard, the use of large external language resources (e.g., morphology and phonetic lexicons) represents an important problem because of the required space and slow look-up time. This paper presents a method, and its results, for compiling large lexicons into corresponding finite-state transducers (FSTs), using as examples the German phonetic and morphology lexicons (CISLEX) and the Slovenian phonetic (SIflex) and morphology (SImlex) lexicons. The German lexicons consisted of about 300,000 words, SIflex of about 60,000 words and SImlex of about 600,000 words (of which 40,000 words were used for the finite-state transducer representation). Representation of large lexicons using finite-state transducers is mainly motivated by considerations of space and time efficiency. A great reduction in size and optimal access time were achieved for all lexicons. The starting size was 12.53 MB for the German phonetic lexicon and 18.49 MB for the German morphology lexicon. The starting size was 1.8 MB for the Slovenian phonetic lexicon and 1.4 MB for the Slovenian morphology lexicon. The final size of the corresponding FSTs was 2.78 MB for the German phonetic lexicon, 6.33 MB for the German morphology lexicon, 253 KB for SIflex and 662 KB for SImlex. The achieved look-up time is optimal, since it depends only on the length of the input word and not on the size of the lexicon. With such representations, integrating lexicons for new languages into the multilingual TTS system is easy and does not require any changes to the algorithms that use them. |
| Starting Page | 259 |
| Ending Page | 275 |
| Page Count | 17 |
| File Format | |
| ISSN | 13812416 |
| Journal | International Journal of Speech Technology |
| Volume Number | 6 |
| Issue Number | 3 |
| e-ISSN | 15728110 |
| Language | English |
| Publisher | Kluwer Academic Publishers |
| Publisher Date | 2003-01-01 |
| Publisher Place | Boston |
| Access Restriction | One Nation One Subscription (ONOS) |
| Subject Keyword | Artificial Intelligence (incl. Robotics); Signal, Image and Speech Processing; Communication |
| Content Type | Text |
| Resource Type | Article |
| Subject | Human-Computer Interaction; Computer Vision and Pattern Recognition; Software; Linguistics and Language |
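
The abstract's claim that look-up time depends only on the length of the input word, not on the size of the lexicon, can be illustrated with a small sketch. This is not the authors' compilation method: it uses an unminimized trie with outputs on final states as a simplified stand-in for a finite-state transducer, and the class names, function names and lexicon entries are hypothetical. The `reduction_percent` helper simply restates the German size figures quoted in the abstract as percentages.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    # Outgoing transitions keyed by input character.
    edges: Dict[str, "Node"] = field(default_factory=dict)
    # Output attached to a final state (None means "not a final state").
    output: Optional[str] = None


class TrieLexicon:
    """Toy stand-in for a lexicon FST: an unminimized trie with outputs."""

    def __init__(self) -> None:
        self.root = Node()

    def add(self, word: str, output: str) -> None:
        node = self.root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.output = output

    def lookup(self, word: str) -> Optional[str]:
        # One transition per input character, so the cost is O(len(word))
        # no matter how many entries the lexicon holds.
        node = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return None
        return node.output


def reduction_percent(before: float, after: float) -> float:
    """Size reduction as a percentage of the starting size (same units)."""
    return 100.0 * (1.0 - after / before)


lexicon = TrieLexicon()
# Hypothetical entries; the transcriptions are illustrative only and are
# not taken from SIflex or CISLEX.
lexicon.add("dan", "d a n")
lexicon.add("danes", "d a n e s")

print(lexicon.lookup("danes"))
print(lexicon.lookup("dane"))  # a prefix that is not itself an entry

# Sizes in MB as quoted in the abstract for the German lexicons.
print(round(reduction_percent(12.53, 2.78), 1))  # phonetic lexicon
print(round(reduction_percent(18.49, 6.33), 1))  # morphology lexicon
```

A real transducer would additionally share common suffixes and distribute outputs along the transitions, which is where most of the space saving reported in the abstract would come from; the trie above only shares prefixes.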