Learning a Monolingual Language Model from a Multilingual Text Database (2000)
| Content Provider | CiteSeerX |
|---|---|
| Author | Ghani, Rayid; Jones, Rosie |
| Description | Language models are important in speech recognition, document classification, and database selection algorithms. Traditionally, language models are learned from corpora specifically acquired for the purpose. Increasingly, however, there is interest in constructing language models for specific languages from heterogeneous sources such as the web. Query-based sampling has been shown to be effective for gauging the content of monolingual heterogeneous databases. We propose and evaluate an extension of this approach to the case of learning a monolingual language model from a multilingual database, together with extensions to the query-based sampling algorithm to handle this case. We test our approach on a corpus collected from the WWW and show that our proposed methods perform accurately and efficiently for learning a language model of Tagalog, even when Tagalog documents make up only 2.5% of the documents in the collection. |
| File Format | |
| Language | English |
| Publisher Date | 2000-01-01 |
| Publisher Institution | In Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM |
| Access Restriction | Open |
| Subject Keyword | Speech Recognition; Monolingual Heterogeneous Database; Specific Language; Language Model; Multilingual Text Database; Monolingual Language Model; Database Selection Algorithm; Multi-lingual Database; Query-based Sampling Algorithm; Heterogeneous Source; Document Classification |
| Content Type | Text |
| Resource Type | Article |