SMT systems for less-resourced languages based on domain-specific data
| Content Provider | Semantic Scholar |
|---|---|
| Author | Offersgaard, Lene; Hansen, Dorte Haltrup |
| Copyright Year | 2012 |
| Abstract | In this paper we show that good SMT systems for less-resourced languages can be obtained by using even small amounts of high-quality domain-specific data. We suggest a method to filter newly collected data for parallel corpora, using the internal alignment scores from the alignment process. The filtering process is easy to use and is based on open-source tools. The domain-specific data are used in combination with other publicly available resources for training SMT systems. Automatic evaluation shows that relatively small amounts of newly collected domain-specific data result in systems with promising BLEU scores in the range of 52.9 to 60.9. The LetsMT! platform is used to create the presented machine translation systems; the flexible platform allows users to upload their own data for training. The paper shows that the platform is a promising way of making SMT systems available for less-resourced languages. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://www.mt-archive.info/10/BUCC-2012-Offersgaard.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |