INTERSPEECH 2006 - ICSLP: Linguistic tuple segmentation in Ngram-based statistical machine translation
| Content Provider | CiteSeerX |
|---|---|
| Author | de Gispert, Adrià; Mariño, José B. |
| Abstract | Ngram-based Statistical Machine Translation relies on a standard Ngram language model of tuples to estimate the translation process. In training, this translation model requires a segmentation of each parallel sentence into tuples, which forces a hard segmentation decision whenever a word is left unlinked during word alignment. This is especially critical when the unlinked word appears in the target language, since the hard decision is then compulsory. In this paper we present a thorough study of this situation, comparing for the first time each of the proposed techniques in two independent tasks: an English–Spanish European Parliament Proceedings large-vocabulary task and an Arabic–English Basic Travel Expressions small-data task. In light of this comparison, we present a novel segmentation technique that incorporates linguistic information. Results obtained in both tasks outperform all previous techniques. Index Terms: statistical machine translation, tuple segmentation, n-gram-based SMT, linguistic information |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Ngram-based Statistical Machine Translation, ICSLP, Linguistic Tuple Segmentation, Hard Decision, Tuple Segmentation, Linguistic Information, Target Language, Statistical Machine Translation, Novel Segmentation Technique, Translation Model, Independent Task, Thorough Study, Standard Ngram Language Model, Translation Process, First Time, Parallel Sentence, N-gram-based SMT, Word Alignment, Previous Technique |
| Content Type | Text |
| Resource Type | Article |