Enhancing image-based Arabic document translation using a noisy channel correction model (2007)
| Content Provider | CiteSeerX |
|---|---|
| Author | Chang, Yi; Zhang, Ying; Vogel, Stephan; Yang, Jie |
| Description | In Proceedings of MT Summit XI. An image-based document translation system consists of several components, among which OCR (Optical Character Recognition) plays an important role. However, existing OCR software is not robust against environmental variations. Furthermore, OCR errors are often propagated into the translation component, causing poor end-to-end performance. In this paper, we propose an image-based document translation system that uses an error correction model to correct misrecognized words in OCR output. We train our correction model on synthetic data with different fonts and sizes to simulate real-world situations. We further enhance our correction model with bigrams to improve the correction of word segmentation errors. Experimental results show substantial improvements in both word recognition accuracy and translation quality. For instance, in an experiment using the Arabic Transparent font, the BLEU score increases from 18.70 to 33.47 with the use of our noisy channel model. |
| File Format | |
| Language | English |
| Publisher Date | 2007-01-01 |
| Access Restriction | Open |
| Subject Keyword | Poor End-to-end Performance; BLEU Score Increase; Environmental Variation; Misrecognized Word; Important Role; Substantial Improvement; Different Font; Correction Model; Synthetic Data; Several Component; Image-based Arabic Document Translation; OCR Error; Noisy Channel Correction Model; Noisy Channel Model; Real World Situation; Image-based Document Translation System; Word Recognition Accuracy; Optical Character Recognition; OCR Output; Error Correction Model; OCR Software; Word Segmentation Error Correction; Arabic Transparent Font; Experimental Result; Translation Quality; Translation Component; Image-based Document Translation |
| Content Type | Text |
| Resource Type | Article |
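
The description above mentions a noisy channel correction model enhanced with bigrams. As a rough illustration of the general technique only, and not the authors' implementation, the sketch below picks the vocabulary word `w` that maximizes `P(w | previous word) * P(observed | w)`. Everything in it is an assumption made for illustration: the toy English counts, the function names, the add-one smoothing, and the crude channel model that charges a fixed `error_rate` per edit operation (the record says the paper trains its correction model on synthetic OCR data instead).

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def correct(observed, prev_word, unigram_counts, bigram_counts, error_rate=0.1):
    """Return the vocabulary word with the highest noisy channel score.

    Channel model (assumed): unnormalized P(observed | w) = error_rate ** distance.
    Language model (assumed): add-one smoothed bigram P(w | prev_word).
    """
    vocab_size = len(unigram_counts)
    prev_count = unigram_counts.get(prev_word, 0)
    best_word, best_score = observed, -math.inf
    for cand in unigram_counts:
        log_channel = edit_distance(observed, cand) * math.log(error_rate)
        pair_count = bigram_counts.get((prev_word, cand), 0)
        log_lm = math.log((pair_count + 1) / (prev_count + vocab_size))
        score = log_channel + log_lm
        if score > best_score:
            best_word, best_score = cand, score
    return best_word

# Hypothetical toy counts; real counts would come from a training corpus.
UNIGRAMS = {"the": 50, "cat": 10, "cut": 5, "car": 8, "sat": 6}
BIGRAMS = {("the", "cat"): 8, ("the", "car"): 2}

# "cot" stands in for a misrecognized OCR token that follows "the".
print(correct("cot", "the", UNIGRAMS, BIGRAMS))
```

In this toy setup the bigram term breaks the tie between candidates that are equally close to the observed token, which is the role the description attributes to bigrams in the correction model.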