Odia Characters Recognition by Training Tesseract OCR Engine
| Content Provider | Semantic Scholar |
|---|---|
| Author | Nayak, Mamata; Nayak, Ajit Kumar |
| Copyright Year | 2013 |
| Abstract | Optical Character Recognition (OCR) for Indian scripts is an active area of research today. The large number of letters in the alphabet, their sophisticated combinations, and the complicated graphemes they form pose a great challenge to an OCR designer. OCR has many application areas, such as preserving old documents in electronic format, helping visually impaired persons learn the content of a document by transforming it into speech, saving document images within limited space, building an electronic dictionary of words, and preserving ancient characters that are not included in a language's current character set. Currently, Tesseract, an open source OCR engine, is considered one of the most accurate FOSS OCR engines. Tesseract has already been designed to recognize English, Italian, French, German, Spanish, Dutch, and many more languages (11), as well as a few Indian languages such as Bengali, Tamil, Telugu, and Malayalam. Similarly, Tesseract can be made to recognize other scripts if the engine is trained with the requisite data. The objective of this work is to develop a training process for the Tesseract OCR engine so that the engine can recognize printed documents in the Odia language, used in the state of Odisha (formerly known as Orissa), India. |
| Starting Page | 25 |
| Ending Page | 30 |
| Page Count | 6 |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | https://www.ijcaonline.org/proceedings/icdcit2014/number1/14381-1306?format=pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |