Chapter 18: Benchmarking Strategy for Arabic Screen-Rendered Word Recognition
| Content Provider | CiteSeerX |
|---|---|
| Author | Slimane, Fouad; Kanoun, Slim; Hennebert, Jean; Ingold, Rolf; Alimi, Adel M. |
| Abstract | This chapter presents a new benchmarking strategy for Arabic screen-based word recognition. Firstly, we report on the creation of the new APTI (Arabic Printed Text Image) database. This database is intended for large-scale benchmarking of open-vocabulary, multi-font, multi-size and multi-style word recognition systems in Arabic. Such systems take a text image as input and output the character string corresponding to the text in the image. The challenges addressed by the database lie in the variability of the sizes, fonts and styles used to generate the images. A focus is also placed on low-resolution images, where anti-aliasing generates noise on the characters being recognized. The database contains 45,313,600 single-word images totalling more than 250 million characters. Ground truth annotation is provided for each image in an XML file. The annotation includes the number of characters, the number of pieces of Arabic words (PAWs), the sequence of characters, and the size, style and font used to generate each image, etc. Secondly, we describe the Arabic Recog- |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Arabic Screen-rendered Word Recognition; Benchmarking Strategy; Low Resolution Image; Text Image; Arabic Word; Arabic Screen-based Word Recognition; Multi-style Word Recognition Systems; New Benchmarking Strategy; XML File; Ground Truth Annotation; Arabic Printed Text Image; Single Word; Large-scale Benchmarking; New APTI; Arabic Recog |
| Content Type | Text |
| Resource Type | Article |
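
The abstract states that each word image has an XML ground-truth annotation listing the character count, the number of PAWs, the character sequence, and the font, style and size. The record does not give the schema, so the following is only a minimal sketch of how such an annotation could be read. The element and attribute names (`wordImage`, `content`, `font`, `nbChars`, `nbPaws`), the sample values, and the helpers `parse_annotation` and `is_consistent` are all hypothetical and are not taken from the APTI files.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical annotation layout: element and attribute names are
# illustrative assumptions, NOT the actual APTI schema.
SAMPLE_XML = """<wordImage>
  <content transcription="كتاب" nbChars="4" nbPaws="2"/>
  <font name="ExampleFont" style="plain" size="10"/>
</wordImage>"""


@dataclass
class WordAnnotation:
    transcription: str  # sequence of characters in the word
    n_chars: int        # number of characters
    n_paws: int         # number of pieces of Arabic words (PAWs)
    font: str
    style: str
    size: int


def parse_annotation(xml_text: str) -> WordAnnotation:
    """Parse one annotation record in the assumed layout."""
    root = ET.fromstring(xml_text)
    content = root.find("content")
    font = root.find("font")
    if content is None or font is None:
        raise ValueError("annotation lacks <content> or <font>")
    return WordAnnotation(
        transcription=content.get("transcription", ""),
        n_chars=int(content.get("nbChars", "0")),
        n_paws=int(content.get("nbPaws", "0")),
        font=font.get("name", ""),
        style=font.get("style", ""),
        size=int(font.get("size", "0")),
    )


def is_consistent(ann: WordAnnotation) -> bool:
    """Check that the declared character count matches the transcription."""
    return len(ann.transcription) == ann.n_chars


if __name__ == "__main__":
    ann = parse_annotation(SAMPLE_XML)
    print(ann, is_consistent(ann))
```

A consistency check of this kind is one simple way to validate annotations before using them to score a recognizer; the real files may encode characters differently, for example as labelled character shapes, in which case the length comparison would need adapting.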