NDLI: Recognition of Nastalique Urdu ligatures


HMM-based script identification for OCR

Low resolution Arabic recognition with multidimensional recurrent neural networks

Re-targeting of multi-script document images for handheld devices

Unconstrained handwritten Devanagari character recognition using convolutional neural networks

Multilingual OCR research and applications: an overview

A bilingual Gurmukhi-English OCR based on multiple script identifiers and language models

Ruling-based table analysis for noisy handwritten documents

Global and local features for recognition of online handwritten numerals and Tamil characters

Word level script recognition for Uighur document mixed with English script

An approach for Bangla and Devanagari video text recognition

A robust table registration method for batch table OCR processing

Levenshtein distance metric based holistic handwritten word recognition

Bag-of-features HMMs for segmentation-free Bangla word spotting

Can we build language-independent OCR using LSTM networks?

Text graphic separation in Indian newspapers

Recognition of offline handwritten numerals using an ensemble of MLPs combined by Adaboost

Multi-script robust reading competition in ICDAR 2013

Recognition of Nastalique Urdu ligatures

Content Provider	ACM Digital Library
Author	Lehal, Gurpreet Singh; Rana, Ankur
Abstract	There has been considerable work on Arabic OCR; however, all of that work is based on the Naskh style. Urdu script is based on the Arabic alphabet but uses the Nastalique style. The Nastalique style makes OCR in general, and character segmentation in particular, a highly challenging task, so most researchers avoid the character segmentation phase and use a larger unit of recognition. For Urdu, the next larger recognition unit considered by researchers is the ligature, which lies between the character and the word. A ligature is a connected component of one or more characters, and an Urdu word is usually composed of 1 to 8 ligatures. There are more than 25,000 Urdu ligatures, of which the top 4567 account for 99% of coverage. From the OCR point of view, a ligature can be further segmented into one primary connected component and zero or more secondary connected components. The primary component represents the basic shape of the ligature, while the secondary components correspond to the dots, diacritic marks, and special symbols associated with the ligature. To reduce the number of classes, ligatures with similar primary components are grouped together. In this paper, we present a system that recognizes 9262 ligatures formed from 2190 primary and 17 secondary components. Various combinations of DCT, Gabor filter, and zoning-based features with kNN, HMM, and SVM classifiers were tried, and a recognition accuracy of 98% is reported on pre-segmented ligatures.
Starting Page	1
Ending Page	5
Page Count	5
File Format	PDF
ISBN	9781450321143
DOI	10.1145/2505377.2505379
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2013-08-24
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Urdu OCR; DCT; Nastalique; SVM; Ligature identification
Content Type	Text
Resource Type	Article
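
The abstract describes splitting each ligature into one primary and zero or more secondary connected components, then classifying features (such as zoning densities) with a classifier such as kNN. The Python sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. The 8-connectivity, the rule that the largest component is the primary one, the zone grid size, k=3, and all function names are hypothetical choices for illustration, and the DCT, Gabor, HMM, and SVM variants named in the abstract are not shown.

```python
# Illustrative sketch only: component splitting, zoning features, and kNN.
# Assumptions (not from the paper): 8-connectivity, "largest component is
# primary", grid size, k, and every function name below.
from collections import Counter, deque
from math import sqrt


def connected_components(img):
    """Return the 8-connected ink components of a binary image.

    img is a list of rows of 0/1 values; each component is a set of (row, col).
    """
    rows, cols = len(img), len(img[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            component = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            components.append(component)
    return components


def split_ligature(img):
    """Split a ligature image into (primary, [secondary, ...]).

    Assumption: the largest component is the primary (base shape) and all
    remaining components are secondary (dots, diacritics, special symbols).
    """
    components = sorted(connected_components(img), key=len, reverse=True)
    if not components:
        return set(), []
    return components[0], components[1:]


def zoning_features(pixels, zones=4):
    """Fraction of a non-empty component's ink pixels falling in each cell of
    a zones x zones grid laid over its bounding box (row-major order)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    top, left = min(ys), min(xs)
    height, width = max(ys) - top + 1, max(xs) - left + 1
    counts = [0] * (zones * zones)
    for y, x in pixels:
        zy = (y - top) * zones // height
        zx = (x - left) * zones // width
        counts[zy * zones + zx] += 1
    return [count / len(pixels) for count in counts]


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(
        train,
        key=lambda sample: sqrt(sum((a - b) ** 2 for a, b in zip(sample[0], query))),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, in a 4x6 image with a full-width stroke on row 2 and a single dot at row 0, column 3, split_ligature returns the 6-pixel stroke as the primary component and the dot as the only secondary component. Classifying only the primary component keeps the class count small, in line with the abstract's grouping of ligatures by similar primary components; how the primary and secondary labels are then combined into one of the 9262 ligatures is not specified in the abstract.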
