International Journal of Document Analysis and Recognition (IJDAR) : Volume 12, Issue 3, September 2009
Special issue on noisy text analytics
Optical character recognition errors and their effects on natural language processing
Using topic models for OCR correction
Successfully detecting and correcting false friends using channel profiles
Language independent unsupervised learning of short message service dialect
An effective coherence measure to determine topical consistency in user-generated content
Opinion mining from noisy text data

Similar Documents

  • Optical character recognition errors and their effects on natural language processing (Article)
  • Performance evaluation for text processing of noisy inputs (Article)
  • Impact of imperfect OCR on part-of-speech tagging (Article)
  • Sentence boundary detection in conversational speech transcripts using noisily labeled examples (Article)
  • Morphological tagging approach in document analysis of invoices (Article)
  • Bidirectional HMM-based Arabic POS tagging (Article)
  • Robust named entity detection from optical character recognition output (Article)
  • Toward enhanced Arabic speech recognition using part of speech tagging (Article)
  • Integrating natural language processing with image document analysis: what we learned from two real-world applications (Article)

Optical character recognition errors and their effects on natural language processing

Content Provider: Springer Nature Link
Author: Lopresti, Daniel
Copyright Year: 2009
Abstract: Errors are unavoidable in advanced computer vision applications such as optical character recognition, and the noise induced by these errors presents a serious challenge to downstream processes that attempt to make use of such data. In this paper, we apply a new paradigm we have proposed for measuring the impact of recognition errors on the stages of a standard text analysis pipeline: sentence boundary detection, tokenization, and part-of-speech tagging. Our methodology formulates error classification as an optimization problem solvable using a hierarchical dynamic programming approach. Errors and their cascading effects are isolated and analyzed as they travel through the pipeline. We present experimental results based on a large collection of scanned pages to study the varying impact depending on the nature of the error and the character(s) involved. This dataset has also been made available online to encourage future investigations.
Starting Page: 141
Ending Page: 151
Page Count: 11
File Format: PDF
ISSN: 1433-2833
Journal: International Journal of Document Analysis and Recognition (IJDAR)
Volume Number: 12
Issue Number: 3
e-ISSN: 1433-2825
Language: English
Publisher: Springer-Verlag
Publisher Date: 2009-09-25
Publisher Place: Berlin, Heidelberg
Access Restriction: One Nation One Subscription (ONOS)
Subject Keyword: Performance evaluation; Optical character recognition; Sentence boundary detection; Tokenization; Part-of-speech tagging; Pattern Recognition; Image Processing and Computer Vision
Content Type: Text
Resource Type: Article
Subject: Computer Science Applications; Computer Vision and Pattern Recognition; Software
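
The abstract frames error classification as an optimization problem solved with dynamic programming. As a rough illustration only, the Python sketch below aligns a ground-truth string with its OCR output using ordinary character-level edit distance and labels each difference as a substitution, deletion, or insertion. This is a flat, single-level alignment with unit costs, not the hierarchical approach the paper proposes; the function names `align` and `error_counts` and the sample strings are hypothetical.

```python
from typing import Dict, List, Tuple


def align(truth: str, ocr: str) -> List[Tuple[str, str, str]]:
    """Return (operation, truth_char, ocr_char) triples from one
    minimum-cost character alignment of truth against ocr."""
    n, m = len(truth), len(ocr)
    # cost[i][j] is the edit distance between truth[:i] and ocr[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (truth[i - 1] != ocr[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right cell to recover the operations.
    ops: List[Tuple[str, str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (truth[i - 1] != ocr[j - 1])):
            op = "match" if truth[i - 1] == ocr[j - 1] else "substitution"
            ops.append((op, truth[i - 1], ocr[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            # A ground-truth character is missing from the OCR output.
            ops.append(("deletion", truth[i - 1], ""))
            i -= 1
        else:
            # The OCR output contains a character with no counterpart.
            ops.append(("insertion", "", ocr[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def error_counts(truth: str, ocr: str) -> Dict[str, int]:
    """Count each error type found by the alignment."""
    counts = {"substitution": 0, "deletion": 0, "insertion": 0}
    for op, _, _ in align(truth, ocr):
        if op != "match":
            counts[op] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical example: the letter "m" misread as "rn".
    print(error_counts("modern", "rnodern"))
```

An alignment of this kind ties each error to the characters involved, which is the level of detail the abstract says the study examines. Tracing how such errors propagate into sentence boundary detection, tokenization, and part-of-speech tagging would require further alignment at those levels, which this sketch does not attempt.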