NDLI: Investigation and modeling of the structure of texting language

Content Provider	Springer Nature Link
Author	Choudhury, Monojit; Saraf, Rahul; Jain, Vineet; Mukherjee, Animesh; Sarkar, Sudeshna; Basu, Anupam
Copyright Year	2007
Abstract	Language usage in computer-mediated discourse, such as chats, emails and SMS texts, differs significantly from the standard form of the language and is referred to as texting language (TL). The presence of intentional misspellings significantly decreases the accuracy of existing spell-checking techniques for TL words. In this work, we formally investigate the nature and types of compressions used in SMS texts, and develop a Hidden Markov Model-based word model for TL. The model parameters have been estimated through standard machine learning techniques from a word-aligned parallel corpus of SMS and standard English. The accuracy of the model in correcting TL words is 57.7%, which is almost a threefold improvement over the performance of Aspell. The use of a simple bigram language model results in a 35% reduction in the relative word-level error rates.
Starting Page	157
Ending Page	174
Page Count	18
File Format	PDF
ISSN	1433-2833
Journal	International Journal of Document Analysis and Recognition (IJDAR)
Volume Number	10
Issue Number	3-4
e-ISSN	1433-2825
Language	English
Publisher	Springer-Verlag
Publisher Date	2007-10-24
Publisher Place	Berlin, Heidelberg
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	Texting language; SMS; Hidden Markov Model; Text correction; Spell checking; Pattern Recognition; Image Processing and Computer Vision
Content Type	Text
Resource Type	Article
Subject	Computer Science Applications; Computer Vision and Pattern Recognition; Software

Similar Documents

Markov models for offline handwriting recognition: a survey

Finding structure in noisy text: topic classification and unsupervised clustering

Effective technique for the recognition of offline Arabic handwritten words using hidden Markov models

A robust probabilistic Braille recognition system

Robust named entity detection from optical character recognition output

An Improved Hierarchical Dirichlet Process-Hidden Markov Model and Its Application to Trajectory Modeling and Retrieval

Cursive word recognition using a random field based hidden Markov model

Error handling approach using characterization and correction steps for handwritten document analysis

Recognition of human actions using texture descriptors