Detecting and Correcting Real-word Errors in Tamil Sentences
| Content Provider | Semantic Scholar |
|---|---|
| Authors | Sakuntharaj, Ratnasingam; Mahesan, Sinnathamby |
| Copyright Year | 2018 |
| Abstract | Spell checkers deal with two types of errors: non-word errors and real-word errors. Non-word errors fall into two categories. In the first, the word itself is invalid. In the second, the word is valid but is not present in the lexicon. A real-word error is a valid word that is inappropriate in the context of the sentence. This paper proposes an approach to correcting real-word errors in Tamil. A bigram probability model, built from a 3 GB corpus of Tamil text, determines whether a valid word is appropriate in the context of the sentence. If it is not, the word is marked as a real-word error, and a minimum edit distance technique finds lexically similar words. A word-level n-gram language probability model then measures the appropriateness of each such word. A hash table keyed by word length speeds up the search for lexically similar words: only words of length m-1 to m+1 are considered, where m is the length of the word found to be ‘inappropriate’. Test results show that more than 98% of the suggestions generated by the system are correct, as judged by a Tamil scholar. |
| File Format | PDF, HTML |
| DOI | 10.4038/rjs.v9i2.43 |
| Alternate Webpage(s) | http://rjs.ruh.ac.lk/index.php/rjs/article/download/219/217 |
| Alternate Webpage(s) | https://doi.org/10.4038/rjs.v9i2.43 |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
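The detection-and-correction pipeline summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. All class and function names are hypothetical, and the toy corpus is English to keep it readable. The sketch also makes several choices of its own:

- A word is flagged when its bigram with the preceding word never occurs in the training corpus. The abstract does not state the paper's actual decision rule.
- Probabilities use add-one smoothing.
- Candidates are limited to an edit distance of 1.
- Word length is measured in Unicode code points. A Tamil implementation would more plausibly count Tamil letters (grapheme clusters).

```python
from collections import Counter, defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


class BigramModel:
    """Word bigram model with add-alpha smoothing (smoothing choice is an assumption)."""

    def __init__(self, sentences, alpha: float = 1.0):
        self.bigrams = Counter()  # counts of (previous word, word)
        self.context = Counter()  # how often each word appears as a bigram history
        self.vocab = set()
        self.alpha = alpha
        for tokens in sentences:
            padded = ["<s>"] + list(tokens)  # <s> marks the sentence start
            self.vocab.update(tokens)
            self.context.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))

    def prob(self, prev: str, word: str) -> float:
        """Smoothed P(word | prev); sums to 1 over the vocabulary."""
        return (self.bigrams[(prev, word)] + self.alpha) / (
            self.context[prev] + self.alpha * len(self.vocab))


def find_real_word_errors(model: BigramModel, tokens: list[str]) -> list[int]:
    """Hypothetical rule: flag positions whose bigram with the previous word is unseen."""
    flagged, prev = [], "<s>"
    for i, word in enumerate(tokens):
        if model.bigrams[(prev, word)] == 0:
            flagged.append(i)
        prev = word
    return flagged


def build_length_index(lexicon) -> dict[int, list[str]]:
    """Hash table keyed by word length, so only nearby lengths need to be scanned."""
    index = defaultdict(list)
    for word in lexicon:
        index[len(word)].append(word)
    return index


def suggest(index, model, prev: str, word: str, max_dist: int = 1, k: int = 3) -> list[str]:
    """Collect words of length m-1..m+1 within max_dist edits, ranked by P(cand | prev)."""
    m = len(word)
    candidates = [cand
                  for length in (m - 1, m, m + 1)
                  for cand in index.get(length, [])
                  if cand != word and edit_distance(word, cand) <= max_dist]
    candidates.sort(key=lambda c: model.prob(prev, c), reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    # Toy English corpus standing in for the 3 GB Tamil corpus.
    corpus = [s.split() for s in ["i ate rice", "we ate rice", "they ate bread",
                                  "the sun will rise", "we ride home"]]
    model = BigramModel(corpus)
    index = build_length_index(sorted({w for s in corpus for w in s}))
    tokens = "i ate rise".split()  # "rise" is a valid word used in the wrong context
    for i in find_real_word_errors(model, tokens):
        prev = tokens[i - 1] if i > 0 else "<s>"
        print(tokens[i], "->", suggest(index, model, prev, tokens[i]))  # rise -> ['rice', 'ride']
```

In this sketch, bucketing the lexicon by length means only the m-1, m and m+1 buckets are scanned instead of the whole lexicon. That is the speed-up the abstract attributes to the hash table. With a distance limit of 1, no valid candidate is lost, because any word within one edit of the flagged word differs from it in length by at most one.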