Sentence Boundary Disambiguation: A User Friendly Approach
| Content Provider | Semantic Scholar |
|---|---|
| Author | Negi, Pritam Singh; Rauthan, M. M. S.; Dhami, H. S. |
| Copyright Year | 2010 |
| Abstract | ABSTRACT: In the present work we have developed an algorithm based on maximum entropy and stop word removal modules, which works with almost 99% accuracy and outperforms the existing paragraph breaker software developed by the Text Mining Group, School of Computer Science, Manchester University, United Kingdom. Keywords: Sentence Boundary, Information Retrieval, Evaluation. 1. INTRODUCTION: Sentence Boundary Disambiguation (SBD) has received increased attention in recent years as a way to enrich speech recognition output for better readability and improved demonstrations in many applications of Natural Language Processing, such as parsing, information extraction, machine translation, POS tagging and document summarization. Among the most relevant works are Berger (1996), Palmer & Hearst (1997), Mikheev (2000), Manning & Schutze (2002), Kiss & Strunk (2006), Xuan et al. (2007), Siminski (2007) and Gillick (2009), to mention only a few. A sentence is a sequence of words ending with a terminal punctuation mark, such as '.', '?' or '!'. Most sentences end with a period. However, a period can also belong to an abbreviation, such as "Mr." or "mr", "U.S.A.", "Ph. D." or "M. Sc.", or can represent a decimal point in a number like 102.53. In these cases the period is part of an abbreviation or a number and does not delimit a sentence, so an ambiguity arises when breaking the text into sentences. To perform sentence boundary disambiguation for a given document, certain necessary conditions must be respected when breaking sentences. In this paper we have attempted to provide a method which can be implemented in any system and can detect sentence boundaries with high accuracy. For this purpose, we have considered the following conditions, through which our system achieves high accuracy in detecting sentence boundaries: not to break a sentence when the sentence contains certain abbreviated words like |
| Starting Page | 33 |
| Ending Page | 37 |
| Page Count | 5 |
| File Format | PDF HTM / HTML |
| Volume Number | 7 |
| Alternate Webpage(s) | http://www.ijcaonline.org/volume7/number8/pxc3871738.pdf |
| Alternate Webpage(s) | https://doi.org/10.5120/1269-1738 |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
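
The ambiguity described in the abstract (a period may end a sentence, belong to an abbreviation, or act as a decimal point) can be illustrated with a minimal rule-based sketch. This is not the authors' maximum entropy algorithm; it is a simple heuristic written for illustration. The function name `split_sentences`, the `ABBREVIATIONS` set, and the rule that a single letter before a period counts as an abbreviation are all assumptions introduced here, not details taken from the paper.

```python
import re

# Hypothetical abbreviation list, for illustration only; the paper's own
# list is not reproduced in this record.
ABBREVIATIONS = {"mr", "mrs", "dr", "ph", "sc", "etc"}


def split_sentences(text):
    """Split text on '.', '?' and '!' while skipping periods that belong
    to decimal numbers or to (assumed) abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.?!]", text):
        end = match.end()
        # A terminator must be followed by whitespace or the end of the text.
        # This skips decimal points (102.53) and inner periods (U.S.A.).
        if end < len(text) and not text[end].isspace():
            continue
        if match.group() == ".":
            # Look at the run of letters immediately before the period.
            prev = re.search(r"([A-Za-z]+)$", text[:match.start()])
            if prev:
                word = prev.group(1)
                # Assumption: known abbreviations and single letters
                # (initials, the last letter of "U.S.A.") do not end a sentence.
                if word.lower() in ABBREVIATIONS or len(word) == 1:
                    continue
        sentences.append(text[start:end].strip())
        start = end
    # Keep any trailing text that has no terminal punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = ("Mr. Smith paid 102.53 dollars. He has a Ph. D. from the U.S.A. "
          "and is happy! Is that all?")
for sentence in split_sentences(sample):
    print(sentence)
```

A fixed rule set like this fails on cases it was not written for, for example a sentence that genuinely ends with an abbreviation or a single letter. That limitation is one reason statistical approaches such as the maximum entropy model mentioned in the abstract are used for this task.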