An Approach Based on Iterative Learning Algorithm for Chinese Text Hierarchy Feature Extraction without Lexicon
| Content Provider | Semantic Scholar |
|---|---|
| Author | Jiang, Shaohua |
| Copyright Year | 2013 |
| Abstract | The large amount of information contained in Chinese text is an invaluable asset for text mining, but the differences between Chinese and Western languages restrict further use of Chinese text. One major difference between Western languages and Chinese (as well as some other Asian languages, such as Japanese and Thai) is that words are not separated by spaces. Chinese segmentation and feature extraction are essential in Chinese natural language processing because they are a precondition for Chinese text information retrieval and knowledge discovery. The maximum matching and frequency statistics (MMFS) segmentation method, based on length descending and string frequency statistics, is an effective segmentation and extraction method for Chinese words and phrases, but it cannot obtain some of the shorter words and phrases contained in the longer ones it extracts. To solve this problem, this paper presents a novel Chinese hierarchy feature extraction method that combines MMFS with an iterative learning algorithm. This method can extract hierarchy features according to morphology without lexicon support, without acquiring the probabilities between words in advance, and without a Chinese character index. Experimental results confirm the efficiency of this statistical method in extracting Chinese hierarchy features. The method is also beneficial to feature extraction for other Asian languages similar to Chinese. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://www.jatit.org/volumes/Vol49No1/30Vol49No1.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |