NDLI: Application of Conditional Random Fields model in unknown words identification

Content Provider	IEEE Xplore Digital Library
Author	Hai-Jun Zhang; Wei-Min Pan; Shu-Min Shi; Chao-Yong Zhu
Copyright Year	2010
Description	Author affiliation: School of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China (Hai-Jun Zhang; Wei-Min Pan) || School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China (Chao-Yong Zhu) || School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China (Shu-Min Shi)
Abstract	This paper proposes a method for Unknown Words Identification (UWI) based on repeats. To ground UWI in reliable theory, we put forward a formal model of the UWI process, which gives theoretical guidance on selecting the features used in UWI. We propose employing the Conditional Random Fields (CRF) model as the statistical framework for solving this formal model. Under this framework, UWI becomes the process of exploiting effective features that capture the essential properties of unknown words. Experiments show that the method is effective and that a reasonable combination of the features used in CRF clearly improves UWI results. The final F score of the method is 47.81% in the open test and 69.83% in word extraction, which is better than the best results reported in previous work.
Starting Page	1839
Ending Page	1843
File Size	95311
Page Count	5
File Format	PDF
ISBN	9781424465262
e-ISBN	9781424465279
DOI	10.1109/ICMLC.2010.5580955
Language	English
Publisher	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher Date	2010-07-11
Publisher Place	China
Access Restriction	Subscribed
Rights Holder	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subject Keyword	Feature extraction; Training; Data mining; Machine learning; Entropy; Cybernetics; Helium; Feature combination; Unknown words identification; Repeats; CRF; Chinese word segmentation
Content Type	Text
Resource Type	Article
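
The abstract describes identifying unknown words from repeats and reports results as F scores, but this record does not give the paper's definition of a repeat or its CRF feature set. The sketch below is therefore only an illustration under stated assumptions: a "repeat" is taken to be any character n-gram that occurs at least twice, and the F score is taken to be the balanced harmonic mean of precision and recall. The function names and the `min_len`, `max_len`, and `min_count` parameters are hypothetical and do not come from the paper.

```python
from collections import Counter


def repeat_candidates(text, min_len=2, max_len=4, min_count=2):
    """Return character n-grams occurring at least min_count times.

    Assumption: this stands in for the paper's "repeats"; the paper's
    actual definition is not stated in this record.
    """
    counts = Counter(
        text[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(text) - n + 1)
    )
    return {gram for gram, count in counts.items() if count >= min_count}


def f_score(predicted, gold):
    """Balanced F score over two sets of words (assumed to be F1)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    candidates = repeat_candidates("abcabcxyz")
    print(sorted(candidates))
    print(f_score(candidates, {"abc"}))
```

In a CRF-based setup of the kind the abstract outlines, candidates like these would be described by features and labeled by the model rather than accepted directly; the frequency threshold here is only a placeholder for that step.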

Similar Documents

Chinese chunking algorithm based on conditional random fields

A New Machine Learning Method for Chinese Overlapping Disambiguity--Conditional Random Fields

Enhancement of unsupervised feature selection for conditional random fields learning in Chinese word segmentation

Mining Chinese comparative sentences by semantic role labeling

Labeling Turkish news stories with CRF

Combination of machine learning methods for optimum chinese word segmentation (2005)

A morphology-based Chinese word segmentation method

Which performs better for new word detection, character based or Chinese Word Segmentation based?

Chinese ner hybrid pattern based on multi-feature fusion

Application of Conditional Random Fields model in unknown words identification