NDLI: Exploiting semantic resources for large scale text categorization

Content Provider	Springer Nature Link
Author	Li, Jian Qiang; Zhao, Yu; Liu, Bo
Copyright Year	2012
Abstract	The traditional supervised classifier for Text Categorization (TC) is learned from a set of hand-labeled documents. However, manual data labeling is labor intensive and time consuming, especially for a complex TC task with hundreds or thousands of categories. To address this issue, many semi-supervised methods have been reported that use both labeled and unlabeled documents for TC, but they still need a small set of labeled data for each category. In this paper, we propose a Fully Automatic Categorization approach for Text (FACT), in which no manual labeling effort is required. In FACT, lexical databases serve as semantic resources for understanding category names. FACT combines semantic analysis of the category names with statistical analysis of the unlabeled document set to construct training data fully automatically. With the support of lexical databases, we first use the category name to generate a set of features as a representative profile for the corresponding category. Then, a set of documents is labeled according to the representative profile. To reduce the possible bias originating from the category name and the representative profile, document clustering is used to refine the quality of the initial labeling. The training data are subsequently constructed to train the discriminative classifier. Empirical experiments show that one variant of the FACT approach significantly outperforms the state-of-the-art unsupervised TC approach. It achieves more than 90% of the F1 performance of the baseline SVM methods, which demonstrates the effectiveness of the proposed approach.
Starting Page	763
Ending Page	788
Page Count	26
File Format	PDF
ISSN	09259902
Journal	Journal of Intelligent Information Systems
Volume Number	39
Issue Number	3
e-ISSN	15737675
Language	English
Publisher	Springer US
Publisher Date	2012-06-09
Publisher Place	Boston
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	Web-scale text categorization; Semantic analysis; Semantic information processing; Artificial Intelligence (incl. Robotics); Data Structures, Cryptology and Information Theory; Document Preparation and Text Processing; Business Information Systems; Information Storage and Retrieval
Content Type	Text
Resource Type	Article
Subject	Artificial Intelligence; Computer Networks and Communications; Information Systems; Software; Hardware and Architecture
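
The first two steps described in the abstract (building a representative profile from a category name, then labeling documents against that profile) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `TOY_LEXICON` is a hypothetical hand-written stand-in for a lexical database, the overlap-count scoring is a simple placeholder rather than the paper's method, and the clustering refinement and classifier training steps are omitted.

```python
from collections import Counter

# Hypothetical stand-in for a lexical database: category name -> related terms.
# The real approach consults lexical databases; this dictionary is invented.
TOY_LEXICON = {
    "sports": {"game", "team", "player", "match"},
    "finance": {"bank", "market", "stock", "loan"},
}


def build_profile(category_name, lexicon):
    """Return a feature set for a category: its name plus related terms."""
    return {category_name} | lexicon.get(category_name, set())


def initial_labels(documents, categories, lexicon):
    """Label each document with the category whose profile it overlaps most.

    Documents that match no profile term are left unlabeled (None).
    """
    profiles = {c: build_profile(c, lexicon) for c in categories}
    labels = []
    for text in documents:
        counts = Counter(text.lower().split())
        # Score = total occurrences of profile terms in the document.
        scores = {c: sum(counts[t] for t in p) for c, p in profiles.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] > 0 else None)
    return labels


docs = [
    "the team won the match after a late goal",
    "the bank raised its loan rates as the stock market fell",
    "a recipe for lentil soup",
]
labels = initial_labels(docs, ["sports", "finance"], TOY_LEXICON)
print(labels)
```

In a fuller pipeline, the labeled documents would serve only as an initial training set; the abstract states that clustering is then used to refine these labels before a discriminative classifier is trained.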


Similar Documents

An efficient and large-scale reasoning method for the semantic Web

The Semantic Service Search Engine (S3E)

Coupling semantic and statistical techniques for dynamically enriching web ontologies

Optimizing queries to remote resources

Automatic content based image retrieval using semantic analysis

Sentence similarity based on semantic kernels for intelligent text retrieval

Towards Intelligent Semantic Caching for Web Sources

Classifying and querying very large taxonomies with bit-vector encoding

Semantic subgroup explanations
