NDLI: Dual lexical chaining for context based text classification

Content Provider	IEEE Xplore Digital Library
Author	Chakraverty, S.; Pandey, U.; Juneja, B.; Arora, A.
Copyright Year	2015
Description	Author affiliation: Dept. of Comput. Sci. & Eng., IMS Eng. Coll., Ghaziabad, India (Pandey, U.); Dept. of Comput. Eng., Netaji Subhas Inst. of Technol., New Delhi, India (Chakraverty, S.; Juneja, B.; Arora, A.)
Abstract	Text classification enhances the accessibility and systematic organization of the vast reserves of data populating the World Wide Web. Despite great strides in the field, the domain of context-driven text classification provides fresh opportunities to develop more efficient context-oriented techniques with refined metrics. In this paper, we propose a novel approach to categorizing text documents using a dual lexical chaining technique. The algorithm first prepares a cohesive category-keyword matrix by feeding category names into the WordNet and Wikipedia ontologies, extracting lexically and semantically related keywords from them, and then adding to the keywords through a keyword enrichment process. Next, WordNet is consulted again to find the degree of lexical cohesiveness between the tokens of a document. Terms that are strongly related are woven together into two separate lexical chains, one for their noun senses and another for their verb senses, which together represent the feature set for the document. This segregation enables a better expression of word cohesiveness, as concept terms and action terms are treated distinctly. We propose a new metric to calculate the strength of a lexical chain. It includes a statistical part given by Term Frequency-Inverse Document Frequency-Relative Category Frequency (TF-IDF-RCF), which is itself an improvement upon the conventional TF-IDF measure. The chain's contextual strength is determined by the degree of its lexical matching with the category-keyword matrix as well as by the relative positions of its constituent terms. Results indicate the efficacy of our approach. We obtained an average accuracy of 90% on 6 categories derived from the 20 News Group and the Reuters corpora. Lexical chaining has been applied successfully to text summarization. Our results indicate a positive direction towards its usefulness for text classification.
Starting Page	432
Ending Page	439
File Size	419149
Page Count	8
File Format	PDF
e-ISBN	9781467369114
DOI	10.1109/ICACEA.2015.7164744
Language	English
Publisher	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher Date	2015-03-19
Publisher Place	India
Access Restriction	Subscribed
Rights Holder	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subject Keyword	Training; Computers; Electronic publishing; Lexical Chaining; Term Frequency-Category Frequency; Category-keyword strength; Cohesiveness; Semantics; Encyclopedias; Context based TC; Position Parameter; Internet
Content Type	Text
Resource Type	Article
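
The abstract's central idea, separate lexical chains for noun senses and verb senses, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy tokens, the hand-written RELATED table (standing in for the WordNet cohesiveness lookup the abstract mentions), and the greedy chaining rule, which the abstract does not specify. It is not the authors' implementation and it omits the TF-IDF-RCF strength metric.

```python
# Hypothetical toy document: (token, part of speech) pairs, "n" = noun, "v" = verb.
TOKENS = [("bank", "n"), ("loan", "n"), ("borrow", "v"),
          ("lend", "v"), ("river", "n"), ("money", "n")]

# Hypothetical table of strongly related word pairs. The paper derives
# relatedness from WordNet; this hand-written set is only a stand-in.
RELATED = {frozenset(pair) for pair in
           [("bank", "loan"), ("loan", "money"), ("borrow", "lend")]}


def related(a, b):
    """Return True if two terms are identical or listed as related."""
    return a == b or frozenset((a, b)) in RELATED


def build_chains(terms):
    """Greedy chaining (an assumed rule): append each term to the first
    chain that contains a related term, otherwise start a new chain."""
    chains = []
    for term in terms:
        for chain in chains:
            if any(related(term, member) for member in chain):
                chain.append(term)
                break
        else:
            chains.append([term])
    return chains


def dual_chains(tokens):
    """Build one set of chains for nouns and another for verbs."""
    nouns = [tok for tok, pos in tokens if pos == "n"]
    verbs = [tok for tok, pos in tokens if pos == "v"]
    return {"noun": build_chains(nouns), "verb": build_chains(verbs)}


if __name__ == "__main__":
    print(dual_chains(TOKENS))
```

With this toy input, the noun chains group "bank", "loan", and "money" together and leave "river" on its own, while "borrow" and "lend" form a single verb chain. Keeping the two parts of speech apart means a verb never joins a noun chain, which is the segregation of concept terms and action terms that the abstract describes.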

Similar Documents

Measuring Semantic Relatedness between Words Using Lexical Context

Context Driven Technique for Document Classification

Centroid-based Classification Enhanced with Wikipedia

Lexical Semantic Relatedness for Twitter Analytics

Wikipedia based semantic metadata annotation of audio transcripts

Mash-Up Approach for Web Video Category Recommendation

Context-based image semantic similarity

Ontology-Based Text Classification into Dynamically Defined Topics

Document Topic Extraction Based on Wikipedia Category

Dual lexical chaining for context based text classification