NDLI: Feature Selection For Text Categorisation Using Self-organising Map

Content Provider	IEEE Xplore Digital Library
Author	Manomaisupat, P.; Ahmad, K.
Copyright Year	2005
Description	Author affiliation: Dept. of Comput., Surrey Univ., Guildford (Manomaisupat, P.; Ahmad, K.)
Abstract	The categorisation of documents in large, diverse collections poses a keen problem. The choice of a vector that may represent a document collection, and the categories of documents within it, is still an art form. We describe a study in which four different types of term-occurrence and document-frequency metrics were used with varying levels of success, as measured by classification accuracy statistics and average quantization error: TFIDF and its variant, term relevance, were used together with a metric based on contrastive linguistics and another that uses a finely classified terminology database. A novel method of term representation was used: each element of the vector corresponds to the absence/presence of a set of terms colocated within the element on the basis of frequency. In addition, we have defined a new baseline for comparison: a randomly selected set of terms for constructing a representative vector from within the collection. Categorisation was performed using classic self-organising maps. We confirm that an optimum size of the input vector (c. 100-200 terms) exists for each of the term-occurrence/document-frequency metrics, and there appears to be a saturation point beyond that optimal limit.
Starting Page	1875
Ending Page	1880
File Size	2679773
Page Count	6
File Format	PDF
ISBN	0780394224
DOI	10.1109/ICNNB.2005.1614991
Language	English
Publisher	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher Date	2005-10-13
Publisher Place	China
Access Restriction	Subscribed
Rights Holder	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subject Keyword	Computer science; Art; Filtering; Error analysis; Terminology; Text categorization; Quantization; Routing; Frequency measurement; Thesauri
Content Type	Text
Resource Type	Article
