NDLI: Features Discovery for Web Classification Using Support Vector Machine

Content Provider	IEEE Xplore Digital Library
Author	Othman, M.S.; Yusuf, L.M.; Salim, J.
Copyright Year	2010
Abstract	The rapidly expanding body of web information resources poses a major challenge to internet users seeking the most relevant, up-to-date, and high-quality information. The sheer volume of web information has led to a restructuring of these resources, so an appropriate web classification method is needed for quality web information to be accessible. This paper discusses the web document features used to classify web information resources. Six feature sets are identified: text, meta tag and title (A); title and text (B); title (C); meta tag and title (D); meta tag (E); and text (F). A Support Vector Machine (SVM) is used to classify the web documents, and four kernels, namely Radial Basis Function (RBF), linear, polynomial, and sigmoid, are applied to test classification accuracy. The study shows that the text, meta tag and title feature set (A) is the best for classifying web documents across all four kernels, followed by title and text (B) and meta tag and title (D). The study also finds that the linear kernel classifies web documents better than the RBF, polynomial, and sigmoid kernels.
Starting Page	36
Ending Page	40
File Size	291807
Page Count	5
File Format	PDF
ISBN	9781424466405
e-ISBN	9781424466412
DOI	10.1109/ICICCI.2010.16
Language	English
Publisher	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher Date	2010-06-22
Publisher Place	Malaysia
Access Restriction	Subscribed
Rights Holder	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subject Keyword	Support vector machines; Accuracy; Text categorization; Support Vector Machine (SVM); Feature extraction; HTML; Internet; Web Document; Kernel; Web Classification
Content Type	Text
Resource Type	Article
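
The six feature sets named in the abstract (A to F) are combinations of three parts of a web page: its title, its meta tags, and its body text. The following is a minimal sketch of how such combinations might be assembled from raw HTML using only the Python standard library. The class `PageParts`, the function `feature_sets`, the choice to read only `description` and `keywords` meta tags, and the sample page are all illustrative assumptions; the record does not say how the authors extracted their features.

```python
from html.parser import HTMLParser


class PageParts(HTMLParser):
    """Collect the title, meta-tag content, and visible text of one HTML page."""

    SKIP = {"script", "style"}  # content of these tags is not treated as text

    def __init__(self):
        super().__init__()
        self.title, self.meta, self.text = [], [], []
        self._stack = []  # currently open <title>, <script>, or <style> tags

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            # Assumption: only description/keywords meta tags carry useful terms.
            if (attrs.get("name") or "").lower() in {"description", "keywords"}:
                self.meta.append(attrs.get("content") or "")
        elif tag == "title" or tag in self.SKIP:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        current = self._stack[-1] if self._stack else None
        if current == "title":
            self.title.append(data)
        elif current not in self.SKIP:
            self.text.append(data)


def feature_sets(html):
    """Return the six feature combinations A-F as plain strings."""
    parser = PageParts()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title)
    meta = " ".join(parser.meta)
    text = " ".join(parser.text)

    def join(*parts):
        return " ".join(p for p in parts if p)

    return {
        "A": join(text, meta, title),  # text, meta tag and title
        "B": join(title, text),        # title and text
        "C": title,                    # title
        "D": join(meta, title),        # meta tag and title
        "E": meta,                     # meta tag
        "F": text,                     # text
    }


# Hypothetical sample page, for illustration only.
SAMPLE = """<html><head><title>Kernel methods</title>
<meta name="keywords" content="svm, kernel">
</head><body><h1>Intro</h1><p>Support vector machines.</p>
<script>var x = 1;</script></body></html>"""

sets = feature_sets(SAMPLE)
for name in sorted(sets):
    print(name, "->", sets[name])
```

Each resulting string could then be turned into a term vector and passed to an SVM implementation. As one possible choice, scikit-learn's `SVC` accepts `kernel="linear"`, `"poly"`, `"rbf"`, and `"sigmoid"`, which correspond to the four kernels compared in the abstract; whether the authors used that or another toolkit is not stated in this record.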

Similar Documents

SVM Based Hybrid Moment Features for Natural Scene Categorization

Video genre categorization using Support Vector Machines

Web Page Classification Based on SVM

Sentiment text classification of customers reviews on the Web based on SVM

SVM Based Gender Classification Using Iris Images

A method for detecting document orientation by using SVM classifier

Support Vector Machine ensembles using features distribution among subsets for enhancing microarray data classification

Feature Selection for Scene Categorization Using Support Vector Machines

Improved SVM method for internet traffic classification based on feature weight learning

Features Discovery for Web Classification Using Support Vector Machine