Hierarchical categorization based on k-nearest neighbor approach for web site classification.
| Content Provider | CiteSeerX |
|---|---|
| Author | Kwon, Oh-Woog |
| Abstract | Automatic categorization is a viable way to address the scaling problem on the World Wide Web. In this paper, we consider how to apply general text categorization techniques to Web site classification. Two issues arise. First, the object to be classified is not a single document, such as a home page, but a Web site, which is a collection of Web pages. Second, real-world Web directories have a complex hierarchical structure in which both leaf and non-leaf categories are assigned directly to Web sites, unlike the hierarchies treated in most previous research. For the first issue, we propose using the Web pages linked from a home page in addition to the home page itself, and accordingly propose a Web site classification method based on connectivity analysis as well as content analysis of Web sites. For the second issue, the hierarchical structure of classifiers is flattened, but the classifier for each category uses features of its next-of-kin categories to exploit the hierarchical relationship. In experiments on a Korean commercial Web directory, the proposed method improved the micro-averaged breakeven point by 36.6% compared with an ordinary classifier. |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Home Page, Hierarchical Categorization, K-nearest Neighbor Approach, Web Site Classification, Web Site, Hierarchical Structure, Web Page, Classification Method, First Issue, Viable Method, Automatic Categorization, Previous Research, Relevant Issue Concern, Classifying Object, Content Analysis, Scaling Problem, Second Issue, Web Site Classification Method, General Text Categorization Technique, Hierarchical Relationship, Web Site Classification Task, Korean Commercial Web Directory, Next-of-kin Category, World Wide Web, Real World Web Directory, Connectivity Analysis, Complex Hierarchical Structure, Micro-averaging Breakeven Point, Flattened Structure, Amazing Improvement, Ordinary Classifier, Non-leaf Category |
| Content Type | Text |
| Resource Type | Article |
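The abstract combines two ideas that a short sketch can make concrete: representing a Web site by its home page plus the pages it links to, and scoring categories with a k-nearest-neighbor rule, with results reported as a micro-averaged breakeven point. The Python below is a minimal sketch under stated assumptions, not the paper's method. It assumes bag-of-words term-frequency vectors, cosine similarity, a hypothetical `link_weight` for linked pages, and the common kNN scoring rule that sums the similarities of the k nearest training sites per category. All function names and the toy data are hypothetical, and the next-of-kin feature sharing described in the abstract is not modeled. For the breakeven point, it relies on the fact that if exactly R (site, category) pairs are predicted, where R is the number of true pairs, precision and recall both equal hits / R.

```python
import math
from collections import Counter


def site_vector(home_page, linked_pages, link_weight=0.5):
    """Term-frequency vector of a site: the home page plus down-weighted linked pages.

    link_weight is a hypothetical parameter, not a value from the paper.
    """
    vec = Counter(home_page.lower().split())
    for page in linked_pages:
        for term, tf in Counter(page.lower().split()).items():
            vec[term] += link_weight * tf
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_scores(query, training, k=3):
    """Score each category by summing the similarities of the k nearest training sites.

    training is a list of (vector, set_of_categories); a site may carry both
    leaf and non-leaf categories, so scores are returned per category.
    """
    sims = [(cosine(query, vec), cats) for vec, cats in training]
    sims.sort(key=lambda item: item[0], reverse=True)  # sort on similarity only
    scores = Counter()
    for sim, cats in sims[:k]:
        for cat in cats:
            scores[cat] += sim
    return dict(scores)


def micro_breakeven(scored_pairs, relevant_pairs):
    """Micro-averaged breakeven point over all (site, category) decisions.

    Predicting exactly R = len(relevant_pairs) top-scored pairs makes
    precision = recall = hits / R (assumes at least R pairs were scored).
    """
    r = len(relevant_pairs)
    if r == 0:
        return 0.0
    top = sorted(scored_pairs, key=scored_pairs.get, reverse=True)[:r]
    hits = sum(1 for pair in top if pair in relevant_pairs)
    return hits / r


if __name__ == "__main__":
    # Hypothetical toy directory with a non-leaf category ("Sports").
    train = [
        (site_vector("soccer league scores", ["match results"]), {"Sports", "Sports/Soccer"}),
        (site_vector("baseball team news", ["game schedule"]), {"Sports"}),
        (site_vector("stock market prices", ["trading news"]), {"Business"}),
    ]
    query = site_vector("soccer match news", ["league table"])
    print(knn_scores(query, train, k=2))
```

Because a site may carry several categories, including a non-leaf one such as `Sports` alongside `Sports/Soccer` in the toy data, `knn_scores` returns a score per category rather than a single label; a per-category threshold on these scores would then decide which categories to assign.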