A Semi-Supervised Approach for Web Spam Detection using Combinatorial Feature-Fusion
| Content Provider | CiteSeerX |
|---|---|
| Author | Tian, Ye; Weiss, Gary M.; Ma, Qiang |
| Abstract | This paper describes a machine learning approach for detecting web spam. Each example in this classification task corresponds to 100 web pages from a host, and the task is to predict whether this collection of pages represents spam or not. This task is part of the 2007 ECML/PKDD Graph Labeling Workshop’s Web Spam Challenge (track 2). Our approach begins by adding several human-engineered features constructed from the raw data. We then construct a rough classifier and use semi-supervised learning to classify the unlabelled examples provided to us. We then construct additional link-based features and incorporate them into the training process. We also employ a combinatorial feature-fusion method for “compressing” the enormous number of word-based features that are available, so that conventional machine learning algorithms can be used. Our results demonstrate the effectiveness of semi-supervised learning and the combinatorial feature-fusion method. |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Combinatorial Feature-fusion; Semi-supervised Approach; Web Spam Detection; Combinatorial Feature-fusion Method; Web Spam Challenge; Additional Link-based Feature; Word-based Feature; Machine Learning Approach; Semi-supervised Learning; Unlabelled Example; Training Process; Several Human-engineered Feature; Web Spam; ECML/PKDD Graph Labeling Workshop; Raw Data; Web Page; Enormous Number; Conventional Machine; Rough Classifier; Classification Task |
| Content Type | Text |
| Resource Type | Article |
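
The abstract's step of building a rough classifier and then using it to label the unlabelled examples can be illustrated with a generic self-training loop. This is only a minimal sketch under stated assumptions: the abstract does not say which classifier, confidence rule, or stopping criterion the authors used, so the nearest-centroid model, the `min_margin` threshold, and every function name below are hypothetical stand-ins chosen for illustration.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(X, y):
    """Rough classifier (assumed stand-in): one centroid per class label."""
    return {
        label: centroid([x for x, t in zip(X, y) if t == label])
        for label in set(y)
    }


def predict_with_margin(model, x):
    """Return (label, margin) for x.

    The margin is the gap between the distances to the two nearest
    centroids; a larger margin is treated as higher confidence.
    Requires at least two classes in the model.
    """
    ranked = sorted((sq_dist(x, c), label) for label, c in model.items())
    (d_best, best), (d_next, _) = ranked[0], ranked[1]
    return best, d_next - d_best


def self_train(X_lab, y_lab, X_unl, min_margin=1.0, max_rounds=10):
    """Self-training: repeatedly pseudo-label confident unlabelled examples.

    min_margin and max_rounds are illustrative parameters, not values
    taken from the paper.
    """
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unl)
    for _ in range(max_rounds):
        model = fit(X, y)
        scored = [(x, *predict_with_margin(model, x)) for x in pool]
        confident = [(x, label) for x, label, m in scored if m >= min_margin]
        if not confident:
            break  # nothing left that the current model is sure about
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = [x for x, _, m in scored if m < min_margin]
    # Final model trained on true labels plus accepted pseudo-labels.
    return fit(X, y), pool
```

Calling `self_train` with a few labelled host feature vectors (labels such as `"spam"` and `"normal"`) and a pool of unlabelled ones returns the retrained model together with the examples that never reached the confidence threshold. Accepting only high-margin predictions limits how many wrong pseudo-labels feed back into training, at the cost of leaving ambiguous examples unused.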