A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization (2001).
| Content Provider | CiteSeerX |
|---|---|
| Author | Caropreso, Maria Fernanda; Matwin, Stan; Sebastiani, Fabrizio |
| Abstract | In this work we investigate the usefulness of n-grams for document indexing in text categorization (TC). We call n-gram a set g_k of n word stems, and we say that g_k occurs in a document d_j when a sequence of words appears in d_j that, after stop word removal and stemming, consists exactly of the n stems in g_k, in some order. Previous research has investigated the use of n-grams (or some variant of them) in the context of specific learning algorithms, and thus has not obtained general answers on their usefulness for TC. In this work we investigate the usefulness of n-grams in TC independently of any specific learning algorithm. We do so by applying feature selection to the pool of all k-grams (k ≤ n), and checking how many n-grams score high enough to be selected among the top-scoring k-grams. We report the results of our experiments, using various feature selection measures and varying numbers of selected features, performed on the Reuters-21 standard TC benchmark. We also report resul... |
| File Format | |
| Publisher Date | 2001-01-01 |
| Access Restriction | Open |
| Subject Keyword | Automated Text Categorization; Statistical Phrase; Learner-independent Evaluation; N-grams in TC; Various Feature Selection Measure; Of the Stem; Many N-grams Score; Text Categorization; General Answer; Stop Word Removal; Previous Research; Document Indexing; Feature Selection; Top K-grams; The Reuters-21 Standard TC Benchmark; Variant of Them |
| Content Type | Text |
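
The abstract's definition of when an n-gram occurs in a document can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the `toy_stem` function, and the names `preprocess` and `ngram_occurs` are hypothetical stand-ins chosen for illustration, and tokenization is plain whitespace splitting.

```python
from collections import Counter

# Toy stop-word list (an assumption; a real system would use a full list).
STOP_WORDS = {"the", "of", "a", "in", "and"}


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def preprocess(text):
    """Lowercase, split on whitespace, drop stop words, then stem."""
    return [toy_stem(w) for w in text.lower().split() if w not in STOP_WORDS]


def ngram_occurs(ngram_stems, text):
    """Return True if some run of n consecutive stems in the preprocessed
    text consists exactly of the stems in ngram_stems, in any order."""
    target = Counter(ngram_stems)
    n = len(ngram_stems)
    stems = preprocess(text)
    return any(
        Counter(stems[i:i + n]) == target
        for i in range(len(stems) - n + 1)
    )


print(ngram_occurs(["text", "categorization"], "the categorization of texts"))
print(ngram_occurs(["text", "categorization"], "text about categorization"))
```

The first call matches because stop word removal and stemming reduce the phrase to the two stems of the n-gram, in a different order; the second does not, because an intervening content word breaks the sequence.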