Tweet classification by data compression
| Field | Value |
|---|---|
| Content Provider | ACM Digital Library |
| Author | Banno, Ryohei; Fujimura, Ko; Nishida, Kyosuke; Hoshide, Takahide |
| Abstract | We propose a new method that uses data compression to classify an unseen tweet as related or unrelated to a topic of interest. Our compression-based tweet classification method, called CTC, evaluates the compressibility of the tweet given positive and negative examples. This enables our method to handle multilingual tweets in the same manner and to effectively utilize the word context of the tweet, which is extremely important information within the 140-character limit. Experiments with worldwide tweets assigned a single hashtag demonstrate that our method, which uses the Deflate algorithm (as used in gzip) in the empirical evaluations, achieves higher precision and recall than state-of-the-art online learning algorithms. |
| Starting Page | 29 |
| Ending Page | 34 |
| Page Count | 6 |
| ISBN | 9781450309622 |
| DOI | 10.1145/2064448.2064473 |
| Language | English |
| Publisher | Association for Computing Machinery (ACM) |
| Publication Date | 2011-10-24 |
| Publisher Place | New York |
| Access Restriction | Subscribed |
| Subject Keyword | Data compression; Twitter; Text classification |
| Content Type | Text |
| Resource Type | Article |
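The abstract describes classifying a tweet by how well it compresses alongside positive and negative examples, but it does not give CTC's exact scoring rule. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes a simple rule: append the tweet to each example corpus, measure how many extra Deflate-compressed bytes it costs, and pick the class that costs fewer. It uses Python's standard `zlib` module, which implements Deflate. The function names (`compressed_size`, `extra_bytes`, `classify`) and the corpus truncation are hypothetical choices made for illustration.

```python
import zlib

# Deflate can only refer back 32 KiB, so text further back in a corpus cannot
# help compress the appended tweet. This sketch keeps only the tail of each
# corpus, leaving some room for the tweet itself (an assumed heuristic).
WINDOW = 32 * 1024
MAX_CORPUS = WINDOW - 1024


def compressed_size(data: bytes, level: int = 9) -> int:
    """Length in bytes of the zlib (Deflate) encoding of `data`."""
    return len(zlib.compress(data, level))


def extra_bytes(corpus: bytes, tweet: bytes) -> int:
    """Additional compressed bytes needed when `tweet` is appended to `corpus`.

    A small value means the tweet repeats many substrings (word context)
    already present in the corpus. The fixed zlib header and checksum appear
    in both sizes and cancel out in the difference.
    """
    corpus = corpus[-MAX_CORPUS:]
    return compressed_size(corpus + b"\n" + tweet) - compressed_size(corpus)


def classify(tweet: str, positives: list[str], negatives: list[str]) -> bool:
    """Return True if `tweet` compresses better against the positive examples.

    Working on UTF-8 bytes means the same code handles any language, with
    no tokenizer or word segmentation. Ties are treated as negative.
    """
    pos = "\n".join(positives).encode("utf-8")
    neg = "\n".join(negatives).encode("utf-8")
    t = tweet.encode("utf-8")
    return extra_bytes(pos, t) < extra_bytes(neg, t)


if __name__ == "__main__":
    positives = [
        "what a goal in the football match tonight",
        "the striker scored a late goal in the derby",
    ]
    negatives = [
        "a new recipe for chocolate cake with butter",
        "baking fresh bread with flour and yeast",
    ]
    print(classify("the striker scored a late goal in the derby", positives, negatives))
```

Because the score is a difference of compressed lengths, no feature extraction is needed. Any shared character sequences, including multi-word phrases, lower the cost. A tweet copied verbatim from the positive examples, as in the usage above, costs only a single back-reference against the positive corpus, so it is classified as positive.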