On Sampling the Wisdom of Crowds: Random vs. Expert Sampling of the Twitter Stream
| Content Provider | CiteSeerX |
|---|---|
| Author | Ghosh, Saptarshi; Sharma, Naveen; Zafar, Muhammad Bilal; Ganguly, Niloy; Bhattacharya, Parantapa; Gummadi, Krishna P. |
| Abstract | Several applications today rely on content streams crowdsourced from online social networks. Because processing the large volumes of data generated on these sites in real time is difficult, analytics companies and researchers increasingly resort to sampling. In this paper, we investigate the crucial question of how to sample the data generated by users of social networks. The traditional method is to randomly sample all the data. We analyze a different sampling methodology, in which content is gathered only from a relatively small subset (< 1%) of the user population, namely the expert users. Over the course of a month, we gathered tweets from over 500,000 Twitter users identified as experts on a diverse set of topics, and compared the resulting expert-sampled tweets with the 1% random sample of tweets that Twitter provides publicly. We compared the sampled datasets along several dimensions, including the diversity, timeliness, and trustworthiness of the information they contain, and found important differences between them. Our observations have major implications for applications such as topical search, trustworthy content recommendation, and breaking news detection. |
| Access Restriction | Open |
| Subject Keyword | Expert Sampling; Twitter Stream; Random vs.; Social Network; Expert-sampled Tweet; Diverse Set; Real-time Processing; Analytics Company; Sampled Datasets; Expert User; Topical Search; Several Application Today; Crucial Question; Online Social Network; Content Stream; Major Implication; Several Dimension; Traditional Method; News Detection; Trustworthy Content Recommendation; Small Subset; Important Difference; Twitter User; Different Sampling Methodology; Large Amount; User Population |
| Content Type | Text |
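The two sampling strategies contrasted in the abstract can be sketched as simple stream filters. This is a minimal illustration under stated assumptions, not the authors' collection pipeline: the `Tweet` record, the function names, the expert set, and the synthetic data are all hypothetical, and the 1% random sample is modeled, as an assumption, as independent per-tweet Bernoulli sampling rather than as a description of how Twitter actually draws its public sample.

```python
import random
from typing import Iterable, Iterator, NamedTuple, Set


class Tweet(NamedTuple):
    """Hypothetical minimal tweet record: author ID and text."""
    user_id: int
    text: str


def random_sample(stream: Iterable[Tweet], rate: float = 0.01,
                  seed: int = 0) -> Iterator[Tweet]:
    """Keep each tweet independently with probability `rate`.

    Assumption: a "1% random sample" is modeled as Bernoulli sampling
    over all tweets, regardless of who posted them.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    for tweet in stream:
        if rng.random() < rate:
            yield tweet


def expert_sample(stream: Iterable[Tweet],
                  experts: Set[int]) -> Iterator[Tweet]:
    """Keep every tweet posted by a user in the (hypothetical) expert set."""
    for tweet in stream:
        if tweet.user_id in experts:
            yield tweet


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stream: 100,000 tweets from 10,000 users (made-up data).
    stream = [Tweet(rng.randrange(10_000), f"tweet {i}") for i in range(100_000)]
    experts = set(range(50))  # 0.5% of users, chosen arbitrarily
    rand = list(random_sample(stream, rate=0.01))
    expert = list(expert_sample(stream, experts))
    print(len(rand), len(expert))
```

The random filter draws tweets uniformly from everyone, so its output mirrors overall traffic; the expert filter keeps a complete record for a small, fixed set of users, so its content depends entirely on which users are designated as experts.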