When the Hunter Becomes the Prey – Tracking down Web Crawlers in Clickstreams
| Content Provider | Semantic Scholar |
|---|---|
| Author | Lourenço, Anália; Alves, Ronnie; Belo, Orlando |
| Copyright Year | 2004 |
| Abstract | Clickstreams are the latest data source acquired by decision support systems. They offer a remarkable opportunity for analysis, opening up the area of usage profiling. However, they also raise Web-specific issues that must be addressed along the way. In particular, proper usage profiling requires distinguishing conventional users from non-conventional ones, specifically Web crawlers. By definition, crawlers are information hunters that traverse the Web to perform some task, such as gathering and indexing information about a given topic. Their visits to a Web site are usually brief but intensive, and they recur routinely over time. Clickstreams mix the activity of real users with that of Web crawlers. Therefore, any usage study requires some preprocessing to filter out, or at least tag, Web crawler activity so that it does not mislead further analysis. Once crawler activity has been delimited, it is possible to carry out regular Web usage analysis as well as Web crawler analysis. With the latter, one can study crawlers' behaviours and purposes in more depth, learning to distinguish harmless crawlers from pervasive ones and performing purpose-based clustering to establish communities of Web crawlers. In this paper, we review the primary characteristics of Web crawlers and research on crawler detection and pattern analysis, and we present a case study that highlights the relevance of the task and the use of certain heuristics. |
| File Format | PDF, HTM / HTML |
| Alternate Webpage(s) | http://alfa.di.uminho.pt/~ronnie/files_files/ufr/2004-dataGadgets-v1.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
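The abstract describes a preprocessing step that filters out, or at least tags, crawler activity before usage analysis. The sketch below illustrates what such tagging might look like. It is not the authors' method. It assumes sessions have already been reconstructed from the clickstream and applies three simple heuristics of the kind often used for this purpose: a crawler-like user-agent string, a request for `/robots.txt`, and an unusually high request rate. All names (`Session`, `tag_session`, `requests_per_minute`, `CRAWLER_AGENT_HINTS`) and the 30 requests/minute threshold are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical keyword list; real crawler user-agent lists are far longer.
CRAWLER_AGENT_HINTS = ("bot", "crawler", "spider", "slurp")


@dataclass
class Session:
    """A reconstructed visitor session (assumed input format)."""
    user_agent: str
    paths: list = field(default_factory=list)
    timestamps: list = field(default_factory=list)  # seconds, ascending


def requests_per_minute(session: Session) -> float:
    """Average request rate over the session; 0.0 if fewer than two requests."""
    if len(session.timestamps) < 2:
        return 0.0
    span = session.timestamps[-1] - session.timestamps[0]
    # Clamp the span to 1 second so bursts within the same second stay finite.
    return len(session.timestamps) * 60.0 / max(span, 1)


def tag_session(session: Session, max_rate: float = 30.0) -> list:
    """Return the heuristics that flag this session as a likely crawler.

    An empty list means no heuristic fired (the session looks human).
    """
    reasons = []
    agent = session.user_agent.lower()
    if any(hint in agent for hint in CRAWLER_AGENT_HINTS):
        reasons.append("user-agent")
    if "/robots.txt" in session.paths:
        reasons.append("robots.txt")
    if requests_per_minute(session) > max_rate:
        reasons.append("request-rate")
    return reasons


if __name__ == "__main__":
    human = Session("Mozilla/5.0 (Windows NT 10.0)", ["/", "/about"], [0, 40])
    bot = Session("ExampleBot/1.0", ["/robots.txt", "/a", "/b", "/c"], [0, 1, 2, 3])
    print(tag_session(human))  # []
    print(tag_session(bot))    # ['user-agent', 'robots.txt', 'request-rate']
```

Returning the list of triggered heuristics rather than a single boolean keeps the "tag, don't just filter" option open: downstream analysis can drop flagged sessions, keep them for crawler analysis, or weigh the heuristics differently.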