Accurate and efficient crawling for relevant websites (2004)
| Content Provider | CiteSeerX |
|---|---|
| Author | Ester, Martin |
| Abstract | Focused web crawlers have recently emerged as an alternative to the well-established web search engines. While the well-known focused crawlers retrieve relevant webpages, there are various applications which target whole websites instead of single webpages. For example, companies are represented by websites, not by individual webpages. To answer queries targeted at websites, web directories are an established solution. In this paper, we introduce a novel focused website crawler to employ the paradigm of focused crawling for the search of relevant websites. The proposed crawler is based on a two-level architecture and corresponding crawl strategies with an explicit concept of websites. The external crawler views the web as a graph of linked websites, selects the websites to be examined next and invokes internal crawlers. Each internal crawler views the webpages of a single given website and performs focused (page) crawling within that website. Our experimental evaluation demonstrates that the proposed focused website crawler clearly outperforms previous methods of focused crawling which were adapted to retrieve websites instead of single webpages. |
| File Format | |
| Publisher Date | 2004-01-01 |
| Access Restriction | Open |
| Subject Keyword | Individual Webpage; Established Solution; Internal Crawler View; Relevant Website; Focused Crawling; Explicit Concept; Web Crawler; Web Directory; Well-established Web Search Engine; Two-level Architecture; External Crawler; Internal Crawler; Website Crawler; Well-known Focused Crawler; Single Webpage; Crawl Strategy; Whole Website; Relevant Webpage |
| Content Type | Text |
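
The two-level architecture summarized in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the in-memory `TOY_WEB` graph, the keyword-overlap `page_score`, the mean-of-page-scores site score, the relevance threshold, and the rule that a newly discovered website inherits the score of the site linking to it. The abstract does not describe the actual classifier or crawl strategies, so this only shows the division of labor between an external crawler (choosing websites) and internal crawlers (focused page crawling within one website).

```python
import heapq
from urllib.parse import urlparse

# Hypothetical toy web: URL -> (page text, outgoing links). No network access.
TOY_WEB = {
    "http://a.example/": ("data mining research group",
                          ["http://a.example/pubs", "http://b.example/"]),
    "http://a.example/pubs": ("data mining publications", ["http://c.example/"]),
    "http://b.example/": ("cooking recipes", ["http://b.example/soup"]),
    "http://b.example/soup": ("soup recipes", []),
    "http://c.example/": ("mining data streams", []),
}
TOPIC = {"data", "mining"}  # assumed topic description


def site_of(url):
    """Treat the host name as the website identifier (an assumption)."""
    return urlparse(url).netloc


def page_score(text):
    """Stand-in relevance score: fraction of words that are topic words."""
    words = text.split()
    return sum(w in TOPIC for w in words) / len(words) if words else 0.0


def internal_crawl(entry_urls, site, page_budget=10):
    """Focused page crawl restricted to one website.

    Returns (site_score, external_links), where site_score is the mean
    score of the pages visited and external_links point to other websites.
    """
    frontier = [(-1.0, u) for u in entry_urls]  # max-heap via negated priority
    heapq.heapify(frontier)
    seen, scores, external = set(entry_urls), [], set()
    while frontier and len(scores) < page_budget:
        _, url = heapq.heappop(frontier)
        if url not in TOY_WEB:
            continue
        text, links = TOY_WEB[url]
        score = page_score(text)
        scores.append(score)
        for link in links:
            if site_of(link) != site:
                external.add(link)  # hand back to the external crawler
            elif link not in seen:
                seen.add(link)
                # Pages linked from relevant pages are visited first.
                heapq.heappush(frontier, (-score, link))
    site_score = sum(scores) / len(scores) if scores else 0.0
    return site_score, external


def external_crawl(seed_urls, threshold=0.5, site_budget=10):
    """Crawl the graph of websites, invoking one internal crawl per website."""
    frontier, entry_points = [], {}
    for url in seed_urls:
        site = site_of(url)
        if site not in entry_points:
            heapq.heappush(frontier, (-1.0, site))
        entry_points.setdefault(site, set()).add(url)
    visited, relevant = set(), []
    while frontier and len(visited) < site_budget:
        _, site = heapq.heappop(frontier)
        if site in visited:
            continue
        visited.add(site)
        score, external = internal_crawl(sorted(entry_points[site]), site)
        if score >= threshold:
            relevant.append(site)
        for link in sorted(external):
            target = site_of(link)
            if target in visited:
                continue
            entry_points.setdefault(target, set()).add(link)
            # Assumed strategy: a new site inherits the linking site's score.
            heapq.heappush(frontier, (-score, target))
    return relevant


if __name__ == "__main__":
    print(external_crawl(["http://a.example/"]))
```

In this sketch the external crawler never looks at page text itself; it only ranks websites and collects the cross-site links that each internal crawl returns, which mirrors the separation of concerns the abstract describes.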