NDLI: LSCrawler: A Framework for an Enhanced Focused Web Crawler Based on Link Semantics

Content Provider	IEEE Xplore Digital Library
Author	Yuvarani, M.; Iyengar, N.C.S.N.; Kannan, A.
Copyright Year	2006
Description	Author affiliation: Infosys Technol. Ltd., Vellore (Yuvarani, M.)
Abstract	The traditional goal of a focused Web crawler is to harvest a collection of Web documents that are focused on particular topical subspaces. The main difficulty for a focused crawler lies in identifying the next most important and relevant link to follow. Focused crawlers mostly rely on probabilistic models to predict the relevance of documents. Web documents are well characterized by their hypertext, and the hypertext can be used to determine the relevance of a document to the search domain: the semantics of a link characterizes the semantics of the document it refers to. In this article, a novel focused crawler named LSCrawler is proposed. LSCrawler retrieves documents by predicting the relevance of each document from the keywords in the link and the text surrounding the link. Relevance is computed by measuring the semantic similarity between the keywords in the link and the taxonomy hierarchy of the specific domain. The system exhibits better recall because it exploits the semantics of the keywords in the link.
Sponsorship	IEEE Comput. Soc. WIC ACM
Starting Page	794
Ending Page	800
File Size	191224 bytes
Page Count	7
File Format	PDF
ISBN	0769527477
DOI	10.1109/WI.2006.112
Language	English
Publisher	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher Date	2006-12-18
Publisher Place	China
Access Restriction	Subscribed
Rights Holder	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subject Keyword	Crawlers; OWL; Taxonomy; Web pages; Ontologies; Predictive models; Information retrieval; Hardware; Web sites; Web server
Content Type	Text
Resource Type	Article
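
The abstract states that LSCrawler estimates a link's relevance from the keywords in the link and its surrounding text, measured against a domain taxonomy, but it does not name the similarity measure. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the toy "sports" taxonomy, the context window size, and the names `TAXONOMY`, `link_keywords`, `relevance`, and `Frontier` are all hypothetical, and Wu-Palmer similarity is a swapped-in, commonly used taxonomy measure. Links are kept in a max-priority frontier so that the most relevant unvisited link is followed next.

```python
import heapq
import re

# Hypothetical toy taxonomy for an illustrative "sports" domain (child -> parent).
TAXONOMY = {
    "sports": None,
    "ballgames": "sports",
    "football": "ballgames",
    "cricket": "ballgames",
    "tennis": "ballgames",
    "athletics": "sports",
    "marathon": "athletics",
    "sprint": "athletics",
}


def ancestors(concept):
    """Path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = TAXONOMY[concept]
    return path


def depth(concept):
    """Depth of a concept, counting the root as depth 1."""
    return len(ancestors(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity (an assumed measure): 2*depth(lcs) / (depth(a) + depth(b))."""
    b_set = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in b_set)  # lowest common subsumer
    return 2 * depth(lcs) / (depth(a) + depth(b))


def tokens(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def link_keywords(anchor, before="", after="", window=5):
    """Anchor-text keywords plus up to `window` words on each side (assumed window)."""
    left = tokens(before)[-window:] if window > 0 else []
    right = tokens(after)[:window]
    return tokens(anchor) + left + right


def relevance(keywords, topic):
    """Best taxonomy similarity between any link keyword and the topic; 0.0 if none match."""
    scores = [wu_palmer(k, topic) for k in keywords if k in TAXONOMY]
    return max(scores, default=0.0)


class Frontier:
    """Max-priority crawl frontier: the most relevant unseen URL is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url, score):
        if url in self._seen:  # never enqueue the same URL twice
            return
        self._seen.add(url)
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    # Hypothetical links: (url, anchor text, text before, text after).
    links = [
        ("http://example.org/a", "cricket scores", "latest", "from the test series"),
        ("http://example.org/b", "stock prices", "today", "market news"),
        ("http://example.org/c", "race results", "city", "marathon and sprint events"),
    ]
    frontier = Frontier()
    for url, anchor, before, after in links:
        frontier.push(url, relevance(link_keywords(anchor, before, after), "football"))
    while frontier:
        url, score = frontier.pop()
        print(f"{score:.3f} {url}")
```

With the topic "football", the cricket link scores 0.667 (both concepts sit under "ballgames"), the marathon link scores 0.333 (they share only the root), and the stock-market link scores 0.000, so the crawler would follow them in that order. Taking the maximum over keywords is one simple aggregation choice; averaging over matched keywords would be an equally plausible alternative.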

Similar Documents

An analyst-adaptive approach to Focused Crawlers

Adaptive focused crawling based on link analysis

An efficient adaptive focused crawler based on ontology learning

A Focused Crawler Based on Naive Bayes Classifier

Design of an Enhanced Rule Based Focused Crawler

AuToCrawler: an integrated system for automatic topical crawler

Focused web crawling: A framework for crawling of country based financial data

Effect of feature selection method on the performance of focused crawlers—A case study on traditional and accelerated focused crawlers

Context-Ontology Driven Focused Crawling of Web Documents