NDLI: A dynamic learning framework to thoroughly extract structured data from web pages without human efforts

Back-buy prediction based on TriFG

Social tie mining in company networks

Defining and evaluating network communities based on ground-truth

There is more than complex contagion: an indirect influence analysis on Twitter

Question routing in community based QA: incorporating answer quality and answer content

Personalized resource categorisation in folksonomies

Learning approach for domain-independent linked data instance matching

Contraction network for solving maximum flow problem

A dynamic learning framework to thoroughly extract structured data from web pages without human efforts

User-sentiment topic model: refining user's topics with sentiment information

Diversity in ranking using negative reinforcement

Extracting data records from web using suffix tree

Automatic detection of rumor on Sina Weibo

A dynamic learning framework to thoroughly extract structured data from web pages without human efforts

Content Provider	ACM Digital Library
Author	Li, Long; Liao, Lejian; Song, Dandan; Wu, Yunpeng; Sun, Fei
Abstract	The structured data of web pages contains a wealth of concrete and comprehensive information. The attributes of entities and their corresponding values are valuable resources for automatic semantic annotation, knowledge discovery, and information utilization. However, the variety of display styles and formats across web pages makes extracting them challenging. We observe that, although a single page carries little information, different web pages and web sites describing similar entities together provide enough knowledge for a computer to learn from. This paper presents a dynamic learning framework that effectively extracts structured information from a large number of websites in various verticals (e.g., books, cameras, jobs). Unlike existing approaches, which are static, require manually labeled samples, and cannot adapt to unseen attributes, our approach aims to extract structured data from web pages dynamically, automatically, and thoroughly. Experiments on a total of 17,850 web pages in 4 verticals demonstrated the effectiveness of the framework.
Starting Page	1
Ending Page	8
Page Count	8
File Format	PDF
ISBN	9781450315463
DOI	10.1145/2350190.2350199
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2012-08-12
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Learning framework; Structured data; Information extraction
Content Type	Text
Resource Type	Article