NDLI: Popularity-guided top-k extraction of entity attributes

The web of linked data: a global public dataspace on the web: WebDB 2010 keynote

An agglomerative query model for discovery in linked data: semantics and approach

Find your advisor: robust knowledge gathering from the web

Manimal: relational optimization for data-intensive programs

Concurrent one-way protocols in around-the-clock social networks

XML-based RDF data management for efficient query processing

Redundancy-driven web data extraction and integration

Learning topical transition probabilities in click through data with regression models

Reconciling two models of multihierarchical markup

Querying Wikipedia documents and relationships

Using latent-structure to detect objects on the web

Improved recommendations via (more) collaboration

Tree patterns with full text search

WikiAnalytics: disambiguation of keyword search results on highly heterogeneous structured data

Popularity-guided top-k extraction of entity attributes

Content Provider	ACM Digital Library
Author	Solomon, Matthew; Gravano, Luis; Yu, Cong
Abstract	Recent progress in information extraction technology has enabled a vast array of applications that rely on structured data that is embedded in natural-language text. In particular, the extraction of concepts from the Web, with their desired attributes, is important to provide applications with rich, structured access to information. In this paper, we focus on an important family of concepts, namely, entities (e.g., people or organizations) and their attributes, and study how to efficiently and effectively extract them from Web-accessible text documents. Unfortunately, information extraction over the Web is challenging for both quality and efficiency reasons. Regarding quality, many sources on the Web contain misleading or invalid information; furthermore, extraction systems often return incorrect data. Regarding efficiency, information extraction is a time-consuming process, often involving expensive text-processing steps. We present a top-k extraction processing approach that addresses both the quality and efficiency challenges: for each entity and attribute of interest, we return the top-k values of the attribute for the entity according to a scoring function for extracted attribute values. This scoring function weighs the extraction confidence from individual documents, as well as the "importance" of the documents where the information originates. We define the document importance in terms of entity-specific document "popularity" statistics from a major search engine. Overall, our top-k extraction processing approach manages to identify the top attribute values for the entities of interest efficiently, as we demonstrate with a large-scale experimental evaluation over real-life data.
Starting Page	1
Ending Page	6
Page Count	6
File Format	PDF
ISBN	9781450301862
DOI	10.1145/1859127.1859139
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2010-06-06
Publisher Place	New York
Access Restriction	Subscribed
Content Type	Text
Resource Type	Article
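
Illustrative sketch (not from the paper): the abstract describes a scoring function that combines per-document extraction confidence with document "importance", then returns the top-k attribute values for an entity. The record does not give the actual formula, so the Python sketch below assumes, purely for illustration, that a value's score is the sum of confidence times importance over the documents it was extracted from. The names `score_values` and `top_k_values`, the tuple layout, and the sample data are all hypothetical. The sketch also scores every extraction exhaustively, so it does not reproduce the efficiency side of the approach the abstract mentions.

```python
import heapq
from collections import defaultdict


def score_values(extractions):
    """Aggregate a score per candidate attribute value.

    `extractions` is an iterable of (value, confidence, importance) tuples,
    one per document in which the value was extracted.
    ASSUMPTION: score = sum(confidence * importance); the paper's real
    scoring function is not given in this record.
    """
    scores = defaultdict(float)
    for value, confidence, importance in extractions:
        scores[value] += confidence * importance
    return dict(scores)


def top_k_values(extractions, k):
    """Return the k highest-scoring (value, score) pairs, best first."""
    scores = score_values(extractions)
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical extractions of an "employer" attribute for one entity.
    sample = [
        ("Columbia University", 0.9, 0.8),
        ("Columbia University", 0.6, 0.5),
        ("Google", 0.7, 0.9),
        ("Colombia", 0.4, 0.1),
    ]
    for value, score in top_k_values(sample, k=2):
        print(f"{value}\t{score:.2f}")
```

Under this assumed scoring, a value extracted with high confidence from several important documents outranks a value seen once in a low-importance document, which matches the intent the abstract states for the quality challenge.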
