NDLI: Using linked data to mine RDF from wikipedia's tables

Data that matter: opportunities in crisis informatics research

Improving the efficiency of multi-site web search engines

Relative confidence sampling for efficient on-line ranker evaluation

Estimating ad group performance in sponsored search

Discovering common motifs in cursor movement data for improving web search

Taxonomy discovery for personalized recommendation

Improving pairwise learning for item recommendation from implicit feedback

Detecting cohesive and 2-mode communities in directed and undirected networks

Prediction in a microblog hybrid network using bonacich potential

Chinese-English mixed text normalization

Spatial compactness meets topical consistency: jointly modeling links and content for community detection

Inferring the impacts of social media on crowdfunding

Integration of large scale knowledge bases using probabilistic graphical models

Behavioral data mining and network analysis in massive online games

Log-based personalization: the 4th web search click data (WSCD) workshop

A self-adapting latency/power tradeoff model for replicated search engines

Adapting deep RankNet for personalized search

Scalable hierarchical multitask learning algorithms for conversion optimization in display advertising

Modeling dwell time to predict click-level satisfaction

Who likes it more?: mining worth-recommending items from long tails by modeling relative preference

Personalized entity recommendation: a heterogeneous information network approach

FENNEL: streaming graph partitioning for massive scale graphs

Learning social network embeddings for predicting information diffusion

Sentiment analysis on evolving social streams: how self-report imbalances can help

Scalable topic-specific influence analysis on microblogs

Understanding and promoting micro-finance activities in Kiva.org

Strategy in action: analyzing online search behavior by mining search strategies

Exploration and mining of web repositories

Web-scale classification: web classification in the big data era

Heterogeneous graph-based intent learning with queries, web pages and Wikipedia concepts

Search engine click spam detection based on bipartite graph propagation

An efficient framework for online advertising effectiveness measurement and comparison

User modeling in search logs via a nonparametric bayesian approach

On building entity recommender systems using user click log and freebase knowledge

Social collaborative retrieval

A few good predictions: selective node labeling in a social network

Modeling opinion dynamics in social networks

Entity linking at the tail: sparse signals, unknown entities, and phrase models

WebChild: harvesting and organizing commonsense knowledge from the web

Who watches (and shares) what on youtube? and when?: using twitter to understand youtube viewership

On discovering non-obvious recommendations: using unexpectedness and neighborhood selection methods in collaborative filtering systems

Big graph mining for the web and social media: algorithms, anomaly detection, and applications

1st workshop on diffusion networks and cascade analytics

Exploiting user disagreement for web search evaluation: an experimental approach

Sampling dilemma: towards effective data sampling for click prediction in sponsored search

LASER: a scalable response prediction platform for online advertising

The last click: why users give up information network navigation

Transferring heterogeneous links across location-based social networks

Active learning for networked data based on non-progressive diffusion model

Fast approximation of betweenness centrality through sampling

Latent dirichlet allocation based diversified retrieval for e-commerce search

Using linked data to mine RDF from wikipedia's tables

Detecting non-gaussian geographical topics in tagged photo collections

Exploratory search with semantic transformations using collaborative knowledge bases

Diversity and novelty in web search, recommender systems and data streams

Workshop on large-scale and distributed systems for information retrieval (LSDS-IR 2014)

Improving search relevance for short queries in community question answering

Exploiting contextual factors for click modeling in sponsored search

Lessons from the journey: a query log analysis of within-session learning

Customized tour recommendations in urban areas

Learning latent representations of nodes for classifying in heterogeneous social networks

Effective co-betweenness centrality computation

Supervised N-gram topic model

Knowledge-based graph document modeling

Ranking in heterogeneous social media

Search by multiple examples

Multilingual probabilistic topic modeling and its applications in web mining and search

Data design for personalization: current challenges and emerging opportunities

Struggling or exploring?: disambiguating long search sessions

Predicting response in mobile advertising with hierarchical importance-aware factorization machine

Scalable K-Means by ranked retrieval

Going beyond Corr-LDA for detecting specific comments on news & blogs

Trust, but verify: predicting contribution quality for knowledge base construction and curation

Visualizing brand associations from web community photos

Entity linking and retrieval for semantic search

Democracy is good for ranking: towards multi-view rank learning and adaptation in web search

Partner tiering in display advertising

Nonparametric bayesian upstream supervised multi-modal topic models

Modelling growth of urban crowd-sourced information

Is a picture really worth a thousand words?: on the role of images in e-commerce

Using linked data to mine RDF from wikipedia's tables

Content Provider	ACM Digital Library
Author	Hogan, Aidan; Muñoz, Emir; Mileo, Alessandra
Abstract	The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot be queried against. We thus propose methods to recover the semantics of Wikipedia tables and, in particular, to extract facts from them in the form of RDF triples. Our core method uses an existing Linked Data knowledge-base to find pre-existing relations between entities in Wikipedia tables, suggesting the same relations as holding for other entities in analogous columns on different rows. We find that such an approach extracts RDF triples from Wikipedia's tables at a raw precision of 40%. To improve the raw precision, we define a set of features for extracted triples that are tracked during the extraction phase. Using a manually labelled gold standard, we then test a variety of machine learning methods for classifying correct/incorrect triples. One such method extracts 7.9 million unique and novel RDF triples from over one million Wikipedia tables at an estimated precision of 81.5%.
Starting Page	533
Ending Page	542
Page Count	10
File Format	PDF
ISBN	9781450323512
DOI	10.1145/2556195.2556266
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2014-02-24
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Linked data; Web tables; Wikipedia; Data mining
Content Type	Text
Resource Type	Article
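
The core idea in the abstract (find relations that a knowledge base already holds between entities in the same row, then propose the same relation for other rows of the same column pair) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `suggest_triples`, the `ex:` identifiers, the representation of a table as rows of already-linked entity IDs, and the simple support ratio used as a score. The paper itself tracks a set of features and applies machine-learning classifiers, which this sketch does not attempt to reproduce.

```python
from collections import Counter
from itertools import permutations


def suggest_triples(rows, kb):
    """Propose new (subject, predicate, object) triples from one table.

    rows: list of tuples of entity IDs (None where a cell has no entity).
    kb:   set of known (subject, predicate, object) triples.
    Returns a dict mapping each proposed triple to a support ratio.
    All names here are hypothetical, not taken from the paper.
    """
    # Index the knowledge base by (subject, object) for fast lookup.
    by_pair = {}
    for s, p, o in kb:
        by_pair.setdefault((s, o), set()).add(p)

    n_cols = len(rows[0]) if rows else 0
    suggestions = {}
    # Consider every ordered pair of columns as (subject, object).
    for i, j in permutations(range(n_cols), 2):
        # Count the rows in which each predicate already holds.
        support = Counter()
        for row in rows:
            for p in by_pair.get((row[i], row[j]), ()):
                support[p] += 1
        # Propose each observed predicate for rows where it is missing.
        for p, count in support.items():
            for row in rows:
                if row[i] is None or row[j] is None:
                    continue
                triple = (row[i], p, row[j])
                if triple not in kb:
                    suggestions[triple] = count / len(rows)
    return suggestions


# Toy table: column 0 holds players, column 1 holds teams.
rows = [
    ("ex:PlayerA", "ex:TeamX"),
    ("ex:PlayerB", "ex:TeamY"),
    ("ex:PlayerC", "ex:TeamX"),
]
kb = {
    ("ex:PlayerA", "ex:team", "ex:TeamX"),
    ("ex:PlayerB", "ex:team", "ex:TeamY"),
}
suggestions = suggest_triples(rows, kb)
print(suggestions)
```

In this toy table the relation `ex:team` already holds in two of the three rows, so the sketch proposes it for the remaining row. A raw proposal of this kind corresponds to the unfiltered extraction stage that the abstract reports at 40% precision; the classification step that raises the estimated precision to 81.5% is outside the scope of the sketch.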
