NDLI: Extracting Knowledge from Wikipedia Articles through Distributed Semantic Analysis

Automatic Mapping of Wikipedia Templates for Fast Deployment of Localised DBpedia Datasets

A Data Restore Model for Reproducibility in Computational Statistics

Comparison of the results of an authorship-based expert recommender against data from a directory of experts

Advanced Mining of Association Rules over Periodic Snapshots in a Data Warehouse

A Visualization Approach for Cross-level Exploration of Spatiotemporal Data

Aligning Knowledge Development between Innovation-Driven Context and Knowledge Organization Systems

Dependencies between E-Learning Usage Patterns and Learning Results

And Data for All: On the Validity and Usefulness of Open Government Data

Dependency Extraction from Growth Trajectory using Sequential Pattern

Assessing Barcamps: Incentives for Participation in Ad-hoc Conferences and the Role of Social Media

Facilitating Team Processes with Recommender Systems: A Behavioral Science Perspective

A Visual Interactive System for Spatial Querying and Ranking of Geographic Regions

Do We Need Entity-Centric Knowledge Bases for Entity Disambiguation?

Automatic Annotation of Scientific Video Material based on Visual Concept Detection

Organize, socialize, benefit: how social media applications impact enterprise success and performance

Capturing and Sharing Scientific Research Data

Exploratory Search and Content Discovery: The Semantic Media Browser (SMB)

Best of Both Worlds: Hybrid Knowledge Visualization in Police Crime Fighting and Military Operations

Shaping a social software for a distributed military organisation

Investigating Factors for Knowledge Sharing Using Web Technologies

Extracting Knowledge from Wikipedia Articles through Distributed Semantic Analysis

Exploring Factual and Perceived Use and Benefits of a Web 2.0-based Knowledge Management Application: The Siemens Case

Managing knowledge across disciplines and departments for automotive X-in-the-Loop-Simulation

Extraction of Address Data from Unstructured Text using Free Knowledge Resources

Games with a Purpose or Mechanised Labour?: A Comparative Study

On the Implications of Lessons Learned Use for Lessons Learned Content

Knowledge Scaffolding: A Classification of Visual Structures for Knowledge Communication in Teams

Professional Management of Intellectual Capital in the automotive industry of Baden Württemberg

Representing Multidimensional Cancer Registry Data

OntoSketch: Towards Digital Sketching as a Tool for Creating and Extending Ontologies for Non-Experts

TiNYARM: Awareness of Research Papers in a Community of Practice

Semantically based visual tracking of engineering tasks in automotive product lifecycle

Semantic Pattern Transformation: Applying Knowledge Discovery Processes in Heterogeneous Domains

Using mobile technology for inter-organisational collaboration and end-customer integration

Success Factors of Enterprise 2.0

Supporting Ontology Alignment Tasks with Edge Bundling

Visual Analysis of Compliance with Clinical Guidelines

Extracting Knowledge from Wikipedia Articles through Distributed Semantic Analysis

Content Provider	ACM Digital Library
Author	Di Francesco, Mario; Hieu, Nguyen Trung; Ylä-Jääski, Antti
Abstract	Computing semantic word similarity and relatedness requires access to vast amounts of semantic space for effective analysis. As a consequence, it is time-consuming to extract useful information from a large amount of data on a single workstation. In this paper, we propose a system, called Distributed Semantic Analysis (DSA), that integrates a distributed approach with semantic analysis. DSA builds a list of concept vectors associated with each word by exploiting the knowledge provided by Wikipedia articles. Based on such lists, DSA calculates the degree of semantic relatedness between two words through the cosine measure. The proposed solution is built on top of the Hadoop MapReduce framework and the Mahout machine learning library. Experimental results show two major improvements over the state of the art, with particular reference to the Explicit Semantic Analysis method. First, our distributed approach significantly reduces the computation time to build the concept vectors, thus enabling the use of larger inputs, which are the basis for more accurate results. Second, DSA obtains a very high correlation of computed relatedness with reference benchmarks derived from human judgements. Moreover, its accuracy is higher than that of solutions reported in the literature over multiple benchmarks.
Starting Page	1
Ending Page	8
Page Count	8
File Format	PDF
ISBN	9781450323000
DOI	10.1145/2494188.2494195
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2013-09-04
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Word relatedness; Semantic analysis; Wikipedia knowledge; Distributed computing
Content Type	Text
Resource Type	Article
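
The abstract above describes building a concept vector for each word from Wikipedia articles and comparing two words with the cosine measure. The following is a minimal single-process sketch of that idea, not the authors' Hadoop/Mahout implementation. It assumes TF-IDF weighting (in the style of Explicit Semantic Analysis), whitespace tokenisation, and a tiny made-up corpus; the function names `build_concept_vectors`, `cosine` and `relatedness` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def build_concept_vectors(articles):
    """Map each word to a sparse vector {article_title: tf-idf weight}.

    `articles` is a dict {title: text}. Each article acts as one "concept".
    TF-IDF is an assumption here; the abstract does not state the weighting.
    """
    n_docs = len(articles)
    term_counts = {
        title: Counter(text.lower().split()) for title, text in articles.items()
    }
    # Document frequency: in how many articles does each word occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    vectors = defaultdict(dict)
    for title, counts in term_counts.items():
        total = sum(counts.values())
        for word, count in counts.items():
            idf = math.log(n_docs / doc_freq[word])
            if idf > 0:  # words occurring in every article carry no signal
                vectors[word][title] = (count / total) * idf
    return vectors


def cosine(vec_a, vec_b):
    """Cosine measure between two sparse vectors stored as dicts."""
    dot = sum(w * vec_b.get(concept, 0.0) for concept, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relatedness(vectors, word_a, word_b):
    """Semantic relatedness of two words as the cosine of their concept vectors."""
    return cosine(vectors.get(word_a, {}), vectors.get(word_b, {}))


# Toy corpus standing in for Wikipedia articles (illustrative data only).
articles = {
    "Cat": "cat feline pet animal",
    "Dog": "dog canine pet animal",
    "Car": "car engine vehicle road",
}
vectors = build_concept_vectors(articles)
print(relatedness(vectors, "cat", "pet"))     # shares the "Cat" concept
print(relatedness(vectors, "cat", "engine"))  # no shared concept
```

In a distributed setting, the per-article term counting and the per-word aggregation map naturally onto map and reduce phases respectively, which is presumably what makes the approach a fit for MapReduce; the sketch keeps everything in memory for readability.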

Similar Documents

Extracting semantic knowledge from Wikipedia category names

Computing semantic relatedness using word frequency and layout information of Wikipedia

Computing semantic relatedness from human navigational paths on Wikipedia

Knowledge Derived From Wikipedia For Computing Semantic Relatedness

Wikirelate! computing semantic relatedness using wikipedia (2006)

Cross-lingual Semantic Relatedness Using Encyclopedic Knowledge

Computing Terms Semantic Relatedness by Knowledge in Wikipedia

WikiRelate! Computing Semantic Relatedness Using Wikipedia

Extracting Knowledge from Wikipedia Articles through Distributed Semantic Analysis