Improving search result presentation in wikipedia
Content Provider | Indraprastha Institute of Information Technology, Delhi |
---|---|
Author | Jain, Ishita |
Abstract | When a user submits a search query on Wikipedia to find articles related to a topic, the results are sometimes displayed in an arbitrary order, with no inter-relations among subsequently ranked articles. The motivation of this project is to improve the presentation of the results returned for a Wikipedia search query and to visualize a selected group of them, among the other results, as a graph using Gephi [14]. The goal is to select a collection of results that are closely related both to the query and to one another. The visualization is straightforward because the complete Wikipedia website can be viewed as a graph, with each article as a node and the inter-relations between articles as the edges between nodes. The entire Wikipedia database contains around 25 million articles. We first select the top n results (with n of the order of thousands) and then apply the Bonsai [4] algorithm to them. Bonsai tries to select the highest-weighted, minimally connected cluster from the n selected results. The weight of a cluster is the sum of the scores of all nodes included in it, and the score of a single node can be assigned using various methods, such as term frequency or tf-idf. Here, we score nodes by the frequency of occurrence of the query words in the document: the more often the query words occur, the higher the node score. |
File Format | |
Language | English |
Access Restriction | Authorized |
Subject Keyword | Wikipedia; Bonsai; Goemans-Williamson Algorithm; Prize-Collecting Steiner Trees; Approximation Algorithms |
Content Type | Text |
Educational Degree | Bachelor of Technology (B.Tech.) |
Resource Type | Thesis |
Subject | Data processing & computer science |
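
The node scoring described in the abstract can be illustrated with a minimal sketch. It assumes a simple lowercase word tokenizer and counts every occurrence of any query word; the function names `node_score` and `cluster_weight` and the sample articles are hypothetical and are not taken from the thesis. The sketch covers only the scoring and the cluster weight, not the Bonsai selection step itself.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def node_score(query: str, text: str) -> int:
    """Score an article by how often the query words occur in its text."""
    # Assumption: words are runs of word characters, compared case-insensitively.
    token_counts = Counter(re.findall(r"\w+", text.lower()))
    query_words = set(re.findall(r"\w+", query.lower()))
    return sum(token_counts[word] for word in query_words)


def cluster_weight(scores: Dict[str, int], cluster: Iterable[str]) -> int:
    """Weight of a cluster: the sum of the scores of the nodes it contains."""
    return sum(scores[title] for title in cluster)


# Hypothetical articles, used only to exercise the functions.
articles = {
    "Steiner tree problem": "A Steiner tree is a tree that spans given terminals.",
    "Graph theory": "A graph has nodes and edges; a tree is a connected acyclic graph.",
    "Gephi": "Gephi is an open-source network visualization tool.",
}

query = "steiner tree"
scores = {title: node_score(query, text) for title, text in articles.items()}
weight = cluster_weight(scores, ["Steiner tree problem", "Graph theory"])
```

Swapping `node_score` for a tf-idf weighting, which the abstract mentions as an alternative, would leave `cluster_weight` unchanged, since it only sums whatever per-node scores it is given.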