Semantic similarity through hierarchical abstraction of knowledge
Content Provider | Indraprastha Institute of Information Technology, Delhi |
---|---|
Author | Arora, Kanchan |
Abstract | Identifying semantic similarity between two texts has many applications in NLP, including information extraction and retrieval, word sense disambiguation, text summarization and type classification. Similarity between texts is commonly determined using a taxonomy-based approach, but the limited scalability of existing taxonomies has led recent research to use Wikipedia's encyclopaedic knowledge base to find similarity or relatedness. In this thesis, we propose Hierarchical Semantic Analysis (HSA), a method which represents the semantics of a text in a high-dimensional space of Wikipedia concepts and category hierarchies. We represent the meaning of any text excerpt as a weighted vector of Wikipedia-based resources. To evaluate the similarity of texts in this space, we compare the corresponding vectors using conventional metrics (e.g. cosine). Compared with the previous state of the art, the use of HSA results in substantial improvements in the correlation of computed similarity scores with human judgements, from r = 0.873 to 0.901 for short sentence pairs and from r = 0.72 to 0.863 for paragraph pairs. |
File Format | |
Language | English |
Access Restriction | Open |
Subject Keyword | Wikipedia Semantic Similarity Hierarchical Abstraction |
Content Type | Text |
Educational Degree | Master of Technology (M.Tech.) |
Resource Type | Thesis |
Subject | Data processing & computer science |
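
The abstract describes comparing weighted vectors of Wikipedia-based resources with a conventional metric such as cosine. The following is a minimal sketch of that comparison step only, assuming each text has already been mapped to a sparse vector stored as a dictionary from concept name to weight. The function name `cosine_similarity`, the concept names, and the weights are hypothetical illustrations and are not taken from the thesis; how HSA actually derives the weights from concepts and category hierarchies is not shown here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as {concept: weight} dicts."""
    # Only concepts present in both vectors contribute to the dot product.
    dot = sum(weight * v[concept] for concept, weight in u.items() if concept in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    # An empty or all-zero vector has no direction; treat similarity as 0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical concept weights for two text excerpts (illustrative values only).
text_a = {"Semantic similarity": 0.8, "Natural language processing": 0.5}
text_b = {"Natural language processing": 0.6, "Information retrieval": 0.7}

score = cosine_similarity(text_a, text_b)
print(f"similarity = {score:.3f}")
```

Storing the vectors as dictionaries keeps the sketch practical for a space with as many dimensions as Wikipedia has concepts, since only the nonzero weights are held in memory.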