Analysis and construction of noun hypernym hierarchies to enhance Roget's Thesaurus
| Content Provider | Semantic Scholar |
|---|---|
| Author | Kennedy, Alistair |
| Copyright Year | 2006 |
| Abstract | Lexical resources are machine-readable dictionaries or lists of words, where semantic relationships between the terms are somehow expressed. These lexical resources have been used for many tasks such as word sense disambiguation and determining semantic similarity between terms. In recent years some research has been put into automatically building lexical resources from large corpora. In this thesis I examine methods of constructing a lexical resource, not from scratch, but rather by expanding existing ones. Roget’s Thesaurus is a lexical resource that groups terms together based on degrees of semantic relatedness. One of Roget’s Thesaurus’ weaknesses is that it does not specify the nature of the relationships between terms; it only indicates that there is a relationship. I attempt to label the relationships between terms in the thesaurus. These relationships could include synonymy, hyponymy/hypernymy and meronymy/holonymy. I examine the Thesaurus for all of these relationships. Sources of these relationships include other lexical resources such as WordNet, as well as large corpora and specialized texts such as dictionaries. Roget’s Thesaurus has other weaknesses, including a somewhat outdated lexicon. Our version of Roget’s Thesaurus was created in 1987 and so does not contain words or phrases related to the Internet and other advances since 1987. I examine methods of creating a hypernym hierarchy of nouns. A hierarchy is constructed automatically and evaluated manually by several annotators who are fluent in English. These hypernyms are intended to be used in a system where a human annotator is given a set of hypernyms and indicates which are correct and which are incorrect. This is done to facilitate the process of constructing a lexical resource, a process which was previously done manually. I import over 50,000 hypernym relationships into Roget’s Thesaurus. An estimated overall accuracy of 73% is achieved across the entire hypernym set. As a final test, the new relationships imported into the Thesaurus are used to improve Roget’s Thesaurus’ capacity for calculating semantic similarity between terms/phrases. The improved similarity function is tested on several applications that make use of semantic similarity. The relationships are also used to improve Roget’s Thesaurus’ capacity for solving SAT-style analogy questions. |
| File Format | PDF HTM / HTML |
| DOI | 10.20381/ruor-18950 |
| Alternate Webpage(s) | http://www.cs.toronto.edu/~akennedy/publications/masters_thesis.pdf |
| Alternate Webpage(s) | https://doi.org/10.20381/ruor-18950 |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |