Extracting Semantic Information from Corpora Using Dependency Relations
| Content Provider | Semantic Scholar |
|---|---|
| Author | Padó, Sebastian |
| Copyright Year | 2002 |
| Abstract | Semantic space models are a representation formalism for lexical semantics which represents words as vectors in a high-dimensional space. The distances between word vectors can be interpreted as a measure of the semantic similarity of the words. Since semantic space models are constructed automatically from corpora, the semantic representations of words they provide are empirically grounded. This has made them popular in psychology for modelling behavioural data, for example priming, and in Information Retrieval, where richer word representations can be helpful. However, the “bag of words”-style word co-occurrence statistics that are traditionally employed for the construction of semantic space models are deficient from the standpoint of theoretical linguistics. This thesis examines the resulting shortcomings of semantic space models and proposes replacing the representation of words in terms of their word context with a representation of words in terms of their syntactic context. The new, highly parametrisable construction framework, which is based on dependency grammar, is implemented in the form of the DEPENDENCYVECTORS system. Models produced with the DEPENDENCYVECTORS system are evaluated against traditional models on three tasks. The first task, which tests the new models’ cognitive adequacy, shows that DEPENDENCYVECTORS models capture direct, but not mediated, priming; the best existing models have been reported to capture both phenomena. The second task, which investigates the encoding of different lexical relations, shows that DEPENDENCYVECTORS models can capture differences between lexical relations, which traditional models cannot. The last task, the TOEFL synonymy task, shows that the performance of DEPENDENCYVECTORS models in synonymy identification is roughly on par with that of the best traditional models. In summary, DEPENDENCYVECTORS models cannot yet model all the behavioural data that traditional models can, but this may become possible once the right set of parameters has been identified. The main insight, however, is that the ability to distinguish between different lexical relations makes DEPENDENCYVECTORS models applicable to linguistic tasks which require access to structured knowledge, for example query extension or synonymy identification. |
| File Format | PDF, HTML |
| Alternate Webpage(s) | http://www.coli.uni-sb.de/~pado/pub/msc/thesis.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
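
To make the abstract's core idea concrete, here is a minimal sketch of a dependency-based semantic space: words become vectors over labelled syntactic contexts rather than bag-of-words co-occurrence counts, and similarity is the cosine between those vectors. This is an illustrative toy, not the DEPENDENCYVECTORS system itself; the triples, relation labels, and helper names are invented for the example, and a real system would obtain the triples from a dependency parser over a large corpus.

```python
from collections import defaultdict
from math import sqrt

# Toy corpus as dependency triples (head, relation, dependent) -- the kind of
# syntactic context the thesis proposes instead of raw word windows.
triples = [
    ("drink", "obj", "coffee"),
    ("drink", "obj", "tea"),
    ("drink", "subj", "man"),
    ("spill", "obj", "coffee"),
    ("spill", "obj", "tea"),
    ("read", "obj", "book"),
    ("read", "subj", "man"),
]

def dependency_vectors(triples):
    """Map each word to a count vector over (relation, neighbour) features,
    so e.g. "coffee" and "tea" share the feature ("obj^-1", "drink")."""
    vecs = defaultdict(lambda: defaultdict(int))
    for head, rel, dep in triples:
        vecs[head][(rel, dep)] += 1          # head sees dependent via rel
        vecs[dep][(rel + "^-1", head)] += 1  # dependent sees head via inverse rel
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[f] * v[f] for f in set(u) & set(v))
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vecs = dependency_vectors(triples)
print(cosine(vecs["coffee"], vecs["tea"]))   # 1.0: identical syntactic contexts
print(cosine(vecs["coffee"], vecs["book"]))  # 0.0: no shared labelled contexts
```

Because the features keep the relation labels, such a space can distinguish how two words are related (e.g. objects of the same verbs versus subjects of the same verbs), which is exactly the structured information a window-based bag-of-words model discards.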