Using Linear Algebra for Intelligent Information Retrieval
| Content Provider | Semantic Scholar |
|---|---|
| Author | Dumais, Susan T.; O'Brien, G. W. |
| Copyright Year | 1995 |
| Abstract | Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users' requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents by determining the SVD of large sparse term-by-document matrices. Terms and documents represented by 200-300 of the largest singular vectors are then matched against user queries. We call this retrieval method Latent Semantic Indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely automatic yet intelligent indexing method, widely applicable, and a promising way to improve users' access to many kinds of textual materials, or to documents and services for which textual descriptions are available. A survey of the computational requirements for managing LSI-encoded databases as well as current and future applications of LSI is presented. … with those of a query. However, lexical matching methods can be inaccurate when they are used to match a user's query. Since there are usually many ways to express a given concept (synonymy), the literal terms in a user's query may not match those of a relevant document. In addition, most words have multiple meanings (polysemy), so terms in a user's query will literally match terms in irrelevant documents. A better approach would allow users to retrieve information on the basis of a conceptual topic or meaning of a document. Latent Semantic Indexing (LSI) [4] tries to overcome the problems of lexical matching by using statistically derived conceptual indices instead of individual words for retrieval. LSI assumes that there is some underlying or latent structure in word usage that is partially obscured by variability in word choice. A truncated singular value decomposition (SVD) [14] is used to estimate the structure in word usage across documents. Retrieval is then performed using the database of singular values and vectors obtained from the truncated SVD. Performance data shows that these statistically derived vectors are more robust indicators of meaning than individual terms. A number of software tools have been developed to perform operations such as parsing document texts, creating a term-by-document matrix, computing the truncated … |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://www.csee.umbc.edu/~nicholas/courses/691d/papers/ut-cs-94-270.ps |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
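
The retrieval scheme summarized in the abstract (truncated SVD of a term-by-document matrix, then matching queries in the reduced space) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the five-term, four-document toy matrix, the function names `lsi_fit` and `lsi_query`, the choice `k=2` (the abstract mentions 200-300 factors for real collections), the use of a dense NumPy SVD rather than a sparse solver, and the common convention of projecting a query as `q @ U_k / s_k` before taking cosine similarities.

```python
import numpy as np

# Hypothetical toy collection: rows are terms, columns are documents.
# d0 = {car, engine}, d1 = {automobile, engine},
# d2 = {flower, petal}, d3 = {flower}
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)


def lsi_fit(A, k):
    """Return the k largest singular triplets of A (dense SVD, illustration only)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of V_k are the documents' coordinates in the k-dimensional space.
    return U[:, :k], s[:k], Vt[:k, :].T


def lsi_query(q, U_k, s_k, V_k):
    """Project a term-count query into the k-space and score each document by cosine."""
    q_hat = (q @ U_k) / s_k                       # assumed projection convention
    num = V_k @ q_hat
    den = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return num / den


U_k, s_k, V_k = lsi_fit(A, k=2)

# Query containing only the term "car".
q = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
scores = lsi_query(q, U_k, s_k, V_k)

# Plain lexical overlap for comparison: count of shared terms per document.
lexical = q @ A

for j, (lex, sc) in enumerate(zip(lexical, scores)):
    print(f"d{j}: lexical overlap = {lex:.0f}, LSI cosine = {sc:.3f}")
```

In this toy setup, document d1 shares no term with the query "car", so its lexical overlap is zero, yet it scores close to 1 in the reduced space because "car" and "automobile" both co-occur with "engine". This is the synonymy effect the abstract describes. For collections of realistic size, a sparse truncated SVD routine would be the more plausible choice than the dense call used here.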