Query Based Topic Modeling: An Information-Theoretic Framework for Semantic Analysis in Large-Scale Collections
| Content Provider | Semantic Scholar |
|---|---|
| Author | Ramírez, Eduardo H.; Brena, Ramón F. |
| Copyright Year | 2012 |
| Abstract | Creating topic models of text collections is an important step towards more adaptive information access and retrieval applications. Such models encode knowledge of the topics discussed in a collection, the documents that belong to each topic, and the semantic similarity of a given pair of topics. So far, the dominant paradigm in topic modeling has been the Probabilistic Topic Modeling approach, in which topics are represented as probability distributions of terms. Although such models are theoretically sound, their high computational complexity makes them difficult to use on very large-scale collections. In this work the authors propose an alternative collection-modeling paradigm based on a simpler representation of topics as freely overlapping clusters of semantically similar documents, which makes it possible to take advantage of highly scalable clustering algorithms. The authors then propose the Query-based Topic Modeling framework (QTM), an information-theoretic method that assumes the existence of a “golden” set of queries that can capture most of the semantic information of the collection and produce models with maximum semantic coherence. The QTM method uses information-theoretic heuristics to find a set of “topical-queries”, which are then co-clustered along with the documents of the collection and transformed to produce overlapping document clusters. The QTM framework was designed with scalability in mind and can be executed in parallel over commodity-class machines using the Map-Reduce approach. |
| Starting Page | 69 |
| Ending Page | 95 |
| Page Count | 27 |
| File Format | PDF HTM / HTML |
| DOI | 10.4018/978-1-60960-881-1.ch004 |
| Alternate Webpage(s) | https://www.igi-global.com/viewtitlesample.aspx?id=60116&ptid=51342&t=query+based+topic+modeling:+an+information-theoretic+framework+for+semantic+analysis+in+large-scale+collections |
| Alternate Webpage(s) | https://doi.org/10.4018/978-1-60960-881-1.ch004 |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |