Automatic cluster labeling using ontology and Latent Dirichlet Allocation (LDA)
| Content Provider | Semantic Scholar |
|---|---|
| Author | Adhitama, Rifki; Gernowo, Rahmat |
| Copyright Year | 2017 |
| Abstract | Latent Dirichlet Allocation (LDA) is a topic modeling method for organizing, understanding, searching, and summarizing electronic archives, and it has proven effective in text and information retrieval. A weakness of LDA is that it cannot label the topics it forms. This research combines LDA with an ontology scheme to overcome this weakness. The study uses a dataset of 50 news documents taken from an online news portal. The ontology scheme is based on the dictionary of fields contained in Kamus Besar Bahasa Indonesia (KBBI). The experiment aims to find the best number of words to represent each topic in order to produce a relevant label name for the topic. Cohen's kappa coefficient is used to measure the reliability of the labels based on the agreement of two linguistic experts, while the mean relevance rate measures the average relevance value that the linguistic experts assign to labels for word representations with a kappa value above 41%. The results indicate that the highest kappa value, 100%, occurs with the five-word representation of each topic, while the highest mean relevance rate, 80%, occurs with the 5-word and 30-word representations. The average kappa value is 61%, and the average mean relevance rate is 71%. Keywords: text clustering; cluster labeling; latent dirichlet allocation; ontology |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://eprints.undip.ac.id/58015/1/IEEE_(english).pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
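
The abstract relies on Cohen's kappa to measure agreement between two linguistic experts. As a rough illustration of how that coefficient is computed, here is a minimal sketch in Python. The function name `cohens_kappa`, the `relevant`/`irrelevant` categories, and the five example judgments are all hypothetical; they are not data or code from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty item list")

    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical category throughout; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical judgments by two experts on five topic labels (illustrative only).
expert_1 = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]
expert_2 = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant"]

kappa = cohens_kappa(expert_1, expert_2)
print(f"kappa = {kappa:.3f}")
```

With these made-up judgments the experts agree on four of five labels (observed agreement 0.80) while chance agreement is 0.48, so kappa is 0.32 / 0.52, roughly 0.62. A kappa of 100%, as reported for the five-word representation, would correspond to the two experts agreeing on every label.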