Mining Maximally Informative k-Itemsets in Massively Distributed Environments
| Content Provider | Semantic Scholar |
|---|---|
| Author | Salah, Saber; Akbarinia, Reza; Masseglia, Florent |
| Copyright Year | 2016 |
| Abstract | The discovery of informative itemsets is a fundamental building block in data analytics and information retrieval. While the problem has been widely studied, only a few solutions scale. This is particularly the case when i) the data set is massive, calling for large-scale distribution, and/or ii) the length k of the informative itemset to be discovered is high. In this paper, we address the problem of parallel mining of maximally informative k-itemsets (miki) based on joint entropy. We propose PHIKS (Parallel Highly Informative K-ItemSet), a highly scalable, parallel miki mining algorithm. PHIKS renders the mining process of large-scale databases (up to terabytes of data) succinct and effective. Its mining process is made up of only two efficient parallel jobs. With PHIKS, we provide a set of significant optimizations for calculating the joint entropies of miki having different sizes, which drastically reduces the execution time of the mining process. PHIKS has been extensively evaluated using massive real-world data sets. Our experimental results confirm the effectiveness of our proposal through the significant scale-up obtained with large itemset lengths and over very large databases. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | https://hal-lirmm.ccsd.cnrs.fr/lirmm-01411190/file/bda_short_paper.pdf |
| Alternate Webpage(s) | https://hal-lirmm.ccsd.cnrs.fr/lirmm-01411190/document |
| Alternate Webpage(s) | https://hal-lirmm.ccsd.cnrs.fr/lirmm-01411190/file/bda_salah_short.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
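
To make the abstract's notion of a maximally informative k-itemset concrete, here is a minimal single-machine sketch in Python. It is not PHIKS and does not reflect the paper's parallel jobs or optimizations: it simply enumerates every k-itemset and keeps the one with the highest joint entropy, which is only feasible on toy data. The function names `joint_entropy` and `brute_force_miki` and the sample transactions are hypothetical, introduced for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(transactions, itemset):
    """Joint entropy (in bits) of the presence/absence pattern of `itemset`.

    Each transaction is projected onto the itemset as a tuple of booleans,
    and the entropy of the resulting empirical distribution is returned.
    """
    items = sorted(itemset)
    n = len(transactions)
    patterns = Counter(tuple(item in t for item in items) for t in transactions)
    return -sum((c / n) * log2(c / n) for c in patterns.values())


def brute_force_miki(transactions, k):
    """Return the k-itemset with the highest joint entropy (exhaustive search).

    Assumption: exhaustive enumeration stands in for the paper's algorithm;
    ties are broken in favour of the lexicographically first itemset.
    """
    all_items = sorted(set().union(*transactions))
    return max(combinations(all_items, k),
               key=lambda s: joint_entropy(transactions, s))


# Hypothetical toy database of four transactions.
transactions = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
best = brute_force_miki(transactions, 2)
print(best, joint_entropy(transactions, best))
```

On this toy data, items a and b split the four transactions into four distinct presence/absence patterns, so the pair carries 2 bits, whereas a and c are perfectly anti-correlated and carry only 1 bit. The exhaustive search grows combinatorially with k, which is the scaling problem the abstract says motivates a distributed approach.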