Recherche d'une représentation des données efficace pour la fouille des grandes bases de données (Searching for an efficient data representation for mining large databases)
| Content Provider | Semantic Scholar |
|---|---|
| Author | Boullé, Marc |
| Copyright Year | 2007 |
| Abstract | The data preparation step of the data mining process accounts for 80% of the effort, and it is both time-consuming and critical for the quality of the resulting models. In this thesis, our purpose is to design an evaluation criterion for data representations in order to automate data preparation. To this end, we introduce a non-parametric family of density estimation models, called data grid models. Each variable is partitioned into intervals or into groups of values, depending on whether it is numerical or categorical, and the whole data space is partitioned into a grid of cells obtained from the cross-product of the univariate partitions. We then consider density estimation models in which the density is assumed to be constant within each data grid cell. Because of their high expressiveness, data grid models are hard to regularize and to optimize. We apply a Bayesian model selection approach and obtain an exact analytic criterion for the posterior probability of data grid models. We introduce combinatorial optimization algorithms that exploit the properties of this evaluation criterion and the sparsity of data in high dimensions. These algorithms have a guaranteed algorithmic complexity that is super-linear in the sample size. We evaluate data grid models on numerous data analysis tasks: supervised classification, regression, clustering and co-clustering. The results demonstrate the validity of the approach, which automatically and efficiently detects fine-grained and reliable information useful for the data preparation step. |
| File Format | PDF, HTM / HTML |
| Alternate Webpage(s) | http://clopinet.com/isabelle/Projects/reading/SoutenanceTheseBoulle2007.pdf |
| Alternate Webpage(s) | https://pastel.archives-ouvertes.fr/pastel-00003023/document |
| Alternate Webpage(s) | http://www.marc-boulle.fr/publications/BoulleThesis07.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
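The abstract describes data grid models only at a high level: univariate partitions (intervals for numerical variables, groups of values for categorical ones) whose cross-product defines cells, with a density that is constant within each cell. The following is a minimal Python sketch of that density model only, assuming the partitions are already given; it does not implement the thesis's Bayesian criterion or its optimization algorithms. The class names `IntervalPartition`, `GroupPartition` and `DataGrid` are hypothetical, and using interval width and group size as the per-variable "volume" of a cell is an assumption made here so that the estimate sums to one over mixed numerical/categorical data.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass
from math import prod


@dataclass
class IntervalPartition:
    """Hypothetical partition of a numerical variable into intervals."""
    bounds: list[float]  # sorted cut points; len(bounds) = number of intervals + 1

    def part(self, value: float) -> int:
        # Index of the interval [bounds[i], bounds[i+1]) containing value;
        # the last interval is also closed on the right.
        if not self.bounds[0] <= value <= self.bounds[-1]:
            raise ValueError(f"{value} is outside the grid range")
        return min(bisect_right(self.bounds, value) - 1, len(self.bounds) - 2)

    def measure(self, index: int) -> float:
        # Interval width (assumed Lebesgue measure of the part).
        return self.bounds[index + 1] - self.bounds[index]


@dataclass
class GroupPartition:
    """Hypothetical partition of a categorical variable into groups of values."""
    groups: list[set[str]]

    def part(self, value: str) -> int:
        for index, group in enumerate(self.groups):
            if value in group:
                return index
        raise ValueError(f"unknown category {value!r}")

    def measure(self, index: int) -> float:
        # Number of values in the group (assumed counting measure of the part).
        return float(len(self.groups[index]))


class DataGrid:
    """Piecewise-constant density over the cross-product of univariate partitions."""

    def __init__(self, partitions, rows):
        self.partitions = partitions
        self.n = len(rows)
        # Number of training rows falling in each non-empty cell.
        self.counts = Counter(self.cell(row) for row in rows)

    def cell(self, row) -> tuple:
        # A cell is identified by one part index per variable.
        return tuple(p.part(v) for p, v in zip(self.partitions, row))

    def density(self, row) -> float:
        # Constant density inside the cell: cell frequency divided by cell volume.
        cell = self.cell(row)
        volume = prod(p.measure(i) for p, i in zip(self.partitions, cell))
        return self.counts[cell] / (self.n * volume)


if __name__ == "__main__":
    rows = [(0.5, "a"), (1.5, "b"), (1.7, "c"), (3.0, "a"), (3.5, "c"), (0.2, "b")]
    grid = DataGrid(
        [IntervalPartition([0.0, 1.0, 2.0, 4.0]), GroupPartition([{"a"}, {"b", "c"}])],
        rows,
    )
    print(grid.cell((1.6, "b")))     # (1, 1)
    print(grid.density((1.6, "b")))  # 2 rows / (6 rows * width 1.0 * 2 values)
```

Because each cell's density is its row frequency divided by its volume, multiplying back by the volume and summing over cells recovers a total mass of one. Choosing the partitions themselves is the hard part, which the thesis addresses with its Bayesian criterion and optimization algorithms rather than with fixed cut points as in this sketch.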