An Extensible Framework for Data Cleaning
| Content Provider | Hyper Articles en Ligne (HAL) |
|---|---|
| Author | Galhardas, Helena; Florescu, Daniela; Shasha, Dennis; Simon, Eric |
| Abstract | Data integration solutions that handle large amounts of data have been in strong demand in the last few years. Besides the traditional data integration problems (e.g., schema integration, local-to-global schema mappings), three additional data problems have to be dealt with: (1) the absence of universal keys across different databases, known as the object identity problem, (2) the existence of keyboard errors in the data, and (3) the presence of inconsistencies in data coming from multiple sources. Dealing with these problems is collectively called the data cleaning process. In this work, we propose a framework that offers the fundamental services required by this process: data transformation, duplicate elimination, and multi-table matching. These services are implemented using a set of purpose-designed macro-operators. Moreover, we propose an SQL extension for specifying each of the macro-operators. One important feature of the framework is the ability to explicitly include human interaction in the process. The main novelty of the work is that the framework permits the following performance optimizations, which are tailored to data cleaning applications: mixed evaluation, neighborhood hash join, decision push-down, and short-circuited computation. We measure the benefits of each. |
| File Format | |
| Language | English |
| Publisher Date | 1999-01-01 |
| Publisher Institution | INRIA |
| Access Restriction | Open |
| Subject Keyword | APPROXIMATE JOIN; QUERY LANGUAGE; DATA TRANSFORMATION; QUERY OPTIMIZATION; DATA INTEGRATION; DATA CLEANING; info: Computer Science [cs]; Other [cs.OH] |
| Content Type | Text |
| Resource Type | Article |