Loading...
Please wait, while we are loading the content...
Similar Documents
Approximate lineage for probabilistic databases (2008)
| Content Provider | CiteSeerX |
|---|---|
| Author | Suciu, Dan Ré, Christopher |
| Abstract | In probabilistic databases, lineage is fundamental to both query processing and understanding the data. Current systems s.a. Trio or Mystiq use a complete approach in which the lineage for a tuple t is a Boolean formula which represents all derivations of t. In large databases lineage formulas can become huge: in one public database (the Gene Ontology) we often observed 10MB of lineage (provenance) data for a single tuple. In this paper we propose to use approximate lineage, which is a much smaller formula keeping track of only the most important derivations, which the system can use to process queries and provide explanations. We discuss in detail two specific kinds of approximate lineage: (1) a conservative approximation called sufficient lineage that records the most important derivations for each tuple, and (2) polynomial lineage, which is more aggressive and can provide higher compression ratios, and which is based on Fourier approximations of Boolean expressions. In this paper we define approximate lineage formally, describe algorithms to compute approximate lineage and prove formally their error bounds, and validate our approach experimentally on a real data set. 1. |
| File Format | |
| Publisher Date | 2008-01-01 |
| Access Restriction | Open |
| Subject Keyword | Specific Kind Complete Approach Boolean Expression Important Derivation Current System Probabilistic Database Large Database Sufficient Lineage Error Bound Real Data Set Conservative Approximation Compression Ratio Approximate Lineage Single Tuple Fourier Approximation Polynomial Lineage Query Processing Public Database Gene Ontology Boolean Formula |
| Content Type | Text |
| Resource Type | Article |