Loading...
Please wait, while we are loading the content...
Similar Documents
Robust and efficient locality sensitive hashing for nearest neighbor search in large data sets.
| Content Provider | CiteSeerX |
|---|---|
| Author | Kang, Byungkon Jung, Kyomin |
| Abstract | Locality sensitive hashing (LSH) has been used extensively as a basis for many data retrieval applications. However, previous approaches, such as random projection and multi-probe hashing, may exhibit high query complexity of up to Θ(n) when the underlying data distribution is highly skewed. This is due to the imbalance in the number of data stored per each bucket, which leads to slow query time in large data sets. In this paper, we introduce a distribution-free LSH algorithm that addresses this problem by maintaining nearly uniform number of points per bucket. As a consequence, our algorithm allows one to reduce the number of hash tables, and is hence memory-efficient, while achieving high accuracy. Through extensive experiments, we show that our algorithm accurately retrieves nearest neighbors faster than other standard LSH algorithms do in large data sets, and maintains nearly uniform number of per-bucket points. 1 |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Large Data Set Nearest Neighbor Search Efficient Locality Sensitive Hashing Uniform Number Random Projection Hash Table Multi-probe Hashing Distribution-free Lsh Algorithm High Query Complexity Many Data Retrieval Application Previous Approach Query Time High Accuracy Standard Lsh Algorithm Locality Sensitive Hashing Per-bucket Point Extensive Experiment Data Distribution |
| Content Type | Text |