Ed-join: an Efficient Algorithm for Similarity Joins with Edit Distance Constraints
| Content Provider | Semantic Scholar |
|---|---|
| Author | Thuy, Thai Ngoc |
| Copyright Year | 2009 |
| Abstract | Similarity join is a fundamental operation in many application areas, such as data integration and cleaning, bioinformatics, and pattern recognition. In this project, we implement an efficient algorithm for similarity joins with edit distance constraints. Current approaches mainly convert the edit distance constraint into a weaker constraint on the number of matching q-grams between a pair of strings. In our project, we exploit a novel perspective: investigating mismatching q-grams. We derive two new edit distance lower bounds by analyzing the locations and contents of mismatching q-grams. A new algorithm, Ed-Join, is proposed that exploits the new mismatch-based filtering methods; it achieves a substantial reduction in candidate sizes and hence saves computation time. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://www.nus.edu.sg/nurop/2010/Proceedings/SoC/NUROP_Congress_Thai_Ngoc_Thuy_Huong.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
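
The abstract contrasts Ed-Join with earlier approaches that relax the edit distance constraint into a count of matching q-grams. The sketch below illustrates only that baseline count filter, not Ed-Join's mismatch-based filters, which the abstract does not specify in enough detail to reproduce. It assumes the commonly used bound that two strings within edit distance `tau` share at least `max(len(s), len(t)) - q + 1 - q * tau` q-grams; all function names and the sample strings are hypothetical and chosen for illustration.

```python
from collections import Counter


def qgrams(s, q):
    """Return the list of overlapping substrings of length q in s."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def shared_qgrams(s, t, q):
    """Count q-grams common to s and t, respecting multiplicity."""
    cs, ct = Counter(qgrams(s, q)), Counter(qgrams(t, q))
    return sum((cs & ct).values())


def passes_count_filter(s, t, q, tau):
    """Necessary (not sufficient) condition for edit distance <= tau.

    Assumption: one edit operation destroys at most q q-grams, so a pair
    within distance tau keeps at least this many q-grams in common.
    """
    needed = max(len(s), len(t)) - q + 1 - q * tau
    return shared_qgrams(s, t, q) >= needed


def edit_distance(s, t):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]


def similarity_join(strings, q, tau):
    """Naive self-join: filter candidate pairs, then verify each survivor."""
    result = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            s, t = strings[i], strings[j]
            if abs(len(s) - len(t)) > tau:
                continue  # length filter
            if not passes_count_filter(s, t, q, tau):
                continue  # count filter
            if edit_distance(s, t) <= tau:
                result.append((s, t))
    return result


if __name__ == "__main__":
    words = ["algorithm", "algorihtm", "logarithm", "similarity"]
    print(similarity_join(words, q=2, tau=2))
```

The filter can only discard pairs, never accept them outright, so every surviving candidate is still verified with the full edit distance computation. Reducing the number of such candidates is the cost that, according to the abstract, the mismatch-based filters target.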