A new randomized algorithm for document.
| Content Provider | CiteSeerX |
|---|---|
| Author | Ganguly, Sumit |
| Abstract | Copy detection is a major problem today. Students plagiarize assignments from the web and from each other, so we need a technique that can detect even partial copies between assignments, including copied text that has been relocated. The problem also arises on the web: search engines want to detect copies of entire web documents so that they do not display the same content multiple times in their results. Document fingerprinting is an efficient technique for accurately detecting full and partial copies between documents. We present a new randomized algorithm that guarantees, with very high probability, that any match of at least W characters (W is an input parameter) will be detected. Moreover, we give small deterministic bounds on the amount of space the algorithm needs. This is the key way in which it differs from previous work, where either there are no guarantees [1] or the space bounds are very poor [2]. |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Previous Work; Content Multiple Time; Today World; Key Way; Entire Web Document; Small Deterministic Bound; Copy Detection; Document Fingerprinting; Accurate Detection; Input Parameter; Efficient Technique; Space Bound; Search Engine; High Probability; Partial Copy; Major Problem |
| Content Type | Text |
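
The abstract does not describe the randomized algorithm itself, so the following is only a minimal sketch of the general idea of document fingerprinting, using the well-known winnowing selection rule rather than the author's method. The names `kgram_hashes`, `fingerprint`, and `shared_fingerprints`, and the parameters `k` and `window`, are assumptions made for illustration. In this sketch, any shared substring of at least `window + k - 1` characters yields at least one common fingerprint, which plays a role similar to the W threshold mentioned in the abstract.

```python
import hashlib


def kgram_hashes(text, k):
    """Return a (hash, position) pair for every k-character substring of text."""
    pairs = []
    for i in range(len(text) - k + 1):
        digest = hashlib.sha1(text[i:i + k].encode("utf-8")).digest()
        # Truncate to 64 bits; this is an illustrative choice, not from the source.
        pairs.append((int.from_bytes(digest[:8], "big"), i))
    return pairs


def fingerprint(text, k=5, window=4):
    """Winnowing-style selection (an assumed stand-in for the paper's algorithm).

    Slide a window over the sequence of k-gram hashes and keep the minimum
    hash of each window, preferring the rightmost position on ties.
    Texts with fewer than `window` k-grams produce no fingerprints here.
    """
    hashes = kgram_hashes(text, k)
    selected = set()
    for start in range(len(hashes) - window + 1):
        best = min(hashes[start:start + window], key=lambda hp: (hp[0], -hp[1]))
        selected.add(best)
    return selected


def shared_fingerprints(a, b, k=5, window=4):
    """Return the hash values that appear in the fingerprints of both texts."""
    hashes_a = {h for h, _ in fingerprint(a, k, window)}
    hashes_b = {h for h, _ in fingerprint(b, k, window)}
    return hashes_a & hashes_b


if __name__ == "__main__":
    doc_a = "xxxx the quick brown fox yyyy"
    doc_b = "zz the quick brown fox ww"
    # The two texts share a passage longer than window + k - 1 = 8 characters,
    # so at least one fingerprint is common to both.
    print(len(shared_fingerprints(doc_a, doc_b)) > 0)
```

Storing only the selected window minima, instead of every k-gram hash, is what keeps the fingerprint small while still catching long relocated matches; the space behavior of the paper's own algorithm is not reproduced by this sketch.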