Cross-Language Source Code Re-Use Detection
| Content Provider | Semantic Scholar |
|---|---|
| Author | Flores, Enrique; Barrón-Cedeño, Alberto; Moreno, Lidia; Rosso, Paolo |
| Copyright Year | 2014 |
| Abstract | Repositories, forums, and websites like Rosettacode.org make a vast amount of source code available. With the growth of the Web, content re-use has increased. Source code re-use detection makes it possible to spot potential instances of re-use. In recent years, source code re-use detection has been tackled mainly using compilers. In a cross-language source code re-use scenario, detection is therefore restricted to the languages supported by the compiler. Treating source code as a piece of text with its own syntax and formal structure, we aim to apply models for text re-use detection to source code. In this paper we compare models which do not rely on external resources for measuring cross-language similarity (cross-language character n-grams, pseudo-cognateness, and word count ratio) against corpora-dependent models (cross-language explicit semantic analysis and cross-language alignment-based similarity analysis). In our experiments, a combination of cross-language character 3-grams and pseudo-cognateness performed better than the corpora-dependent models. The latter models improved their performance when exploiting larger corpora. All in all, the applied models proved able to retrieve both re-used and co-derived source code. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://users.dsic.upv.es/~prosso/resources/FloresEtAl_CERI14long.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
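
The abstract names cross-language character 3-grams as one of the resource-free similarity models. The following is a minimal Python sketch of that general idea, not the authors' implementation: the normalisation (lower-casing and removing all whitespace), the use of raw n-gram counts, and the cosine measure are assumptions made for illustration, and the function names `char_ngrams`, `cosine_similarity`, and `ngram_similarity` are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count the character n-grams of a source code string.

    Assumption: the text is lower-cased and all whitespace is removed
    before n-grams are extracted; the paper may normalise differently.
    """
    normalized = "".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # At least one input is shorter than n characters after normalisation.
        return 0.0
    return dot / (norm_a * norm_b)


def ngram_similarity(code_a, code_b, n=3):
    """Similarity of two code fragments, possibly in different languages."""
    return cosine_similarity(char_ngrams(code_a, n), char_ngrams(code_b, n))


if __name__ == "__main__":
    python_src = "for i in range(10):\n    total = total + i"
    java_src = "for (int i = 0; i < 10; i++) { total = total + i; }"
    print(ngram_similarity(python_src, java_src))
```

Because the comparison works on raw characters, it needs no compiler or parser for either language, which is the property the abstract highlights for the resource-free models.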