NDLI: Translate Once, Translate Twice, Translate Thrice and Attribute: Identifying Authors and Machine Translation Tools in Translated Text

Content Provider	IEEE Xplore Digital Library
Author	Caliskan, A.; Greenstadt, R.
Copyright Year	2012
Abstract	In this paper, we investigate the effects of machine translation tools on translated texts and the accuracy of authorship and translator attribution of translated texts. We show that the more translation performed on a text by a specific machine translation tool, the more effects unique to that translator are observed. We also propose a novel method to perform machine translator and authorship attribution of translated texts using a feature set that led to 91.13% and 91.54% accuracy on average, respectively. We claim that the features leading to the highest accuracy in translator attribution are translator-dependent features and that even though translator-effect-heavy features are present in translated text, we can still succeed in authorship attribution. These findings demonstrate that stylometric features of the original text are preserved at some level despite multiple consecutive translations and the introduction of translator-dependent features. The main contribution of our work is the discovery of a feature set used to accurately perform both translator and authorship attribution on a corpus of diverse topics from the twenty-first century, which has been consecutively translated multiple times using machine translation tools.
Starting Page	121
Ending Page	125
File Size	289045
Page Count	5
File Format	PDF
ISBN	9781467344333
DOI	10.1109/ICSC.2012.46
Language	English
Publisher	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher Date	2012-09-19
Publisher Place	Italy
Access Restriction	Subscribed
Rights Holder	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subject Keyword	Computer science; Privacy; Google; authorship attribution; Accuracy; Semantics; Writing; machine translation; Feature extraction; privacy; anonymity; machine learning
Content Type	Text
Resource Type	Article


Similar Documents

Authorship attribution of web forum posts

On Identifying Authors with Style

Cross-Language Authorship Attribution

Authorship Identification for Online Text

Detecting Hoaxes, Frauds, and Deception in Writing Style Online

Exploring Google Translate-friendly strategies for optimizing the quality of Google Translate in academic writing contexts.

An Empirical Accuracy Law for Sequential Machine Translation: the Case of Google Translate

Neural Machine Translation: Fine-Grained Evaluation of Google Translate Output for English-to-Arabic Translation
