NDLI: Mining English-Chinese Named Entity Pairs from Comparable Corpora

Mining English-Chinese Named Entity Pairs from Comparable Corpora

Content Provider	ACM Digital Library
Author	Wang, Peng; Li, Lishuang; Huang, Degen; Zhao, Lian
Copyright Year	2011
Abstract	Bilingual named entity (NE) pairs are valuable resources for many NLP applications. Because comparable corpora are more accessible, abundant, and up to date, recent research has concentrated on mining bilingual lexicons from them. This research presents a novel approach to mining English-Chinese NE translations from comparable corpora by combining multi-dimensional features from various information sources for every possible NE pair: the transliteration model, English-Chinese matching, Chinese-English matching, the translation model, length, and the context vector. These features are integrated into one model with linear combination and the minimum sample risk (MSR) algorithm. Because NE translation is highly type-dependent, we integrate different features for different NE types. We experiment with individual and integrated features to mine person NE (PN) pairs, location NE (LN) pairs, and organization NE (ON) pairs. Using transliteration and length to mine PN pairs, we achieve the best performance of 84.9% (F-score). LN pairs can be mined with the transliteration model, length, translation model, English-Chinese matching, and Chinese-English matching features, with a best performance of 83.4% (F-score). ON pairs can be mined with the English-Chinese matching and Chinese-English matching features, reaching a best performance of 84.1% (F-score).
Starting Page	1
Ending Page	19
Page Count	19
File Format	PDF
ISSN	15300226
e-ISSN	15583430
DOI	10.1145/2025384.2025387
Volume Number	10
Issue Number	4
Journal	ACM Transactions on Asian Language Information Processing (TALIP)
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2011-12-01
Publisher Place	New York
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	Chinese-English matching; English-Chinese matching; MSR; Transliteration model; Comparable corpora; Mining; Named entity; Pairs; Translation model
Content Type	Text
Resource Type	Article
Subject	Computer Science
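
The abstract describes scoring every candidate NE pair by a linear combination of feature scores, with a different feature subset for each NE type. A minimal sketch of that idea follows, under stated assumptions: the feature names, the example scores, and the weights are hypothetical placeholders, not values from the paper, and the weights are fixed by hand here rather than tuned with the MSR algorithm the authors use. Only the per-type feature subsets are taken from the abstract.

```python
from typing import Dict, List, Tuple

# Feature subsets per NE type, following the best-performing combinations
# named in the abstract. The string keys themselves are hypothetical labels.
FEATURES_BY_TYPE: Dict[str, List[str]] = {
    "PN": ["transliteration", "length"],
    "LN": ["transliteration", "length", "translation",
           "en_zh_matching", "zh_en_matching"],
    "ON": ["en_zh_matching", "zh_en_matching"],
}


def combined_score(features: Dict[str, float],
                   weights: Dict[str, float],
                   ne_type: str) -> float:
    """Weighted sum of the feature scores selected for this NE type."""
    return sum(weights[name] * features[name]
               for name in FEATURES_BY_TYPE[ne_type])


def best_candidate(candidates: List[Tuple[str, Dict[str, float]]],
                   weights: Dict[str, float],
                   ne_type: str) -> str:
    """Return the candidate translation with the highest combined score."""
    return max(candidates,
               key=lambda c: combined_score(c[1], weights, ne_type))[0]


# Hypothetical weights; the paper tunes them with MSR instead.
pn_weights = {"transliteration": 0.7, "length": 0.3}

# Made-up feature scores for two Chinese candidates of one English person name.
pn_candidates = [
    ("乔治·布什", {"transliteration": 0.9, "length": 0.5}),
    ("布朗", {"transliteration": 0.2, "length": 1.0}),
]

chosen = best_candidate(pn_candidates, pn_weights, "PN")
```

Keeping the feature subsets in a table keyed by NE type means a person, location, or organization pair is scored by the same function, with only the selected features and weights changing.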

Similar Documents

Mining named entity transliteration equivalents from comparable corpora

Some experiments in mining named entity transliteration pairs from comparable corpora.

Some Experiments in Mining Named Entity Transliteration Pairs from Comparable Corpora

Mining named entity transliteration equivalents from comparable corpora

Named Entity Transliteration with Comparable Corpora (2006)

Alignment of bilingual named entities in parallel corpora using statistical models and multiple knowledge sources

Generating Chinese named entity data from parallel corpora

Named entity transliteration with comparable corpora (2006)

MINT: A method for effective and scalable mining of named entity transliterations from large comparable corpora

Mining English-Chinese Named Entity Pairs from Comparable Corpora