Acquiring Paraphrases from Corpora and Its Application to Machine Translation
| Content Provider | Semantic Scholar |
|---|---|
| Author | Shimohata, Mitsuo |
| Copyright Year | 2004 |
| Abstract | A natural language contains various paraphrases, that is, superficially different expressions that share the same meaning. Such a wide variety of paraphrases reflects the rich expressiveness of natural language, while causing difficulty in natural language processing applications, such as machine translation (MT). For MT, this variety reduces the coverage of translatable input sentences and complicates language too much to comprehend every possible variation. Unfortunately, existing resources for paraphrases do not adequately deal with the difficulty because their paraphrase knowledge only covers general areas and has little effect on uses for specific domains and applications. This thesis describes corpus-based paraphrase acquisition and its application to MT. We propose two paraphrase acquisition methods: lexical paraphrases and sentential paraphrases, each of which has its own advantages. Both methods are based on shallow analysis, and rely on a corpus but no other resource. The achievements described in this thesis consist of three parts: analysis of manual paraphrases, automatic acquisition of lexical paraphrases, and similar sentence retrieval, which corresponds to sentential paraphrasing. First, we describe two analyses of human paraphrases to clarify the following questions: (1) what types of paraphrases are dominant? and (2) how can human paraphrases be effective for MT? These investigations suggest that lexical paraphrasing and sentential paraphrasing are dominant in travel conversation domains. Second, we describe a method for extracting lexical paraphrases from a parallel corpus. This method has two advantages: (1) it acquires not only synonymous content … (Doctoral Dissertation, Department of Information Processing, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-DD0261014, September 15, 2004.) |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://cl.aist-nara.ac.jp/thesis/dthesis-shimohata.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |