Query Expansion for Multi-script Information Retrieval
| Content Provider | Semantic Scholar |
|---|---|
| Author | Gupta, Parth; Choudhury, Monojit; Rosso, Paolo |
| Copyright Year | 2014 |
| Abstract | For many languages that use non-Roman indigenous scripts (e.g., Arabic, Greek and Indic languages), one can often find a large amount of user-generated transliterated content on the Web in the Roman script. Such content creates a monolingual or multilingual space with more than one script, which we refer to as the Multi-Script space. IR in the multi-script space is challenging because queries written in either the native or the Roman script need to be matched to documents written in both scripts. Moreover, transliterated content features extensive spelling variations. In this paper, for the first time, we formally introduce the concept of Multi-Script IR and, through analysis of the query logs of the Bing search engine, estimate the prevalence and thereby establish the importance of this problem. We also give a principled solution to handle multi-script term matching and spelling variation, where the terms across the scripts are modelled jointly in a deep-learning architecture and can be compared in a low-dimensional abstract space. We present an extensive empirical analysis of the proposed method along with evaluation results in an ad-hoc retrieval setting of multi-script IR, where the proposed method achieves significantly better results (12% increase in MRR and 29% increase in MAP) compared to other state-of-the-art baselines. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | https://www.rbanchs.com/documents/translit-main-camera-ready.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
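
The abstract reports gains in MRR (mean reciprocal rank) and MAP (mean average precision). As a reading aid, here is a minimal sketch of the standard textbook definitions of both metrics. It is not taken from the paper, whose exact evaluation protocol is not described in this record; the function names and the document IDs in the usage note are hypothetical.

```python
from typing import Callable, Sequence, Set, Tuple

Run = Tuple[Sequence[str], Set[str]]  # (ranked doc IDs, relevant doc IDs)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average the precision values measured at each relevant document's rank."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_over_queries(metric: Callable[[Sequence[str], Set[str]], float],
                      runs: Sequence[Run]) -> float:
    """Average a per-query metric over all queries (gives MRR or MAP)."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)
```

For example, with two hypothetical queries whose results are `(["d1", "d2", "d3"], {"d2"})` and `(["d4", "d5"], {"d4", "d5"})`, calling `mean_over_queries(reciprocal_rank, runs)` gives the MRR and `mean_over_queries(average_precision, runs)` gives the MAP.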