UvA-DARE (Digital Academic Repository): Length normalization in XML retrieval
| Content Provider | Semantic Scholar |
|---|---|
| Author | Rijke, De Verbrugge, Rineke Taatgen Schomaker, Lambertus |
| Copyright Year | 2004 |
| Abstract | XML retrieval is a departure from standard document retrieval in which each individual XML element, ranging from italicized words or phrases to full-blown articles, is a potentially retrievable unit. The distribution of XML element lengths is unlike what we usually observe in standard document collections, prompting us to revisit the issue of document length normalization. We perform a comparative analysis of arbitrary elements versus relevant elements, and show the importance of length as a parameter for XML retrieval. Within the language modeling framework, we investigate a range of techniques that deal with length either directly or indirectly. We observe a length bias introduced by the amount of smoothing, and show the importance of extreme length priors for XML retrieval. We also show that simply removing shorter elements from the index (by introducing a cut-off value) does not create an appropriate document length normalization. Even after increasing the minimal size of XML elements occurring in the index, the importance of an extreme length bias remains. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | https://pure.uva.nl/ws/files/4036995/41029_file3885.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
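
The abstract's two length-related mechanisms, smoothing and a length prior, can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a query-likelihood model with linear (Jelinek-Mercer) smoothing and a prior proportional to element length raised to a power. The function name `score_element` and the parameters `lam` and `beta` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def score_element(query, element, collection, lam=0.5, beta=1.0):
    """Log-score of one XML element for a query (illustrative sketch).

    Assumptions (not confirmed by the record above):
      - P(t | e) is smoothed linearly with the collection model, weight `lam`
      - the length prior is proportional to len(element) ** beta;
        beta = 0 disables the prior, larger beta favours longer elements
    All arguments are lists of tokens.
    """
    if not element or not collection:
        raise ValueError("element and collection must be non-empty")

    elem_tf = Counter(element)
    coll_tf = Counter(collection)

    # Length prior in log space (unnormalized, which does not affect ranking).
    log_score = beta * math.log(len(element))

    for term in query:
        p_elem = elem_tf[term] / len(element)
        p_coll = coll_tf[term] / len(collection)
        p = lam * p_elem + (1.0 - lam) * p_coll
        if p == 0.0:
            # Term unseen in the whole collection: the element cannot match.
            return float("-inf")
        log_score += math.log(p)
    return log_score


# Toy data: a one-word element and a four-word element sharing one term.
short_elem = ["xml"]
long_elem = ["xml", "retrieval", "length", "normalization"]
toy_collection = short_elem + long_elem
toy_query = ["xml"]
```

With these toy inputs, setting `beta=0` ranks the one-word element first, while `beta=2` ranks the longer element first. This is the kind of shift a stronger length prior is meant to produce, though the priors and parameter values studied in the paper may differ.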