An investigation of the conditions for effective data fusion in information retrieval
| Content Provider | Semantic Scholar |
|---|---|
| Author | Ng, Kwong Bor |
| Copyright Year | 1998 |
| Abstract | Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance. There are many successful data fusion experiments reported in the IR literature, but there are also experiments in which data fusion did not work even though the same fusion rules were used. What is needed is a theory that tells, a priori, when one should use data fusion methods. We categorize different theoretical justifications of data fusion into two approaches, examine their implications, analyze some of the unsuccessful data fusion experiments, and propose two preconditions for effective data fusion: (1) the precondition of efficacy and (2) the precondition of dissimilarity. We have developed a mathematical measure (Pair-out-of-order) to measure inter-scheme dissimilarity, and have developed algorithms and computer programs to implement our ideas. We report on a pilot test using the output lists of all IR schemes that participated in the Routing task of TREC 4. Our results indicate that efficacy and inter-scheme dissimilarity are good predictors of data fusion effectiveness. In addition, we find that a model using the ratio of the efficacies of two schemes can improve our ability to predict fusion effectiveness. |
| File Format | PDF, HTM/HTML |
| Alternate Webpage(s) | http://www.scils.rutgers.edu/~kbng/publications/asis98.04.ps |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |

Preconditions of Effective Data Fusion

**1. Information Retrieval System and Scheme**

The task of an information retrieval (IR) system is to select, from a collection of information objects (e.g., documents), those that may be of interest to a user. To facilitate effective searching, an IR system generates a representation describing various attributes of all the information objects in the collection. When a user searches for documents, she, either herself or through an intermediary, has to formulate her information need in a format prescribed by the IR system. This format can be simple free text, a highly structured Boolean combination, or some other sophisticated expression, depending on the IR system. The IR system then matches the query against the document representations to estimate the relevance of each document in the collection. After examining the output of the system, the user can adjust her query (e.g., by giving relevance feedback to the system) to invoke another matching process.

The automatic process of IR, then, can be considered as comprising three basic components:

1. document representation;
2. query formulation;
3. computation for matching between document representation and query formulation.

When we fix the details of the representation, the formulation, and the computation, we have defined an IR scheme. We use IR “system” and IR “scheme” to describe two different concepts (Kantor, 1994a; Ng et al., 1997). An IR system is the physical implementation of an IR algorithm, which can have various operational modes or various parameter settings. The same IR system may therefore execute different IR schemes when its parameters are adjusted (e.g., by changing term weighting functions, switching from ranked retrieval mode to set retrieval mode, or softening the Boolean operations), and each of these gives rise to different outputs.

Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models for representing the information content in documents and queries and computing the similarity between the two (Kantor, 1994b). With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance.
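
The Python below is a minimal sketch of the system/scheme distinction described above. It treats a scheme as one fixed choice of the three components. The class `IRScheme`, the weighting functions, and the inner-product matcher are hypothetical illustrations, not the authors' code. The example runs one matching engine with two term-weighting settings, which yields two schemes that rank the same documents differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: documents and queries are term -> weight maps.
Vector = Dict[str, float]


@dataclass(frozen=True)
class IRScheme:
    """One fixed choice of the three components of automatic IR."""
    represent: Callable[[str], Vector]        # 1. document representation
    formulate: Callable[[str], Vector]        # 2. query formulation
    match: Callable[[Vector, Vector], float]  # 3. matching computation

    def rank(self, docs: Dict[str, str], query: str) -> List[str]:
        """Return document IDs by descending score (ties broken by ID)."""
        q = self.formulate(query)
        scores = {d: self.match(self.represent(t), q) for d, t in docs.items()}
        return sorted(scores, key=lambda d: (-scores[d], d))


def binary_weights(text: str) -> Vector:
    """Each distinct term gets weight 1."""
    return {t: 1.0 for t in text.lower().split()}


def tf_weights(text: str) -> Vector:
    """Each term is weighted by its raw frequency in the text."""
    v: Vector = {}
    for t in text.lower().split():
        v[t] = v.get(t, 0.0) + 1.0
    return v


def inner_product(d: Vector, q: Vector) -> float:
    return sum(w * d.get(t, 0.0) for t, w in q.items())


# Same "system" (inner-product engine), two parameter settings -> two schemes.
binary_scheme = IRScheme(binary_weights, binary_weights, inner_product)
tf_scheme = IRScheme(tf_weights, binary_weights, inner_product)

docs = {"d1": "fusion fusion fusion retrieval",
        "d2": "data fusion retrieval",
        "d3": "boolean query"}
print(binary_scheme.rank(docs, "data fusion"))  # ['d2', 'd1', 'd3']
print(tf_scheme.rank(docs, "data fusion"))      # ['d1', 'd2', 'd3']
```

Only the document-weighting component differs between the two schemes, yet their output lists disagree on the top document.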
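
The text refers to combining the output lists of different IR schemes under "fusion rules" but does not name a specific rule. The sketch below assumes one widely used rule, CombSUM (Fox and Shaw), which sums each document's min-max normalized scores across schemes. Both the rule and the normalization are assumptions made for illustration, and they need not match the fusion rules examined in the paper.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one scheme's scores to [0, 1] so different schemes are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def comb_sum(runs: List[Dict[str, float]]) -> List[str]:
    """Fuse scored output lists by summing normalized scores (CombSUM).

    A document absent from a run contributes 0 for that run.
    """
    fused: Dict[str, float] = {}
    for run in runs:
        for doc, s in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused, key=lambda d: (-fused[d], d))


run_a = {"d1": 10.0, "d2": 6.0, "d3": 2.0}   # hypothetical scheme A scores
run_b = {"d2": 0.9, "d3": 0.8, "d4": 0.1}    # hypothetical scheme B scores
print(comb_sum([run_a, run_b]))  # ['d2', 'd1', 'd3', 'd4']
```

Neither scheme ranks `d2` first on its own, but its strong showing in both lists puts it first after fusion.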
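
The abstract names the Pair-out-of-order measure of inter-scheme dissimilarity but does not define it. The sketch below assumes a reading in the style of Kendall's tau: count the pairs of documents, retrieved by both schemes, that the two rankings place in opposite order. Several details are assumptions, and the paper's exact definition may differ:

- the function names,
- the restriction to shared documents,
- the normalization to [0, 1].

```python
from itertools import combinations
from typing import List


def pairs_out_of_order(rank_a: List[str], rank_b: List[str]) -> int:
    """Count document pairs that two rankings order differently.

    Assumes each list holds unique IDs; only documents in both lists count.
    """
    pos_a = {d: i for i, d in enumerate(rank_a)}
    pos_b = {d: i for i, d in enumerate(rank_b)}
    common = [d for d in rank_a if d in pos_b]
    discordant = 0
    for x, y in combinations(common, 2):
        # (x, y) is discordant when the schemes disagree on which comes first.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant


def normalized_poo(rank_a: List[str], rank_b: List[str]) -> float:
    """Scale the count to [0, 1]: 0 = identical order, 1 = fully reversed."""
    n = len(set(rank_a) & set(rank_b))
    total = n * (n - 1) // 2
    return pairs_out_of_order(rank_a, rank_b) / total if total else 0.0


a = ["d1", "d2", "d3", "d4"]
print(pairs_out_of_order(a, a[::-1]))                  # 6
print(pairs_out_of_order(a, ["d2", "d1", "d3", "d4"]))  # 1
print(normalized_poo(a, a[::-1]))                      # 1.0
```

The pairwise loop is quadratic in the number of shared documents, which is adequate for a sketch. For long output lists, a merge-sort inversion count would bring the cost down to O(n log n).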
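
The abstract also reports that a model using the ratio of two schemes' efficacies improves the prediction of fusion effectiveness. This hedged sketch makes two assumptions. Efficacy is measured by non-interpolated average precision, a standard TREC measure. The ratio is taken as weaker over stronger, so that it falls in [0, 1]. The measure, the orientation of the ratio, and the helper names are all assumptions, and no prediction threshold is implied.

```python
from typing import List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked output list."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision at each relevant document's rank
    return total / len(relevant)


def efficacy_ratio(run_a: List[str], run_b: List[str], relevant: Set[str]) -> float:
    """Weaker scheme's efficacy divided by the stronger one's, in [0, 1].

    Returns 0.0 when neither scheme retrieves any relevant document.
    """
    ea = average_precision(run_a, relevant)
    eb = average_precision(run_b, relevant)
    hi = max(ea, eb)
    return min(ea, eb) / hi if hi > 0 else 0.0


relevant = {"d1", "d3"}  # hypothetical relevance judgments
print(round(efficacy_ratio(["d1", "d2", "d3"], ["d2", "d4", "d1"], relevant), 3))  # 0.2
```

A ratio near 1 means the two schemes are about equally effective on this query. A ratio near 0 means one scheme is much weaker than the other.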