NDLI: Quantum haystacks


Learning user interaction models for predicting web search result preferences

Contextual search and name disambiguation in email using graphs

Spoken document retrieval from call-center conversations

AggregateRank: bringing order to web sites

The role of knowledge in conceptual retrieval: a study in the domain of clinical medicine

On-line spam filter fusion

Using web-graph distance for relevance feedback in web search

Context-sensitive semantic smoothing for the language modeling approach to genomic IR

A study of statistical models for query translation: finding a good unit of translation

Probabilistic model for definitional question answering

Latent semantic analysis for multiple-type interrelated data objects

Evaluation in (XML) information retrieval: expected precision-recall with user modelling (EPRUM)

Finding near-duplicate web pages: a large-scale evaluation of algorithms

Capturing collection size for distributed non-cooperative retrieval

Load balancing for term-distributed parallel retrieval

Mining dependency relations for query expansion in passage retrieval

Document clustering with prior knowledge

Less is more: probabilistic models for retrieving fewer relevant documents

Elicitation of term relevance feedback: an investigation of term source and context

Large scale semi-supervised linear SVMs

Unifying user-based and item-based collaborative filtering approaches by similarity fusion

Evaluating evaluation metrics based on the bootstrap

Learning to advertise

A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization

Clustering of search results using temporal attributes

The TIJAH XML information retrieval system

Quantum haystacks

User performance versus precision measures for simple search tasks

Thread detection in dynamic text message streams

Towards efficient automated singer identification in large music databases

Respect my authority!: HITS without hyperlinks, utilizing cluster-based language models

A parallel derivation of probabilistic information retrieval models

Building bridges for web query classification

Improving the estimation of relevance models using large external corpora

LDA-based document models for ad-hoc retrieval

Combining bidirectional translation and synonymy for cross-language information retrieval

Answering complex questions with random walk models

Identifying comparative sentences in text documents

Minimal test collections for retrieval evaluation

Structure-driven crawler generation by example

Probabilistic latent query analysis for combining multiple retrieval sources

Hybrid index maintenance for growing text collections

What makes a query difficult?

Text clustering with extended user feedback

High accuracy retrieval with multiple nested ranker

Find-similar: similarity browsing as a search tool

Graph-based text classification: learn from your neighbors

Personalized recommendation driven by information flow

Statistical precision of information retrieval evaluation

Getting work done on the web: supporting transactional queries

Information graphics: an untapped resource for digital libraries

A complex document information processing prototype

A location annotation system for personal photos

Social networks, incentives, and search

Improving web search ranking by incorporating user behavior information

Formal models for expert finding in enterprise corpora

Music structure based vector space retrieval

Topical link analysis for web search

Semantic term matching in axiomatic approaches to information retrieval

ProbFuse: a probabilistic approach to data fusion

Regularized estimation of mixture models for robust pseudo-relevance feedback

Adapting ranking SVM to document retrieval

A framework to predict the quality of answers with non-textual features

Tackling concept drift by temporal inductive transfer

Dynamic test collections: measuring search effectiveness on the live web

Building implicit links from content for forum search

User modeling for full-text federated search in peer-to-peer networks

Type less, find more: fast autocompletion search with a succinct index

On ranking the effectiveness of searches

Near-duplicate detection by instance-level constrained clustering

Semantic search via XML fragments: a high-precision approach to IR

Exploring the limits of single-iteration clarification dialogs

Constructing informative prior distributions from domain knowledge in text classification

Analysis of a low-dimensional linear model under recommendation attacks

A statistical method for system evaluation using incomplete judgments

You are what you say: privacy risks of public mentions

News to go: hierarchical text summarization for mobile devices

Inferring document relevance via average precision

Appraisal navigator

Information retrieval at Boeing: plans and successes

Generalizing PageRank: damping functions for link-based ranking algorithms

Distributed query sampling: a quality-conscious approach

Pruned query evaluation using pre-computed impacts

Automatic construction of known-item finding test beds

A platform for Okapi-based contextual information retrieval

Adaptive query-based sampling for distributed IR

Project contexts to situate personal information

PENG: integrated search of distributed news archives

Cheshire3: retrieving from tera-scale grid-based digital libraries

Examining assessor attributes at HARD 2005

DeWild: a tool for searching the web using wild cards

User expectations from XML element retrieval

Searching for expertise using the Terrier platform

Theoretical benchmarks of XML retrieval

DiLight: an ontology-based information access system for e-learning environments

Question classification with log-linear models

Supporting semantic visual feature browsing in content-based video retrieval

Community-based snippet-indexes for pseudo-anonymous personalization in web search

MathFind: a math-aware search engine

Bias and the limits of pooling

Term proximity scoring for ad-hoc retrieval on very large text collections

An exploratory web log study of multitasking

Tensor space model for document analysis

First large-scale information retrieval experiments on Turkish texts

Learning a ranking from pairwise preferences

Automated performance assessment in interactive QA

Stylistic text segmentation

On hierarchical web catalog integration with conceptual relationships in thesaurus

Rpref: a generalization of Bpref towards graded relevance judgments

A new web page summarization method

NMF and PLSI: equivalence and a hybrid algorithm

Using historical data to enhance rank aggregation

Enterprise search behaviour of software engineers

Evaluating sources of query expansion terms

Comparing two blind relevance feedback techniques

Information retrieval with commonsense knowledge

Refining hierarchical taxonomy structure via semi-supervised learning

Quantative analysis of the impact of judging inconsistency on the performance of relevance feedback

Swordfish: an unsupervised Ngram based approach to morphological analysis

Authorship attribution with thousands of candidate authors

Simple questions to improve pseudo-relevance feedback results

Is XML retrieval meaningful to users?: searcher preferences for full documents vs. elements

Building a test collection for complex document information processing

Enhancing topic tracking with temporal information

A comparative study of the effect of search feature design on user experience in digital libraries (DLs)

Representing clusters for retrieval

One-sided measures for evaluating ranked retrieval effectiveness with spontaneous conversational speech

Combining fields in known-item email search

Improving QA retrieval using document priors

Content-based video retrieval: does video's semantic visual feature matter?

Action modeling: language models that predict query behavior

A method of rating the credibility of news documents on the web

An analysis of the coupling between training set and neighborhood sizes for the kNN classifier

Fact-focused novelty detection: a feasibility study

Unity: relevance feedback using user query logs

Improving personalized web search using result diversification

Using small XML elements to support relevance

Give me just one highly relevant document: P-measure

Feature diversity in cluster ensembles for robust document clustering

Lightening the load of document smoothing for better language modeling retrieval

The effect of OCR errors on stylistic text classification

History repeats itself: repeat queries in Yahoo's logs

Early precision measures: implications from the downside of blind feedback

An experimental study on automatically labeling hierarchical clusters using statistical features

Strict and vague interpretation of XML-retrieval queries

Why structural hints in queries do not help XML-retrieval

Searching the web using composed pages

A study of real-time query expansion effectiveness

A graph-based framework for relation propagation and its application to multi-label learning

Measuring similarity of semi-structured documents with context weights

Incorporating query difference for learning retrieval functions in information retrieval

Concept-based biomedical text retrieval

Quantum haystacks

Content Provider	ACM Digital Library
Author	van Rijsbergen, C. J. 'Keith'
Abstract	This acceptance talk is a curious mixture of personal history and developing ideas in the context of the growing field of IR covering several decades. I want to concentrate on models and theories, interpreted loosely, and try and give an insight into where I have got to in my thinking, where the ideas came from, and where I believe we are going.

In the last few years I have been working on the development of what might be coined as a design language for IR. It takes its inspiration from Quantum Mechanics, but by analogy only. The mathematical objects represent documents; these objects might be vectors (or density operators) in an n-dimensional vector space (usually a Hilbert space). A request for information, or a query, is taken as an observable and is represented as a linear operator on the space. Linear operators can be expressed as matrices. Such an operator, Hermitian, has a set of eigenvectors forming a basis for the space, which we interpret as a point of view or perspective from which to understand the space. Thus any document-vector can be located with respect to the basis, and we can calculate an inner product between such a vector and any basis vector, which may be interpreted as a probability of relevance. The probability of observing any given eigenvector is now given by the square of that inner product, assuming all vectors are normalised. Hence we connect the probability of observation to the geometry of the space. Furthermore, the subspaces of the space make up a lattice structure which is equivalent to a logic. This makes up the entire mathematical structure, and the language for handling this structure is linear algebra: vectors, matrices, projections, inner-products, neatly captured by the Dirac notation used in quantum mechanics. Our probability is slightly different from classical probability, the same for logic; we end up with quantum logic and quantum probability.

A commitment to this kind of mathematical structure, with which to model objects and processes in IR, depends on two critical assumptions. (1) The distances in the space between objects are a source of important relationships with respect to relevance and aboutness. (2) The observation of a property such as relevance or aboutness is user dependent in the sense that a potential interaction is specified by a user through an operator which when measured achieves outcomes with a probability determined by the geometry of the space.

The geometry of this mathematical structure and the probability defined on it are closely connected by the following theorem due to Gleason (1957). One may summarise this theorem by saying that the probability of a subspace is given by a simple algorithm derived from a projection onto the subspace and a special kind of operator, namely a statistical operator, or density matrix. And conversely, that given a probability measure on the subspaces then we can encode that measure uniquely through such an algorithm. This is a very powerful theorem and its consequences remain to be explored.

So how did I get to this point and form of abstraction? Most of my research work can be divided into contributions to the following areas: Clustering, Evaluation, Probabilistic Models, Logic Models, and Geometry.

In all these areas I have attempted to search for underlying mathematical structures that would lead to computations. These topics have in common that they depend on the construction of measures on a space which in some sense determines the usefulness or effectiveness of the structure. For clustering one considers mappings from metric spaces to ultrametric spaces and measures the closeness of fit. In the case of evaluation, one starts with a relational conjoint structure and imposes some constraints given by what is to be measured; one then constructs a numerical representation of this structure leading to such measures as F (or E). For probabilistic models the main difficulty is concerned with deciding on an appropriate event space on which to define the 'right' probability measures. For me the most significant example in this context was the attempt to construct a Logical Uncertainty Principle which formulated a measure of uncertainty on incomplete logical constructs. This attempt left unspecified the exact form of the measure. In the Geometry of IR I finally managed to formulate that measure as a projection-valued measure.

This way of thinking did not appear out of nowhere. It was heavily influenced by the work of Fairthorne (1961) whose work on Brouwerian Logic (an Intuitionistic Logic) was picked up by Salton in his early book on IR. At an earlier stage MacKay (1950) wrote a paper that opened with, 'This paper relates to the borderline linking experimental and theoretical physics with mathematical logic, and covers at several points ground which is common to the theory of communication.' He goes on to define an 'information-operator' which is very similar in scope and intent to the Hermitian operator above. Maron, who collaborated with MacKay, stated in his 1965 paper, 'Therefore, it can be argued that index descriptions should not be viewed as properties of documents: They function to relate documents and users.' One can see that the development of these early ideas was continued to the construction of the Geometry of IR.

What does it leave to be done? An attempt should be made to use this design language to build an IR system. On the theoretical front it is worth considering whether it would be better to start with a transition probability space rather than a Hilbert space as Von Neumann did in 1937 (translated in 1981). The assumption that closed linear subspaces will be the elements of our logic can be challenged, as perhaps a construction with different elements is possible. It is not obvious what the best form of conditional probability might be in these spaces. Agreeing on a form of conditionalisation is intimately tied up with how to model contextuality. There is some evidence to suggest that contextuality plays a role in modelling the conjunction of concepts (Widdows, 2004). Such contexts have been modelled in quantum theory almost from the beginning; for example, Gleason's theorem precludes noncontextual hidden variable theories.
Starting Page	1
Ending Page	2
Page Count	2
File Format	PDF
ISBN	1595933697
DOI	10.1145/1148170.1148171
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2006-08-06
Publisher Place	New York
Access Restriction	Subscribed
Content Type	Text
Resource Type	Article
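
The abstract above ties probability to geometry in two ways: the probability of observing an eigenvector is the squared inner product with a normalised document vector, and (following the summary of Gleason's theorem) the probability of a subspace comes from a projection onto it together with a density matrix. The following is a minimal sketch of both calculations, not the author's implementation. It assumes real-valued vectors (so no complex conjugation is needed), a pure-state density matrix built from a single document vector, and an orthonormal basis chosen by hand to stand in for the eigenvectors of a query operator. All function names and the example vectors are hypothetical.

```python
import math


def inner(u, v):
    """Inner product of two real vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def normalise(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(inner(v, v))
    return [a / norm for a in v]


def outcome_probabilities(doc, basis):
    """Squared inner product of the unit document vector with each basis vector."""
    d = normalise(doc)
    return [inner(d, e) ** 2 for e in basis]


def density(doc):
    """Pure-state density matrix: outer product of the unit document vector with itself."""
    d = normalise(doc)
    return [[di * dj for dj in d] for di in d]


def projector(vectors):
    """Projector onto the span of orthonormal vectors (sum of outer products)."""
    n = len(vectors[0])
    return [[sum(e[i] * e[j] for e in vectors) for j in range(n)] for i in range(n)]


def trace_of_product(a, b):
    """Trace of the matrix product a @ b, without forming the product."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def subspace_probability(doc, vectors):
    """Probability of the subspace spanned by `vectors`: trace(density * projector)."""
    return trace_of_product(density(doc), projector(vectors))


# Hypothetical orthonormal basis standing in for a query operator's eigenvectors.
s = 1 / math.sqrt(2)
e1, e2, e3 = [s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]

doc = [3.0, 1.0, 2.0]  # hypothetical document vector
probs = outcome_probabilities(doc, [e1, e2, e3])
print(probs, sum(probs))
print(subspace_probability(doc, [e1, e3]))
```

For a pure state the trace formula reduces to the sum of the squared inner products over the basis vectors that span the subspace, which is one way to check the two functions against each other.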
