NDLI: Web text retrieval with a P2P query-driven index


Personalized query expansion for the web

Utility-based information distillation over temporally sequenced documents

Robust test collections for retrieval evaluation

An interactive algorithm for asking and incorporating feature feedback into support vector machines

Towards automatic extraction of event and place semantics from flickr tags

Fast generation of result snippets in web search

Information re-retrieval: repeat queries in Yahoo's logs

Efficient document retrieval in main memory

Topic segmentation with shared topic detection and alignment of multiple documents

Multiple-signal duplicate detection for search evaluation

Supporting multiple information-seeking strategies in a single system framework

A support vector method for optimizing average precision

An exploration of proximity measures in information retrieval

Deconstructing nuggets: the stability and reliability of complex question answering evaluation

On the robustness of relevance measures with incomplete judgments

FRank: a ranking method with fidelity loss

Relaxed online SVMs for spam filtering

Towards musical query-by-semantic-description using the CAL500 data set

Building simulated queries for known-item topics: an analysis using six european languages

Hits on the web: how does it compare?

Federated text retrieval from uncooperative overlapped collections

A time machine for text search

Query performance prediction in web search environments

How well does result relevance predict session satisfaction?

An outranking approach for rank aggregation in information retrieval

Vocabulary independent spoken term detection

Context sensitive stemming for web search

Heavy-tailed distributions and multi-keyword queries

Using gradient descent to optimize language modeling smoothing parameters

MQX: multi-query engine for compressed XML data

Beyond classical measures: how to evaluate the effectiveness of interactive information retrieval system?

Strategy follows technology

Using query contexts in information retrieval

Effective missing data prediction for collaborative filtering

Reliable information retrieval evaluation with incomplete and biased judgements

Learn from web search logs to organize search results

Hierarchical classification for automatic image annotation

The influence of caption features on clickthrough patterns in web search

Studying the use of popular destinations to enhance web search interaction

The impact of caching on search engines

Analyzing feature trajectories for event detection

Robust classification of rare queries using web knowledge

Investigating the querying and browsing behavior of advanced search engine users

Ranking with multiple hyperplanes

Estimation and use of uncertainty in pseudo-relevance feedback

Interesting nuggets and their impact on definitional question answering

Test theory for assessing IR test collections

AdaRank: a boosting algorithm for information retrieval

Know your neighbors: web spam detection using the web topology

A music search engine built upon audio-based and web-based similarity measures

Cross-lingual query suggestion using query logs of different languages

Hits hits TREC: exploring IR evaluation results with network analysis

Evaluating sampling methods for uncooperative collections

Principles of hash-based text retrieval

Broad expertise retrieval in sparse data environments

A new approach for evaluating query expansion: query-document term mismatch

Enhancing relevance scoring with chronological term rank

Improving text classification for oral history archives with temporal domain knowledge

Detecting, categorizing and clustering entity mentions in Chinese text

ESTER: efficient search on text, entities, and relations

Locality discriminating indexing for document classification

ISKODOR: unified user modeling for integrated searching

People search in the enterprise

2007 Athena Lecturer Award introduction

Towards task-based personal information management evaluations

Efficient bayesian hierarchical user modeling for recommendation system

Alternatives to Bpref

Regularized clustering for documents

Laplacian optimal design for image retrieval

CollabSum: exploiting multiple document clustering for collaborative single document summarizations

Neighborhood restrictions in geographic IR

Pruning policies for two-tiered inverted index with correctness guarantee

New event detection based on indexing-tree and named entity

Random walks on the click graph

Term feedback for information retrieval with language models

A regression framework for learning ranking functions using relative relevance judgments

Latent concept expansion using markov random fields

A probabilistic graphical model for joint answer ranking in question answering

Strategic system comparisons via targeted relevance judgments

A combined component approach for finding collection-adapted ranking functions based on genetic programming

DiffusionRank: a possible penicillin for web spamming

Combining content and link for classification using matrix factorization

Updating collection representations for federated search

Compressed permuterm index

A semantic approach to contextual advertising

Performance prediction using spatial autocorrelation

ARSA: a sentiment-aware model for predicting sales performance using blogs

Indexing confusion networks for morph-based spoken document retrieval

Knowledge-intensive conceptual retrieval and passage extraction of biomedical literature

Web text retrieval with a P2P query-driven index

Management of keyword variation with frequency based generation of word forms in IR

Babel: a machine transliteration workbench

Efficient integration of proximity for text, semi-structured and graph retrieval

Natural language and the information layer

A study of Poisson query generation model for information retrieval

Structured retrieval for question answering

Feature selection for ranking

OMES: a new evaluation strategy using optimal matching for document clustering

X-Site: a workplace search tool for software engineers

Attention-based information retrieval

Revisiting the dependence language model for information retrieval

The wild thing goes local

A summarisation logic for structured documents

Quantify query ambiguity using ODP metadata

DiscoverInfo: a tool for discovering information with relevance and novelty

Information-behaviour modeling with external cues

Combining error-correcting output codes and model-refinement for text categorization

Radio Oranje: searching the queen's speech(es)

Fuzzy temporal and spatial reasoning for intelligent information retrieval

User-oriented text segmentation evaluation measure

Mobile interface of the memoria project

Paragraph retrieval for why-question answering

Story segmentation of broadcast news in Arabic, Chinese and English using multi-window features

A full-text retrieval toolkit for mobile desktop search

Global resources for peer-to-peer text retrieval

Recommending citations for academic papers

EXPOSE: searching the web for expertise

Automatic query-time generation of retrieval expert coefficients for multimedia retrieval

Exploration of the tradeoff between effectiveness and efficiency for results merging in federated search

Text categorization for streams

Understanding the relationship of information need specificity to search query length

Search results using timeline visualizations

An effective snippet generation method using the pseudo relevance feedback technique

Wikipedia in the pocket: indexing technology for near-duplicate detection and high similarity search

Probability ranking principle via optimal expected rank

Nexus: a real time QA system

Combining term-based and event-based matching for question answering

Geographic ranking for a local search engine

Confluence: enhancing contextual desktop search

Focused ranking in a vertical search engine

Estimating the value of automatic disambiguation

A "do-it-yourself" evaluation service for music information retrieval systems

A generic framework for machine transliteration

IR-Toolbox: an experiential learning tool for teaching IR

Where to start reading a textual XML document?

Novelty detection using local context analysis

Intra-assessor consistency in question answering

Towards robust query expansion: model selection in the language modeling framework

Automatic classification of web pages into bookmark categories

What emotions do news articles trigger in their readers?

Evaluating discourse-based answer extraction for why-question answering

Topic segmentation using weighted lexical links (WLL)

Lexical analysis for modeling web query reformulation

Bridging the digital divide: understanding information access practices in an indian village community

BordaConsensus: a new consensus function for soft cluster ensembles

A flexible retrieval system of shapes in binary images

Semantic text classification of disease reporting

Evaluating relevant in context: document retrieval with a twist

IDF revisited: a simple new derivation within the Robertson-Spärck Jones probabilistic model

Validity and power of t-test for comparing MAP and GMAP

Model-averaged latent semantic indexing

Characterizing the value of personalizing search

Improving retrieval accuracy by weighting document types with clickthrough data

Protecting source privacy in federated search

Applying ranking SVM in query relaxation

Learning to rank collections

VideoReach: an online video recommendation system

Modelling epistemic uncertainty in ir evaluation

On the importance of preserving the part-order in shape retrieval

The relationship between IR effectiveness measures and user satisfaction

A multi-criteria content-based filtering system

Boosting static pruning of inverted files

Resource monitoring in information extraction

The DILIGENT framework for distributed information retrieval

Varying approaches to topical web query classification

A comparison of pooled and sampled relevance judgments

Clustering short texts using wikipedia

Estimating collection size with logistic regression

Selection and ranking of text from highly imperfect transcripts for retrieval of video content

Enhancing patent retrieval by citation analysis

MRF based approach for sentence retrieval

Improving weak ad-hoc queries using wikipedia as external corpus

Fine-grained named entity recognition and relation extraction for question answering

World knowledge in broad-coverage information filtering

The influence of basic tokenization on biomedical document retrieval

Using clustering to enhance text classification

A fact/opinion classifier for news articles

Matching resumes and jobs based on relevance models

The utility of linguistic rules in opinion mining

A comparison of sentence retrieval techniques

High-dimensional visual vocabularies for image retrieval

A web page topic segmentation algorithm based on visual criteria and content layout

Document clustering: an optimization problem

Finding similar experts

Active learning for class imbalance problem

Strategies for retrieving plagiarized documents

Generative modeling of persons and documents for expert search

Random walk term weighting for information retrieval

Comparing query logs and pseudo-relevance feedback for web-search query refinement

Automatic extension of non-english wordnets

First experiments searching spontaneous Czech speech

Power and bias of subset pooling strategies

Problems with Kendall's tau

Opinion holder extraction from author and authority viewpoints

Incorporating term dependency in the dfr framework

Hits on question answer portals: exploration of link analysis for author ranking

Heads and tails: studies of web search with common and rare queries

Dimensionality reduction for dimension-specific search

An effective method for finding best entry points in semi-structured documents

Query rewriting using active learning for sponsored search

An analysis of peer-to-peer file-sharing system queries

Investigating the relevance of sponsored results for web ecommerce queries

Viewing online searching within a learning paradigm

More efficient parallel computation of pagerank

Using similarity links as shortcuts to relevant web pages

Fast exact maximum likelihood estimation for mixture of language models

TimedTextRank: adding the temporal dimension to multi-document summarization

Winnowing wheat from the chaff: propagating trust to sift spam from the web

Feature engineering for mobile (SMS) spam filtering

Ranking by community relevance

Query suggestion based on user landing pages

Making mind and machine meet: a study of combining cognitive and algorithmic relevance feedback

Using collaborative queries to improve retrieval for difficult topics

Retrieval of discussions from enterprise mailing lists

Effects of highly agreed documents in relevancy prediction

Detecting word substitutions: PMI vs. HMM

Workload sampling for enterprise search evaluation

Document layout and color driven image retrieval

Large-scale cluster-based retrieval experiments on Turkish texts

Improving active learning recall via disjunctive boolean constraints

Creativity support: information discovery and exploratory search

Web text retrieval with a P2P query-driven index

Content Provider	ACM Digital Library
Author	Zarko, Ivana Podnar; Skobeltsyn, Gleb; Luu, Toan; Rajman, Martin; Aberer, Karl
Abstract	In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations, and (2) the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable storage and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the transmitted posting lists never exceed a constant size. However, as the number of generated term combinations can still become quite large, we also use term statistics extracted from available query logs to index only such combinations that are frequently present in user queries. Thus, by avoiding the generation of superfluous indexing term combinations, we achieve an additional substantial reduction in bandwidth and storage consumption. As a result, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows current information needs of the users. More precisely, our theoretical analysis and experimental results indicate that, at the price of a marginal loss in retrieval quality for rare queries, the generated index size and network traffic remain manageable even for web-size document collections. Furthermore, our experiments show that at the same time the achieved retrieval quality is fully comparable to the one obtained with a state-of-the-art centralized query engine.
Starting Page	679
Ending Page	686
Page Count	8
File Format	PDF
ISBN	9781595935977
DOI	10.1145/1277741.1277857
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2007-07-23
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	DHT; Precision; IR; Query-driven indexing; TREC; Text retrieval; P2P
Content Type	Text
Resource Type	Article
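The abstract describes two indexing properties and one filter: posting lists keyed by term combinations, posting lists truncated to a bounded number of top-ranked entries, and a query-log filter that indexes only combinations frequently seen in user queries. The toy, in-memory sketch below illustrates these ideas. It is not the authors' implementation. The following are all assumptions made for illustration and are hypothetical:

- the whitespace tokenizer;
- the summed-term-frequency score;
- the parameters `max_size`, `min_freq` and `k`;
- the modulo-hash `peer_for` placement, a simplified stand-in for key routing in a real DHT.

```python
from collections import Counter
from itertools import combinations
import hashlib
import heapq


def term_combos(terms, max_size):
    """Yield every distinct term combination (sorted tuple) of size 1..max_size."""
    unique = sorted(set(terms))
    for size in range(1, max_size + 1):
        yield from combinations(unique, size)


def popular_combos(query_log, max_size, min_freq):
    """Keep combinations that occur in at least min_freq logged queries (hypothetical threshold)."""
    counts = Counter()
    for query in query_log:
        counts.update(term_combos(query.split(), max_size))
    return {combo for combo, n in counts.items() if n >= min_freq}


def build_index(docs, query_log, max_size=2, min_freq=2, k=3):
    """Map each query-log-approved combination to its top-k (score, doc_id) postings."""
    allowed = popular_combos(query_log, max_size, min_freq)
    index = {}
    for doc_id, text in docs.items():
        tf = Counter(text.split())
        for combo in term_combos(tf, max_size):
            if combo not in allowed:
                continue  # skip combinations users do not ask for
            # Toy score: summed term frequencies, standing in for a real ranking function.
            score = sum(tf[t] for t in combo)
            index.setdefault(combo, []).append((score, doc_id))
    # Truncate every posting list to its k best entries, so no list exceeds a constant size.
    return {combo: heapq.nlargest(k, postings) for combo, postings in index.items()}


def search(index, query, max_size=2, k=3):
    """Use the largest indexed combinations of the query terms and merge their postings."""
    terms = sorted(set(query.split()))
    for size in range(min(len(terms), max_size), 0, -1):
        hits = [index[c] for c in combinations(terms, size) if c in index]
        if hits:
            scores = Counter()
            for postings in hits:
                for score, doc_id in postings:
                    scores[doc_id] += score
            return [doc_id for doc_id, _ in scores.most_common(k)]
    return []  # nothing indexed for this (rare) query


def peer_for(combo, num_peers):
    """Hypothetical DHT placement: hash the combination key onto one of num_peers peers."""
    key = " ".join(combo).encode("utf-8")
    return int(hashlib.sha1(key).hexdigest(), 16) % num_peers


docs = {
    "d1": "p2p index p2p retrieval",
    "d2": "p2p network overlay",
    "d3": "text retrieval index index",
}
query_log = ["p2p retrieval", "p2p retrieval", "text index", "text index", "p2p", "p2p", "overlay"]
idx = build_index(docs, query_log)
print(search(idx, "retrieval p2p"))  # ['d1']
```

Truncation bounds every posting list at `k` entries. The query-log filter keeps combinations that were never asked for, such as `("overlay",)` here, out of the index entirely. A query with no indexed combinations returns nothing, loosely mirroring the loss in retrieval quality for rare queries that the abstract reports.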

Central Library (ISO-9001:2015 Certified)
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India | PIN - 721302

03222 282435
Mail: support@ndl.gov.in

Sl.	Authority	Responsibilities	Communication Details
1	Ministry of Education (GoI), Department of Higher Education	Sanctioning Authority	https://www.education.gov.in/ict-initiatives
2	Indian Institute of Technology Kharagpur	Host Institute of the Project: The host institute of the project is responsible for providing infrastructure support and hosting the project	https://www.iitkgp.ac.in
3	National Digital Library of India Office, Indian Institute of Technology Kharagpur	The administrative and infrastructural headquarters of the project	Dr. B. Sutradhar bsutra@ndl.gov.in
4	Project PI / Joint PI	Principal Investigator and Joint Principal Investigators of the project	Dr. B. Sutradhar bsutra@ndl.gov.in; Prof. Saswat Chakrabarti (contact details will be added soon)
5	Website/Portal (Helpdesk)	Queries regarding NDLI and its services	support@ndl.gov.in
6	Contents and Copyright Issues	Queries related to content curation and copyright issues	content@ndl.gov.in
7	National Digital Library of India Club (NDLI Club)	Queries related to NDLI Club formation, support, user awareness program, seminar/symposium, collaboration, social media, promotion, and outreach	clubsupport@ndl.gov.in
8	Digital Preservation Centre (DPC)	Assistance with digitizing and archiving copyright-free printed books	dpc@ndl.gov.in
9	IDR Setup or Support	Queries related to establishment and support of Institutional Digital Repository (IDR) and IDR workshops	idr@ndl.gov.in