NDLI: An unsupervised framework for extracting and normalizing product attributes from multiple web sites

An unsupervised framework for extracting and normalizing product attributes from multiple web sites

Content Provider	ACM Digital Library
Author	Wong, Tik-Shun; Wong, Tak-Lam; Lam, Wai
Abstract	We have developed an unsupervised framework for simultaneously extracting and normalizing product attributes from multiple Web pages originating from different sites. The framework is built on a probabilistic graphical model that captures both the page-independent content information and the page-dependent layout information of the text fragments in Web pages. One characteristic of the framework is that previously unseen attributes can be discovered from clues contained in the layout format of the text fragments. The framework tackles the extraction and normalization tasks jointly by considering the relationship between content and layout information. A Dirichlet process prior is employed, which has the further advantage that the number of discovered product attributes is unbounded. An unsupervised inference algorithm based on a variational method is presented. The semantics of the normalized attributes can be visualized by examining the term weights in the model. Our framework can be applied to a wide range of Web mining applications such as product matching and retrieval. We have conducted extensive experiments on four different domains, covering over 300 Web pages from over 150 different Web sites, demonstrating the robustness and effectiveness of our framework.
Starting Page	35
Ending Page	42
Page Count	8
File Format	PDF
ISBN	9781605581644
DOI	10.1145/1390334.1390343
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2008-07-20
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Attribute normalization; Web mining; Attribute extraction
Content Type	Text
Resource Type	Article
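
Note on the abstract: the remark that a Dirichlet process prior leaves the number of attributes unbounded can be illustrated with the standard stick-breaking construction. The sketch below is not the authors' model or their variational inference algorithm. The function name, the concentration value `alpha`, and the truncation level are assumptions chosen only for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw truncated stick-breaking weights for a Dirichlet process prior.

    Each step breaks off a Beta(1, alpha) share of the stick that is left.
    Some stick always remains, so another component (here, another product
    attribute) can always be added: the prior sets no fixed upper limit.
    """
    rng = random.Random(seed)
    remaining = 1.0
    weights = []
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)  # share of the remaining stick
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


# Illustrative values only; they are not taken from the paper.
weights, leftover = stick_breaking_weights(alpha=2.0, truncation=20)
print(len(weights), "component weights drawn; unassigned mass:", leftover)
```

A larger `alpha` spreads the mass over more components, while a smaller one concentrates it on the first few.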
