NDLI: Towards a unified solution: data record region detection and segmentation

Creating user interfaces that entice people to manage better information

Data, health, and algorithmics: computational challenges for biomedicine

Ontology-based data management

Lower-bounding term frequency normalization

Unsupervised transactional query classification based on webpage form understanding

Suggestion set utility maximization using session logs

Discovering missing click-through query language information for web search

Learning to aggregate vertical results into web search results

A probabilistic method for inferring preferences from clicks

Efficiency optimizations for interpolating subqueries

Statistical source expansion for question answering

What and how children search on the web

One is enough: distributed filtering for duplicate elimination

This image smells good: effects of image information scent in search engine results pages

Towards a framework for attribute retrieval

Context-aware search personalization with concept preference

Simulating simple user behavior for system effectiveness evaluation

Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset

Bayesian latent variable models for collaborative item rating prediction

Emerging topic detection using dictionary learning

Coupling or decoupling for KNN search on road networks?: a hybrid framework on user query patterns

Harvesting facts from textual web sources by constrained label propagation

Cloning for privacy protection in multiple independent data publications

Can irrelevant data help semi-supervised learning, why and how?

Discovering top-k teams of experts with/without a leader in social networks

Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach

Scalable density-based subspace clustering

Collective prediction with latent graphs

Towards feature selection in network

Plagiarism detection based on structural information

Towards a unified solution: data record region detection and segmentation

Finding dimensions for queries

Simultaneous joint and conditional modeling of documents tagged from two perspectives

Automated feature generation from structured knowledge

Estimating selectivity for joined RDF triple patterns

Semi-indexing semi-structured data in tiny space

Learning-based relevance feedback for web-based relation completion

Multiple keyword-based queries over XML streams

I/O-efficient algorithms for answering pattern-based aggregate queries in a sequence OLAP system

Ranking support for keyword search on structured data using relevance models

The quality of the XML web

High efficiency and quality: large graphs matching

Learning to target: what works for behavioral targeting

Enriching textbooks with images

Search result diversification for enterprise data

Spectral analysis of a blogosphere

Scalable entity matching computation with materialization

Exploratory search over social-medical data

Jasmine: a real-time local-event detection system based on geolocation information propagated to microblogs

MEMSCALE: in-cluster-memory databases

Computational geography

DTMBIO 2011: international workshop on data and textmining in biomedical informatics

Social and collaborative information seeking: panel

A quasi-synchronous dependence model for information retrieval

Assigning documents to master sites in distributed search

Improving context-aware query classification via adaptive self-training

Interactive sense feedback for difficult queries

Coreference aware web object retrieval

Intent-aware query similarity

Efficiently encoding term co-occurrences in inverted indexes

Passage retrieval for incorporating global evidence in sequence labeling

Personalizing web search results by reading level

Duplicate detection through structure optimization

Retrieving and ranking unannotated images through collaboratively mining online search results

Building directories for social tagging systems

A framework for personalized and collaborative clustering of search results

Click the search button and be happy: evaluating direct and immediate information access

A pairwise ranking based approach to learning with positive and unlabeled examples

Timing when to buy

Focusing on novelty: a crawling strategy to build diverse language models

Toward traffic-driven location-based web search

Towards a top-down and bottom-up bidirectional approach to joint information extraction

Privacy-aware querying over sensitive trajectory data

Toward interactive training and evaluation

Content based social behavior prediction: a multi-task learning approach

Language-independent sentiment classification using three common words

Correlated multi-label feature selection

Who will follow you back?: reciprocal relationship prediction

Practical representations for web and social graphs

Studying how the past is remembered: towards computational history through large scale text mining

Fast metadata-driven multiresolution tensor decomposition

Large-scale question classification in cQA by leveraging Wikipedia semantic knowledge

External evaluation measures for subspace clustering

Filtering and clustering relations for unsupervised information extraction in open domain

Efficient resource attribute retrieval in RDF triple stores

Evaluation of set-based queries with aggregation constraints

Categorising logical differences between OWL ontologies

Authentication of location-based skyline queries

Tractable XML data exchange via relations

Efficient similarity search: arbitrary similarity measures, arbitrary composition

Context-based entity description rule for entity resolution

DELTA: indexing and querying multi-labeled graphs

Large-scale behavioral targeting with a social twist

Exploring the corporate ecosystem with a semi-supervised entity graph

Diversification for multi-domain result sets

Citation chain aggregation: an interaction model to support citation cycling

Predicting the optimal ad-hoc index for reachability queries on graph databases

Black swan: augmenting statistics with event data

Scalable similarity search of timeseries with variable dimensionality

H-DB: a hybrid quantitative-structural sql optimizer

Large-scale array analytics: taming the data tsunami

BooksOnline'11: 4th workshop on online books, complementary social media, and crowdsourcing

Improving retrieval accuracy of difficult queries through generalizing negative document language models

Discovering URLs through user feedback

A task level metric for measuring web search satisfaction and its application on improving relevance estimation

Reranking search results for sparse queries

Tag clouds revisited

Semi-supervised learning to rank with preference regularization

SIMD-based decoding of posting lists

Effective and efficient polarity estimation in blogs based on sentence-level evidence

Location-aware click prediction in mobile local search

SISP: a new framework for searching the informative subgraph based on PSO

Adaptive parallel approximate similarity search for responsive multimedia retrieval

Workload-aware indexing for keyword search in social networks

Using query log and social tagging to refine queries based on latent topics

Local computation of PageRank: the ranking side

Robust nonnegative matrix factorization using L21-norm

Assisting web search users by destination reachability

Natural event summarization

CLUES: a unified framework supporting interactive exploration of density-based clusters in streams

From names to entities using thematic context distance

Privacy preserving indexing for eHealth information networks

Semi-supervised multi-task learning of structured prediction models for web information extraction

Improving user interest inference from social neighbors

A cross-domain adaptation method for sentiment classification using probabilistic latent analysis

Pattern change discovery between high dimensional data sets

Link prediction: the power of maximal entropy random walk

Determining the diameter of small world networks

Combining machine learning and human judgment in author disambiguation

Enabling information extraction by inference of regular expressions from sample entities

Hierarchical tag visualization and application for tag recommendations

Behavior-driven clustering of queries into topics

Facilitating pattern discovery for relation extraction with semantic-signature-based clustering

Effective stratification for low selectivity queries on deep web data sources

Index structures and top-k join algorithms for native keyword search databases

ReDRIVE: result-driven database exploration through recommendations

Matching query processing in high-dimensional space

A parallel algorithm for computing borders

Learning to rank results in relational keyword search

Cost-efficient repair in inconsistent probabilistic databases

Skynets: searching for minimum trees in graphs with incomparable edge weights

Evolving social search based on bookmarks and status messages from social networks

Generating links to background knowledge: a case study using narrative radiology reports

A peer's-eye view: network term clouds in a peer-to-peer system

Collaborative blacklist generation via searches-and-clicks

Collection-based compression using discovered long matching strings

A data mining system based on SQL queries and UDFs for relational databases

RoSeS: a continuous query processor for large-scale RSS filtering and aggregation

Health conversational system based on contextual matching of community-driven question-answer pairs

Large-scale information retrieval experimentation with terrier

Detect'11: international workshop on DETecting and Exploiting Cultural diversiTy on the social web

S3K: seeking statement-supporting top-K witnesses

User browsing behavior-driven web crawling

Multi-view random walk framework for search task discovery from click-through log

Searching microblogs: coping with sparsity and document quality

Ranking-based processing of SQL queries

Simultaneous clustering of multi-type relational data via symmetric nonnegative matrix tri-factorization

Factorization-based lossless compression of inverted indices

Sentiment classification based on supervised latent n-gram analysis

Text vs. space: efficient geo-search query processing

Indexes for highly repetitive document collections

A linear-time approximation of the earth mover's distance

Effective retrieval of resources in folksonomies using a new tag similarity measure

Retrieval models for audience selection in display advertising

Prioritizing relevance judgments to improve the construction of IR test collections

TAKES: a fast method to select features in the kernel space

Modeling personalized email prioritization: classification-based and regression-based approaches

Transferring topical knowledge from auxiliary long texts for short text clustering

e-NSP: efficient negative sequential pattern mining based on identified positive patterns without database rescanning

Learning conditional random fields with latent sparse features for acronym expansion finding

Recommendation in the end-to-end encrypted domain

Memory-less unsupervised clustering for data streaming by versatile ellipsoidal function

CASINO: towards conformity-aware social influence analysis in online social networks

Using games with a purpose and bootstrapping to create domain-specific sentiment lexicons

MTopS: scalable processing of continuous top-k multi-query workloads

Exploiting longer cycles for link prediction in signed networks

Detecting anomalies in graphs with numeric labels

Citation count prediction: learning to estimate future citations for literature

Mining entity translations from comparable corpora: a holistic graph mapping approach

Perspective hierarchical dirichlet process for user-tagged image modeling

Discovering customer intent in real-time for streamlining service desk conversations

Finding all justifications of OWL entailments using TMS and MapReduce

Finding information nebula over large networks

Optimized processing of multiple aggregate continuous queries

Information re-finding by context: a brain memory inspired approach

Answering label-constraint reachability in large graphs

Supporting queries spanning across phases of evolving artifacts using Steiner forests

Adding structure to top-k: from items to expansions

Approximate tensor decomposition within a tensor-relational algebraic framework

Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs

Social ranking for spoken web search

Information extraction from pathology reports in a hospital setting

RerankEverything: a reranking interface for exploring search results

Attention prediction on social media brand pages

A robust index for regular expression queries

Data-thirsty business analysts need SODA: search over data warehouse

Conkar: constraint keyword-based association discovery

Annotating knowledge work lifelog: term extraction from sensor and operation history

Statistical information retrieval modelling: from the probability ranking principle to recent advances in diversity, portfolio theory, and beyond

4th international workshop on patent information retrieval (PaIR'11)

Finding relevant information of certain types from enterprise data

Diversifying search results of controversial queries

Query sampling for learning data fusion

Finding images of difficult entities in the long tail

Keyword search over RDF graphs

Collaborative online learning of user generated content

TOPSIG: topology preserving document signatures

Legal document clustering with built-in topic segmentation

Partial duplicate detection for large book collections

Content-driven detection of campaigns in social media

A language model approach to capture commercial intent and information relevance for sponsored search

Evaluating an associative browsing model for personal information

Designing an ensemble classifier over subspace classifiers using iterative convergence routine

Diversification and refinement in collaborative filtering recommender

LogSig: generating system events from raw textual logs

Optimising ontology stream reasoning with truth maintenance system

Accounting for data dependencies within a hierarchical dirichlet process mixture model

Privacy preservation by independent component analysis and variance control

Coupled nominal similarity in unsupervised learning

Mining direct antagonistic communities in explicit trust networks

Polarity analysis of texts using discourse structure

Probabilistic near-duplicate detection using simhash

Structural link analysis and prediction in microblogs

Extracting multi-dimensional relations: a generative model of groups of entities in a corpus

Extracting cross references from life science databases for search result ranking

Max margin learning on domain-independent web information extraction

Asking what no one has asked before: using phrase similarities to generate synthetic web search queries

Sparse structured probabilistic projections for factorized latent spaces

Efficient methods for finding influential locations with adaptive grids

XQuery optimization based on program slicing

Semantic data markets: a flexible environment for knowledge management

The list Viterbi training algorithm and its application to keyword search over databases

Provenance-based refresh in data-oriented workflows

TEXplorer: keyword-based object search and exploration in multidimensional text databases

RFID data analysis using tensor calculus for supply chain management

CP-index: on the efficient indexing of large graphs

Effects of search success on search engine re-use

Extract knowledge from semi-structured websites for search task simplification

HealthTrust: trust-based retrieval of you tube's diabetes channels

Do they belong to the same class: active learning by querying pairwise label homogeneity

Integrating and querying web databases and documents

An integrated environment for semantic knowledge work

Interactive reasoning in uncertain RDF knowledge bases

Entity timelines: visual analytics and named entity evolution

Web-based open-domain information extraction

Overview of the third international workshop on search and mining user-generated contents

Relevance weighting using within-document term statistics

Query session detection as a cascade

Learning to rank user intent

Frequency-aware similarity measures: why Arnold Schwarzenegger is always a duplicate

Structured learning of two-level dynamic rankings

Implementation techniques for large-scale latent semantic indexing applications

Exploring categorization property of social annotations for information retrieval

Learning to rank audience for behavioral targeting in display ads

Summarizing web forum threads based on a latent topic propagation process

Feature selection using hierarchical feature clustering

Connecting users with similar interests via tag network inference

A query-based multi-document sentiment summarizer

Temporal link prediction by integrating content and structure information

Distributed social graph embedding

Extracting collective expectations about the future from large text collections

Advancing the discovery of unique column combinations

Context-based people search in labeled social networks

Spreadsheet-based complex data transformation

Privacy protected knowledge management in services with emphasis on quality data

Item categorization in the e-commerce domain

Structured data classification by means of matrix factorization

Processing the signature quadratic form distance on many-core GPU architectures

Editing knowledge resources: the wiki way

Fu-Finder: a game for studying querying behaviours

PICASSO: automated soundtrack suggestion for multi-modal data

Advances in data stream mining for mobile and ubiquitous environments

Web science and information exchange in the medical web

Do all birds tweet the same?: characterizing twitter around the world

Classification and annotation in social corpora using multiple relations

Continuously monitoring the correlations of massive discrete streams

On benchmarking data translation systems for semantic-web ontologies

An efficient method for using machine translation technologies in cross-language patent search

Transfer active learning

Top-k most influential locations selection

Marco Polo: a system for brand-based shopping and exploration

PDFMeat: managing publications on the semantic desktop

P2Prec: a social-based P2P recommendation system

Information diffusion in social networks: observing and affecting what society cares about

3rd international workshop on collaborative information retrieval (CIR2011)

Understanding the types of information humans associate with geographic objects

A probabilistic approach to nearest-neighbor classification: naive hubness bayesian kNN

Defining isochrones in multimodal spatial networks

Information retrieval challenges in computational advertising

DESIRE 2011: first international workshop on data infrastructures for supporting information retrieval evaluation

Google, Bing and a new perspective on ranking similarity

Representing document as dependency graph for document clustering

On the elasticity of NoSQL databases over cloud management platforms

PIKM 2011: the 4th ACM workshop for Ph.D. students in information and knowledge management

Effectiveness beyond the first crawl tier

Finding redundant and complementary communities in multidimensional networks

Continuous data stream query in the cloud

Uncertain schema matching: the power of not knowing

Managing interoperability and complexity in health systems: MIXHS'11 workshop summary

Worker types and personality traits in crowdsourcing relevance labels

Promotional subspace mining with EProbe framework

A cluster based mobile peer to peer architecture in wireless ad hoc networks

Report on the third international workshop on cloud data management (CloudDB 2011)

A nugget-based test collection construction paradigm

A partitioning method for symbolic interval data based on kernelized metric

Block-based load balancing for entity resolution with MapReduce

Search and mining entity-relationship data

Recency ranking by diversification of result set

Hierarchy evolution for improved classification

PCMLogging: reducing transaction logging overhead with PCM

Fourth workshop on exploiting semantic annotations in information retrieval (ESAIR)

Patent query reduction using pseudo relevance feedback

Using random walks for multi-label classification

A continuous query evaluation scheme for a detection-only query over data streams

LSDS-IR'11: the 9th workshop on large-scale and distributed systems for information retrieval

Relevance feedback exploiting query-specific document manifolds

Latent feature encoding using dyadic and relational data

Subject-oriented top-k hot region queries in spatial dataset

DOLAP 2011: overview of the 14th international workshop on data warehousing and OLAP

Insights into explicit semantic analysis

Learning kernels with upper bounds of leave-one-out error

k-Nearest neighbor query processing method based on distance relation pattern

On bias problem in relevance feedback

KLEAP: an efficient cleaning method to remove cross-reads in RFID streams

Efficient query rewrite for structured web queries

Selecting related terms in query-logs using two-stage SimRank

A diversity measure leveraging domain specific auxiliary information

Rule-based construction of matching processes

On relevance, time and query expansion

Mining query structure from click data: a case study of product queries

A taxonomy of local search: semi-supervised query classification driven by information needs

Diverse retrieval via greedy optimization of expected 1-call@k in a latent subtopic relevance model

Towards expert finding by leveraging relevant categories in authority ranking

ONTOCUBE: efficient ontology extraction using OLAP cubes

Hybrid models for future event prediction

Joint inference for cross-document information extraction

An algorithm for axiom pinpointing in EL+ and its incremental variant

Adaptive term frequency normalization for BM25

Building a generic debugger for information extraction pipelines

Folksonomy-based term extraction for word cloud generation

An unsupervised ranking method based on a technical difficulty terrain

Fast supervised feature extraction by term discrimination information pooling

Efficient association discovery with keyword-based constraints on large graph data

When close enough is good enough: approximate positional indexes for efficient ranked retrieval

Constructing efficient information extraction pipelines

AWETO: efficient incremental update and querying in rdf storage system

Index tuning for query-log based on-line index maintenance

CoRankBayes: bayesian learning to rank under the co-training framework and its application in keyphrase extraction

Insert-friendly XML containment labeling scheme

Efficient phrase querying with flat position index

Discovering trending phrases on information streams

A pretopological framework for the automatic construction of lexical-semantic structures from texts

Trained trigger language model for sentence retrieval in QA: bridging the vocabulary gap

Review recommendation: personalized prediction of the quality of online reviews

Leveraging web 2.0 data for scalable semi-supervised learning of domain-specific sentiment lexicons

Topic modeling for named entity queries

Improving k-nearest neighbors algorithms: practical application of dataset analysis

Classifying trending topics: a typology of conversation triggers on Twitter

Semantic convolution kernels over dependency trees: smoothed partial tree kernel

Structured collaborative filtering

Enhancing accessibility of microblogging messages using semantic knowledge

Recommending citations with translation model

User oriented tweet ranking: a filtering approach to microblogs

Imbalanced sentiment classification

Extracting adjective facets from community Q&A corpus

A semi-supervised hybrid system to enhance the recommendation of channels in terms of campaign roi

The where in the tweet

A novel framework of training hidden markov support vector machines from lightly-annotated data

YANA: an efficient privacy-preserving recommender system for online social communities

Question identification on twitter

Learning to recommend questions based on public interest

More influence means less work: fast latent dirichlet allocation by influence scheduling

OpinioNetIt: understanding the opinions-people network for politically controversial topics

CQC: classifying questions in CQA websites

Utility-driven anonymization in data publishing

Predicting the uncertainty of sentiment adjectives in indirect answers

Automatic query reformulation with syntactic operators to alleviate search difficulty

Privacy preserving feature selection for distributed data using virtual dimension

Sentiment classification via l2-norm deep belief network

Question routing in community question answering: putting category in its place

Switch detector: an activity spotting system for desktop

Domain customization for aspect-oriented opinion analysis with multi-level latent sentiment clues

Fact-based question decomposition for candidate answer re-ranking

LSH based outlier detection and its application in distributed setting

Accurate information extraction for quantitative financial events

CoDet: sentence-based containment detection in news corpora

Authormagic: an approach to author disambiguation in large-scale digital libraries

A machine-learned proactive moderation system for auction fraud detection

Smoothing NDCG metrics using tied scores

DIGRank: using global degree to facilitate ranking in an incomplete graph

Simultaneously improving CSAT and profit in a retail banking organization

Learning to rank with cross entropy

On selection of objective functions in multi-objective community detection

Coarse-to-fine classification via parametric and nonparametric models for computer-aided diagnosis

Predicting document effectiveness in pseudo relevance feedback

Suggesting ghost edges for a smaller world

Learning to rank categories for web queries

Examining the "leftness" property of Wikipedia categories

Supervised language modeling for temporal resolution of texts

Detection of text quality flaws as a one-class classification problem

Context-aware query recommendation by learning high-order relation in query logs

Two birds with one stone: learning semantic models for text categorization and word sense disambiguation

Efficient $l_{p}$-norm multiple feature metric learning for image categorization

More or better: on trade-offs in compacting textual problem solution repositories

Re-ranking by local re-scoring for video indexing and retrieval

Mining frequent patterns across multiple data streams

Tightly coupling visual and linguistic features for enriching audio-based web browsing experience

SILA: a spatial instance learning approach for deep webpages

Robust video fingerprinting based on hierarchical symmetric difference feature

A geographic study of tie strength in social media

Image clustering fusion technique based on BFS

Named entity recognition using a modified Pegasos algorithm

Efficient retrieval of 3D building models using embeddings of attributed subgraphs

WikiLabel: an encyclopedic approach to labeling documents en masse

Constructing seminal paper genealogy

Towards noise-resilient document modeling

Leveraging Wikipedia concept and category information to enhance contextual advertising

Probabilistic model for discovering topic based communities in social networks

Beyond relevance in marketplace search

Relative effect of spam and irrelevant documents on user interaction with search engines

Inferring query aspects from reformulations using clustering

Advertiser-centric approach to understand user click behavior in sponsored search

Supervised matching of comments with news article segments

User action interpretation for personalized content optimization in recommender systems

A personalized recommendation system on scholarly publications

Collaborative exploratory search in real-world context

Beyond precision@10: clustering the long tail of web search results

Towards a unified solution: data record region detection and segmentation

Content Provider	ACM Digital Library
Author	Gu, Yuan; Bing, Lidong; Lam, Wai
Abstract	Although the task of data record extraction from Web pages has been studied extensively, existing methods still fail to handle many pages because of the complexity of their format or layout. In this paper, we propose a unified method to tackle this task by addressing several key issues in a uniform manner. A new search structure, named the Record Segmentation Tree (RST), is designed, and several efficient search pruning strategies on the RST structure are proposed to identify the records in a given Web page. Another characteristic of our method, which differs significantly from previous work, is that it can effectively handle complicated and challenging data record regions. This is achieved by generating subtree groups dynamically from the RST structure during the search process. Furthermore, instead of using string edit distance or tree edit distance, we propose a token-based edit distance which takes each DOM node as a basic unit in the cost calculation. Extensive experiments are conducted on four data sets, including flat, nested, and intertwined records. The experimental results demonstrate that our method achieves higher accuracy compared with three state-of-the-art methods.
Starting Page	1265
Ending Page	1274
Page Count	10
File Format	PDF
ISBN	9781450307178
DOI	10.1145/2063576.2063761
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2011-10-24
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Web data record extraction; RST structure; Web information integration
Content Type	Text
Resource Type	Article
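
The abstract mentions a token-based edit distance that treats each DOM node as the basic unit of cost. The record does not give the paper's actual cost function, so the following is only a minimal sketch under stated assumptions: each element node is reduced to its tag name, every insertion, deletion, and substitution costs 1, and the helper names (`dom_tokens`, `token_edit_distance`, `similarity`) are hypothetical, not taken from the paper.

```python
from html.parser import HTMLParser


class TagTokenizer(HTMLParser):
    """Collects start-tag names in document order, one token per element node."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Assumption: a DOM node is represented by its tag name only.
        self.tokens.append(tag)


def dom_tokens(html):
    """Return the sequence of element tag names found in an HTML fragment."""
    parser = TagTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def token_edit_distance(a, b):
    """Levenshtein distance over token lists, assuming unit cost per operation."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        curr = [i]
        for j, tb in enumerate(b, start=1):
            cost = 0 if ta == tb else 1
            curr.append(min(prev[j] + 1,          # delete ta
                            curr[j - 1] + 1,      # insert tb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalize the distance to [0, 1]; 1.0 means identical token sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - token_edit_distance(a, b) / longest


if __name__ == "__main__":
    rec1 = dom_tokens("<li><a href='x'>A</a><span>1</span></li>")
    rec2 = dom_tokens("<li><a>B</a><b>new</b><span>2</span></li>")
    print(rec1, rec2, token_edit_distance(rec1, rec2), similarity(rec1, rec2))
```

Comparing whole nodes rather than characters keeps the distance insensitive to text content, so two candidate records with the same tag structure but different text score as identical under this sketch.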
