NDLI: Optimizing scoring functions and indexes for proximity search in type-annotated corpora


Position paper: a comparison of two modelling paradigms in the Semantic Web

Browsing on small screens: recasting web-page segmentation into an efficient machine learning framework

Topical TrustRank: using topicality to combat web spam

XML screamer: an integrated approach to high performance XML parsing, validation and deserialization

FeedEx: collaborative exchange of news feeds

Examining the content and privacy of web browsing incidental information

Bringing communities to the semantic web and the semantic web to communities

The web beyond popularity: a really simple system for web scale RSS

Finding advertising keywords on web pages

SecuBat: a web vulnerability scanner

Toward tighter integration of web search with a geographic information system

Addressing the testing challenge with a web-based e-assessment system that tutors as it assesses

Cat and mouse: content delivery tradeoffs in web access

Random sampling from a search engine's index

POLYPHONET: an advanced social network extraction system from the web

Relaxed: on the way towards true validation of compound documents

Retroactive answering of search queries

Position paper: ontology construction from online ontologies

Designing ethical phishing experiments: a study of (ROT13) rOnl query features

A probabilistic approach to spatiotemporal theme pattern mining on weblogs

Towards content trust of web resources

Dynamic placement for clustered web applications

Improved annotation of the blogosphere via autotagging and hierarchical clustering

An e-market framework for informed trading

One document to bind them: combining XML, web services, and the semantic web

Beyond PageRank: machine learning for static ranking

Meaning on the web: evolution vs intelligent design?

Compressing and searching XML data via two zips

The case for multi-user design for computer aided learning in developing regions

Using annotations in enterprise search

Temporal rules for mobile web personalization

The new economy: an engineer's perspective

Web ontology segmentation: analysis, classification and use

Image classification for mobile web browsing

Site level noise removal for search engines

Symmetrically exploiting XML

Off the beaten tracks: exploring three aspects of web navigation

Invisible participants: how cultural capital relates to lurking behavior

Visualizing tags over time

Communities from seed sets

Access control enforcement for conversation-based web services

Geographically focused collaborative crawling

Knowledge modeling and its application in life sciences: a tale of two ontologies

WAP5: black-box performance debugging for wide-area systems

A web-based kernel function for measuring the similarity of short text snippets

Semantic analytics on social networks: experiences in addressing the problem of conflict of interest detection

Model-based version and configuration management for a web engineering lifecycle

CWS: a comparative web search system

Position paper: towards the notion of gloss, and the adoption of linguistic resources in formal ontology engineering

Invasive browser sniffing and countermeasures

Time-dependent semantic similarity measure of queries using historical click-through data

Supporting online problem-solving communities with the semantic web

Selective early request termination for busy internet services

Large-scale text categorization by batch mode active learning

The impact of online music services on the demand for stars in the music industry

ASDL: a wide spectrum language for designing web services

Optimizing scoring functions and indexes for proximity search in type-annotated corpora

Identity management on converged networks: a reality check

Analysis of WWW traffic in Cambodia and Ghana

Designing an architecture for delivering mobile information services to the rural developing world

Detecting semantic cloaking on the web

Behavior-based web page evaluation

A case for software assurance

Constructing virtual documents for ontology matching

Fine grained content-based adaptation mechanism for providing high end-user quality of experience with adaptive hypermedia systems

Detecting spam web pages through content analysis

pTHINC: a thin-client architecture for mobile wireless web

Probabilistic models for discovering e-communities

Knowing the user's every move: user activity tracking for website usability evaluation and implicit interaction

What's really new on the web?: identifying new pages from a series of unstable web snapshots

Analysis of communication models in web service compositions

To randomize or not to randomize: space optimal summaries for hyperlink analysis

Reappraising cognitive styles in adaptive web applications

WS-replication: a framework for highly available web services

Generating query substitutions

Exploring social annotations for the semantic web

Model-directed web transactions under constrained modalities

Searching with context

Bootstrapping semantics on the web: meaning elicitation from schemas

Interactive wrapper generation with minimal user effort

Semantic Wikipedia

SCTP: an innovative transport layer protocol for the web

A comparison of implicit and explicit links for web page classification

The web structure of e-government - developing a methodology for quantitative evaluation

Semantic WS-agreement partner selection

Automatic identification of user interest for personalized search

Phoiling phishing

WebKhoj: Indian language IR from multiple character encodings

Detecting online commercial intention (OCI)

Using web browser interactions to predict task

e-science and cyberinfrastructure: a middleware perspective

Protecting browser state from web privacy attacks

The next wave of the web

An integrated method for social network extraction

Integrating semantic web and language technologies to improve the online public administrations services

Broken links on the web: local laws and the global free flow of information

DemIL: an online interaction language between citizen and government

Web annotation sharing using P2P

Generating summaries for large collections of geo-referenced photographs

Determining user interests about museum collections

GIO: a semantic web application using the information grid framework

Graphical representation of RDF queries

Question answering on top of the BT digital library

XPath filename expansion in a Unix shell

Microformats: a pragmatic path to the semantic web

SGSDesigner: a graphical interface for annotating and designing semantic grid services

Status of the African Web

Personalization and accessibility: integration of library and web approaches

Testing Google interfaces modified for the blind

Verifying genre-based clustering approach to content extraction

A browser for browsing the past web

Live URLs: breathing life into URLs

Structuring namespace descriptions

CiteSeerX: an architecture and web service design for an academic document search engine

Tables and trees don't mix (very well)

Robust web content extraction

Rapid prototyping of web applications combining domain specific languages and model driven design

A pruning-based approach for supporting Top-K join queries

Towards DSL-based web engineering

Capturing the essentials of federated systems

From adaptation engineering to aspect-oriented context-dependency

Living the TV revolution: unite MHP to the web or face IDTV irrelevance!

Using graph matching techniques to wrap data from PDF documents

Requirements for multimedia document enrichment

DiTaBBu: automating the production of time-based hypermedia content

Capturing RIA concepts in a web modeling language

Generation of multimedia TV news contents for WWW

Proposal of integrated search engine of web and TV contents

Using semantic rules to determine access control for web services

Strong authentication in web proxies

Safeguard against unicode attacks: generation and applications of UC-simlist

Efficient edge-services for colorblind users

A user profile-based approach for personal information access: shaping your information portfolio

Finding visual concepts by web image mining

Deriving wishlists from blogs: show us your blog, and we'll tell you what books to buy

Relationship between web links and trade

System for spatio-temporal analysis of online news and blogs

Extracting news-related queries from web query log

Visually guided bottom-up table detection and segmentation in web documents

Generating maps of web pages using cellular automata

BuzzRank … and the trend is your friend

Detecting nepotistic links by language model disagreement

The distribution of PageRank follows a power-law only for particular values of the damping factor

Mining related queries from search engine query logs

Discovering event evolution graphs from newswires

Mining clickthrough data for collaborative web search

Background knowledge for ontology construction

Mining RDF metadata for generalized association rules: knowledge discovery in the semantic web era

AutoTag: a collaborative approach to automated tag assignment for weblog posts

Merging trees: file system and content integration

A content and structure website mining model

Online mining of frequent query trees over XML data streams

Using proportional transportation similarity with learned element semantics for XML document clustering

Template guided association rule mining from XML documents

Automatic geotagging of Russian web sites

Using symbolic objects to cluster web documents

Estimating required recall for successful knowledge acquisition from the web

Text-based video blogging

A decentralized CF approach based on cooperative agents

Adaptive web sites: user studies and simulation

On a service-oriented approach for an engineering knowledge desktop

Design and development of learning management system at Universiti Putra Malaysia: a case study of e-SPRINT

Providing SCORM with adaptivity

A framework for XML data streams history checking and monitoring

The credibility of the posted information in a recommendation system based on a map

Archiving web site resources: a records management view

Geographic locations of web servers

Why is connectivity in developing regions expensive: policy challenges more than technical limitations?

Bilingual web page and site readability assessment

Mobile web publishing and surfing based on environmental sensing data

DoNet: a semantic domotic framework

Web based device independent mobile map applications: the m-CHARTIS system

Context-orientated news filtering for web 2.0 and beyond

Efficient search for peer-to-peer information retrieval using semantic small world

Semantic link based top-K join queries in P2P networks

Ontology-based legal information retrieval to improve the information access in e-government

Oyster: sharing and re-using ontologies in a peer-to-peer community

GoGetIt!: a tool for generating structure-driven web crawlers

Towards practical genre classification of web documents

Do not crawl in the DUST: different URLs with similar text

Community discovery and analysis in blogspace

PageSim: a novel link-based measure of web page similarity

Finding specification pages according to attributes

Selective hypertext induced topic search

An audio/video analysis mechanism for web indexing

The SOWES approach to P2P web search using semantic overlays

Topic-oriented query expansion for web search

Predictive modeling of first-click behavior in web-search

Proximity within paragraph: a measure to enhance document retrieval performance

Finding experts and their details in e-mail corpora

Efficient query subscription processing for prospective search engines

Mining search engine query logs for query recommendation

Effective web-scale crawling through website analysis

Focused crawling: experiences in a real world project

Image annotation using search and mining technologies

Semantic web integration of cultural heritage sources

The ODESeW 2.0 semantic web application framework

Visualizing an historical semantic web with Heml

Beyond XML and RDF: the versatile web query language Xcerpt

An ontology for internal and external business processes

Automatic matchmaking of web services

Adding semantics to RosettaNet specifications

HTML2RSS: automatic generation of RSS feed based on structure analysis of HTML document

Logical structure based semantic relationship extraction from semi-structured documents

OWL FA: a metamodeling extension of OWL D

Learning and inferencing in user ontology for personalized semantic web services

Upgrading relational legacy data to the semantic web

How semantics make better wikis

Integrating ecoinformatics resources on the semantic web

Path summaries and path partitioning in modern XML databases

Evaluating structural summaries as access methods for XML

FLUX: fuzzy content and structure matching of XML range queries

Optimizing scoring functions and indexes for proximity search in type-annotated corpora

Content Provider	ACM Digital Library
Author	Chakrabarti, Soumen; Puniyani, Kriti; Das, Sujatha
Abstract	We introduce a new, powerful class of text proximity queries: find an instance of a given "answer type" (person, place, distance) near "selector" tokens matching given literals or satisfying given ground predicates. An example query is type=distance NEAR Hamburg Munich. Nearness is defined as a flexible, trainable, parameterized aggregation function of the selectors, their frequency in the corpus, and their distance from the candidate answer. Such queries provide a key data reduction step for information extraction, data integration, question answering, and other text-processing applications. We describe the architecture of a next-generation information retrieval engine for such applications, and investigate two key technical problems faced in building it. First, we propose a new algorithm that estimates a scoring function from past logs of queries and answer spans. Plugging the scoring function into the query processor gives high accuracy: typically, an answer is found at rank 2-4. Second, we exploit the skew in the distribution over types seen in query logs to reduce the space taken by the new index structures our system requires. Extensive performance studies with a 10 GB, 2-million-document TREC corpus and several hundred TREC queries show both the accuracy and the efficiency of our system. From an initial 4.3 GB index using 18,000 types from WordNet, we can discard 88% of the space while inflating query times by a factor of only 1.9. Our final index overhead is only 20% of the total index space needed.
Starting Page	717
Ending Page	726
Page Count	10
File Format	PDF
ISBN	1595933239
DOI	10.1145/1135777.1135882
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2006-05-23
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Indexing annotated text
Content Type	Text
Resource Type	Article
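The abstract defines nearness only abstractly, as a trainable aggregation over the selector matches, each selector's corpus frequency, and its distance from the candidate answer. The Python sketch below is a hypothetical, hand-set instance of that idea for a query such as type=distance NEAR Hamburg Munich; it is not the authors' learned scoring function. The IDF weighting, the exponential decay rate `decay`, the `window` cutoff, the function names `proximity_score` and `rank_answers`, and the toy tokens and document frequencies are all assumptions for illustration. In the paper, the corresponding scoring function is estimated from past logs of queries and answer spans rather than fixed by hand.

```python
import math

# Hypothetical hand-set scoring function: NOT the authors' learned function.
def proximity_score(tokens, candidate_pos, selectors, doc_freq, num_docs,
                    decay=0.5, window=10):
    """Score the token at candidate_pos as an answer for the given selectors.

    Assumed aggregation: for each selector, take its closest occurrence within
    `window` tokens of the candidate and add idf(selector) * exp(-decay * d),
    where d is the token distance. Rarer selectors and closer matches score higher.
    """
    score = 0.0
    for sel in selectors:
        distances = [abs(i - candidate_pos) for i, tok in enumerate(tokens)
                     if tok == sel and i != candidate_pos]
        in_window = [d for d in distances if d <= window]
        if not in_window:
            continue  # selector absent near this candidate: contributes nothing
        idf = math.log(num_docs / (1 + doc_freq.get(sel, 0)))  # a common IDF variant
        score += idf * math.exp(-decay * min(in_window))
    return score


def rank_answers(tokens, types, answer_type, selectors, doc_freq, num_docs, **kw):
    """Return (position, score) for every token annotated with answer_type,
    best first. `types` maps token positions to their annotated type."""
    scored = [(pos, proximity_score(tokens, pos, selectors, doc_freq, num_docs, **kw))
              for pos, t in types.items() if t == answer_type]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy document for the query  type=distance NEAR Hamburg Munich.
# DIST_A and DIST_B stand in for two tokens annotated with type "distance".
tokens = ["hamburg", "and", "munich", "are", "DIST_A", "apart", "while",
          "the", "tower", "is", "DIST_B", "tall"]
types = {4: "distance", 10: "distance"}
doc_freq = {"hamburg": 1000, "munich": 2000}  # made-up document frequencies

if __name__ == "__main__":
    for pos, s in rank_answers(tokens, types, "distance",
                               ["hamburg", "munich"], doc_freq, 2_000_000):
        print(tokens[pos], round(s, 3))
```

Running the script ranks DIST_A above DIST_B, because both selectors occur within a few tokens of DIST_A but farther from DIST_B. Counting only the closest occurrence of each selector keeps a frequently repeated selector from dominating the score; the abstract does not say whether the paper aggregates matches this way.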

Central Library (ISO-9001:2015 Certified)
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India | PIN - 721302

03222 282435
Mail: support@ndl.gov.in

Sl.	Authority	Responsibilities	Communication Details
1	Ministry of Education (GoI), Department of Higher Education	Sanctioning Authority	https://www.education.gov.in/ict-initiatives
2	Indian Institute of Technology Kharagpur	Host Institute of the Project: The host institute of the project is responsible for providing infrastructure support and hosting the project	https://www.iitkgp.ac.in
3	National Digital Library of India Office, Indian Institute of Technology Kharagpur	The administrative and infrastructural headquarters of the project	Dr. B. Sutradhar (bsutra@ndl.gov.in)
4	Project PI / Joint PI	Principal Investigator and Joint Principal Investigators of the project	Dr. B. Sutradhar (bsutra@ndl.gov.in); Prof. Saswat Chakrabarti (will be added soon)
5	Website/Portal (Helpdesk)	Queries regarding NDLI and its services	support@ndl.gov.in
6	Contents and Copyright Issues	Queries related to content curation and copyright issues	content@ndl.gov.in
7	National Digital Library of India Club (NDLI Club)	Queries related to NDLI Club formation, support, user awareness program, seminar/symposium, collaboration, social media, promotion, and outreach	clubsupport@ndl.gov.in
8	Digital Preservation Centre (DPC)	Assistance with digitizing and archiving copyright-free printed books	dpc@ndl.gov.in
9	IDR Setup or Support	Queries related to establishment and support of Institutional Digital Repository (IDR) and IDR workshops	idr@ndl.gov.in