NDLI: Learning block importance models for web pages

What's new on the web?: the evolution of the web from a search engine perspective

Anti-aliasing on the web

Smartback: supporting users in back navigation

Unsupervised learning of soft patterns for generating definitions from online news

Session level techniques for improving web browsing performance on wireless links

XVM: a bridge between xml data and its behavior

Liveclassifier: creating hierarchical text classifiers through web corpora

Staging transformations for multimodal web interaction management

A method for transparent admission control and request scheduling in e-commerce web sites

Ranking the web frontier

Using link analysis to improve layout on mobile devices

Building a companion website in the semantic web

Shilling recommender systems for fun and profit

Managing versions of web documents in a transaction-time web server

Incremental formalization of document annotations through ontology-based paraphrasing

Newsjunkie: providing personalized newsfeeds via analysis of information novelty

Accurate, scalable in-network identification of p2p traffic using application signatures

Augmenting semantic web service descriptions with compositional specification

Mining models of human activities from the web

XQuery at your web service

Index structures and algorithms for querying distributed RDF repositories

A hierarchical monothetic document clustering algorithm for summarization and browsing search results

Practical semantic analysis of web sites and documents

A possible simplification of the semantic web architecture

Understanding user goals in web search

Securing web application code by static analysis and runtime protection

Web accessibility: a broader view

Web-scale information extraction in knowitall: (preliminary results)

Flexible on-device service object replication with replets

Schemapath, a minimal extension to xml schema for conditional constraints

Using urls and table layout for web classification tasks

Enforcing strict model-view separation in template engines

How to make a semantic web browser

A smart hill-climbing algorithm for application server configuration

Link fusion: a unified link analysis framework for multi-type interrelated data objects

An evaluation of binary xml encoding optimizations for fast stream based xml processing

A hybrid approach for searching in the semantic web

Propagation of trust and distrust

Fine-grained, structured configuration management for web projects

Towards the self-annotating web

Information diffusion through blogspace

Characterization of a large web site population with implications for content delivery

Meteor-s web service annotation framework

Texquery: a full-text search extension to xquery

Adapting databases and WebDAV protocol

Remindin': semantic query routing in peer-to-peer networks based on social metaphors

Mining anchor text for query refinement

Web customization using behavior-based remote executing agents

A combined approach to checking web ontologies

Impact of search engines on page popularity

Trust-serv: model-driven lifecycle management of trust negotiation policies for web services

Hearsay: enabling audio browsing on hypertext content

Is question answering an acquired skill?

Improving web browsing performance on wireless pdas using thin-client computing

Composite events for xml

Learning block importance models for web pages

A flexible framework for engineering "my" portals

Parsing owl dl: trees or triples?

Challenges and practices in deploying web acceleration solutions for distributed enterprise systems

Sic transit gloria telae: towards an understanding of the web's decay

Optimization of html automatically generated by wysiwyg programs

CS AKTive space: representing computer science in the semantic web

A community-aware search engine

Automatic detection of fragments in dynamically generated web pages

Web taxonomy integration using support vector machines

Automatic web news extraction using tree edit distance

Analyzing client interactivity in streaming media

Foundations for service ontologies: aligning OWL-S to dolce

The webgraph framework I: compression techniques

Analysis of interacting BPEL web services

RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network

Adaptive web search based on user profile constructed without any effort from users

A proposal for an owl rules language

Learning block importance models for web pages

Content Provider	ACM Digital Library
Author	Wen, Ji-Rong; Ma, Wei-Ying; Liu, Haifeng; Song, Ruihua
Abstract	Previous work shows that a web page can be partitioned into multiple segments or blocks, and that the blocks in a page are often not equally important. It has also been proven that separating noisy or unimportant blocks from pages can facilitate web mining, search and accessibility. However, no uniform approach or model has been presented to measure the importance of different segments in web pages. Through a user study, we found that people do have a consistent view about the importance of blocks in web pages. In this paper, we investigate how to find a model that automatically assigns importance values to blocks in a web page. We define block importance estimation as a learning problem. First, we use a vision-based page segmentation algorithm to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a feature vector for each block. Based on these features, learning algorithms are used to train a model that assigns importance to different segments in the web page. In our experiments, the best model achieves a Micro-F1 of 79% and a Micro-Accuracy of 85.9%, which is quite close to a person's view.
Starting Page	203
Ending Page	211
Page Count	9
File Format	PDF
ISBN	158113844X
DOI	10.1145/988672.988700
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2004-05-17
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Classification; Web mining; Page segmentation; Block importance model
Content Type	Text
Resource Type	Article
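
The pipeline described in the abstract (segment the page, build a feature vector per block from spatial and content features, then train a model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Block` fields, the normalisation by page size, the two importance levels, and the toy data are hypothetical, and a nearest-centroid classifier stands in for the unspecified "learning algorithms" of the paper. Page segmentation itself is not shown; the blocks are assumed to be given.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical block record; field names are assumptions, not the paper's.
    x: float
    y: float
    width: float
    height: float
    num_images: int
    num_links: int


def feature_vector(block, page_width, page_height):
    """Spatial features (position, relative area) plus content counts."""
    return [
        block.x / page_width,
        block.y / page_height,
        (block.width * block.height) / (page_width * page_height),
        float(block.num_images),
        float(block.num_links),
    ]


def train_centroids(vectors, labels):
    """Nearest-centroid stand-in: average the feature vectors per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy training data on a hypothetical 1000 x 2000 page.
# Labels are made-up importance levels: 1 = low, 3 = high.
PAGE_W, PAGE_H = 1000, 2000
training = [
    (Block(200, 300, 600, 1200, 2, 5), 3),   # main content
    (Block(150, 250, 700, 1400, 3, 4), 3),   # main content
    (Block(0, 0, 1000, 100, 1, 20), 1),      # navigation bar
    (Block(0, 1900, 1000, 100, 0, 15), 1),   # footer
]
vectors = [feature_vector(b, PAGE_W, PAGE_H) for b, _ in training]
labels = [label for _, label in training]
centroids = train_centroids(vectors, labels)

new_block = Block(180, 280, 650, 1300, 2, 6)
predicted = predict(centroids, feature_vector(new_block, PAGE_W, PAGE_H))
print(predicted)
```

In a real setting the raw counts would likely need scaling so that link and image counts do not dominate the normalised spatial features; the sketch leaves them unscaled for brevity.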
