NDLI: Semantics-Assisted Deep Web Query Interface Classification


Analysis of a Relationship Based Access Control Model

Clustering-based Approach for Categorizing Pregnant Women in Obstetrics and Maternity Care

WISE Blogs: A Special Blog Search Engine

Mining High-Utility Itemsets with Multiple Minimum Utility Thresholds

REST API Management and Evolution Using MDA

Generating Test Cases for Android Applications through GUI Modeling, Usage Modeling, and Change Analysis

Frequent Subgraph Mining from Streams of Uncertain Data

Detecting Representative Tweets of Microblogging Users

External Event-Based Test Cases for Mobile Application

Big Social Network Mining for "Following" Patterns

Pragamana: Performance Comparison and Programming Alpha-miner Algorithm in Relational Database Query Language and NoSQL Column-Oriented Using Apache Phoenix

User Agent and Privacy Compromise

Efficient Skyline Itemsets Mining

Context-Aware Cloud Service Brokerage: A Solution to the Problem of Data Integration Among SaaS Providers

A Framework for Big Data Analytics

Performance Evaluation MySQL InnoDB and Microsoft SQL Server 2012 for Decision Support Environments

Reliable Virtual Channels over VPN for Cloud

Personality traits, Learning Preferences and Emotions

A New Approach for Generating LOTOS Specifications from UML Dynamic Models

Semantics-Assisted Deep Web Query Interface Classification

Premises of an algebra of Japanese characters

A Graph Transformation Approach for Automatic Test Cases Generation from UML Activity Diagrams

Semantics-Assisted Deep Web Query Interface Classification

Content Provider	ACM Digital Library
Author	Jou, Chichang
Abstract	Large amounts of structured data are hidden in databases behind web forms. The volume of deep web content has been estimated at around 500 times that of the surface web. However, many web forms are not deep web query interfaces, so retrieving content from web databases first requires identifying the forms that are. Deep web content is normally associated with a specific domain, and many domain semantics are embedded in the web forms. In addition, the HTML pages returned by deep web queries contain particular patterns that can help identify query interfaces. We therefore collect the following semantics to assist classification: (1) feature words for non-query forms and for keyword fields in deep web query interfaces; (2) common fields in a particular domain, with their valid values, relationships, and synonyms. We design and implement a heuristics-based Semantics-Assisted deep Web Query Interface Classifier (SAWQIC). In the pre-query analysis, feature words of non-query form attributes are combined with heuristics to filter out non-query forms. For each form that passes the filter, SAWQIC uses the collected semantics to fill its components with valid input data and submit the form. In the post-query analysis, heuristics applied to the returned HTML pages identify the deep web query interfaces. SAWQIC is evaluated on web forms from the "Book" and "Job" domains. The experimental results indicate that SAWQIC achieves highly effective classification.
Starting Page	70
Ending Page	78
Page Count	9
File Format	PDF
ISBN	9781450334198
DOI	10.1145/2790798.2790810
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2015-07-13
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Semantics; Heuristics; Web mining; Query interface classification; Deep web; Web database
Content Type	Text
Resource Type	Article
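
The two-stage flow described in the abstract (pre-query filtering by feature words, then form submission and post-query analysis of the returned page) can be illustrated with a minimal Python sketch. This is not the SAWQIC implementation: the word lists, the "no result" phrases, the repeated-value heuristic, and the names `FormParser`, `passes_pre_query_filter`, `looks_like_query_result`, `classify_form`, and the `submit` callback are all hypothetical, chosen only to make the idea concrete.

```python
from html.parser import HTMLParser

# Assumed feature words; the abstract does not list the ones SAWQIC uses.
NON_QUERY_WORDS = ("login", "password", "signup", "subscribe", "register", "newsletter")
KEYWORD_FIELD_WORDS = ("query", "keyword", "search", "title", "author", "isbn")
NO_RESULT_PHRASES = ("no results", "no matches", "nothing found")


class FormParser(HTMLParser):
    """Collects attribute values and field names from form-related tags."""

    def __init__(self):
        super().__init__()
        self.tokens = []       # lower-cased action/id/name/type values
        self.field_names = []  # names of input/select/textarea controls

    def handle_starttag(self, tag, attrs):
        if tag not in ("form", "input", "select", "textarea"):
            return
        for key, value in attrs:
            if value and key in ("action", "id", "name", "type"):
                self.tokens.append(value.lower())
            if value and key == "name" and tag != "form":
                self.field_names.append(value)


def passes_pre_query_filter(tokens):
    """Pre-query step: reject forms with non-query feature words,
    keep forms that show at least one keyword-field feature word."""
    if any(word in token for token in tokens for word in NON_QUERY_WORDS):
        return False
    return any(word in token for token in tokens for word in KEYWORD_FIELD_WORDS)


def looks_like_query_result(result_html, submitted_value):
    """Post-query step (assumed heuristic): no 'no result' phrase, and the
    submitted value appears at least twice, suggesting a list of records."""
    text = result_html.lower()
    if any(phrase in text for phrase in NO_RESULT_PHRASES):
        return False
    return text.count(submitted_value.lower()) >= 2


def classify_form(form_html, domain_values, submit):
    """Return True if the form is judged a deep web query interface.

    domain_values maps field names to valid domain inputs (hypothetical);
    submit is a caller-supplied function that sends the filled values and
    returns the result page HTML.
    """
    parser = FormParser()
    parser.feed(form_html)
    if not passes_pre_query_filter(parser.tokens):
        return False
    filled = {n: domain_values[n] for n in parser.field_names if n in domain_values}
    if not filled:
        return False
    result_html = submit(filled)
    return any(looks_like_query_result(result_html, v) for v in filled.values())
```

Passing `submit` as a callback keeps the sketch free of network code, so the heuristics can be exercised against canned result pages. A real system would replace it with an HTTP client and would need far richer domain semantics than a single dictionary of field values.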
