A Two-Phase Sampling Technique for Information Extraction from Hidden Web Databases (2004)
| Content Provider | CiteSeerX |
|---|---|
| Author | Hedley, Y. L.; Younas, M.; James, A. |
| Description | In: Proc. of the 6th ACM CIKM Workshop on Web Information and Data Management (WIDM’04). Hidden Web databases maintain a collection of specialised documents, which are dynamically generated in response to users’ queries. However, the documents are generated by Web page templates, which contain information that is irrelevant to queries. This paper presents a Two-Phase Sampling (2PS) technique that detects templates and extracts query-related information from the sampled documents of a database. In the first phase, 2PS queries databases with terms contained in their search interface pages and the subsequently sampled documents. This process retrieves a required number of documents. In the second phase, 2PS detects Web page templates in the sampled documents in order to extract information relevant to queries. We test 2PS on a number of real-world Hidden Web databases. Experimental results demonstrate that 2PS effectively eliminates irrelevant information contained in Web page templates and generates terms and frequencies with improved accuracy. |
| File Format | |
| Language | English |
| Publisher | ACM |
| Publisher Date | 2004-01-01 |
| Access Restriction | Open |
| Subject Keyword | Hidden Web Database; Generates Term; First Phase; Second Phase; Information Relevant; Hidden Web; Irrelevant Information; Sampled Document; Information Extraction; Two-phase Sampling; Web Page Template; Search Interface Page; Required Number; Realworld Hidden Web; Two-phase Sampling Technique; Improved Accuracy; User Query; Experimental Result; Query-related Information; Specialised Document; Detects Web Page Template |
| Content Type | Text |
| Resource Type | Article |
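
The second phase described in the abstract (detect template content in the sampled documents, then derive terms and frequencies from what remains) can be illustrated with a rough sketch. This is not the paper's algorithm: it assumes, as a simplification, that template content shows up as text lines repeated across most sampled documents. The function names, the `min_share` threshold, and the sample documents are all hypothetical, and the first phase (query-based sampling of a live database) is not shown.

```python
from collections import Counter
import re


def detect_template_lines(documents, min_share=0.8):
    """Return lines that occur in at least min_share of the documents.

    Assumption (not from the paper): a line repeated across most
    sampled documents is treated as template content.
    """
    line_counts = Counter()
    for doc in documents:
        # Count each distinct non-empty line once per document.
        unique_lines = {line.strip() for line in doc.splitlines() if line.strip()}
        line_counts.update(unique_lines)
    threshold = min_share * len(documents)
    return {line for line, count in line_counts.items() if count >= threshold}


def term_frequencies(documents, template_lines):
    """Count terms over all documents after dropping template lines."""
    freqs = Counter()
    for doc in documents:
        for line in doc.splitlines():
            line = line.strip()
            if line and line not in template_lines:
                freqs.update(re.findall(r"[a-z0-9]+", line.lower()))
    return freqs


# Hypothetical sampled documents: the first and last lines play the
# role of a shared page template.
sampled = [
    "Search results\nProtein folding in yeast\nCopyright Example Library",
    "Search results\nGene expression in mice\nCopyright Example Library",
    "Search results\nProtein binding sites\nCopyright Example Library",
]
template = detect_template_lines(sampled)
freqs = term_frequencies(sampled, template)
```

With the template lines removed, words such as "search" and "copyright" no longer contribute to the term frequencies, which is the effect the abstract attributes to template detection.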