NDLI: Size estimation of non-cooperative data collections

Content Provider	ACM Digital Library
Author	van Keulen, Maurice; Khelghati, Mohammadreza; Hiemstra, Djoerd
Abstract	With the increasing amount of data in deep web sources (hidden from general search engines behind web forms), accessing this data has gained more attention. Algorithms applied for this purpose rely on knowledge of a data source's size to decide accurately when to stop crawling or sampling, processes that can be very costly in some cases [14]. Interest in the sizes of data sources is further increased by competition among businesses on the Web, where data coverage is critical. This information is also helpful for quality assessment of search engines [7], for search engine selection in federated search, and for resource/collection selection in distributed search [19]. In addition, it can give insight into useful statistics for public sectors such as governments. In any of these scenarios, when a non-cooperative collection does not publish its information, the size has to be estimated [17]. In this paper, the approaches suggested in the literature for this purpose are categorized and reviewed. The most recent approaches are implemented and compared in a real environment. Finally, four methods based on modifications of the available techniques are introduced and evaluated. One of the modifications improved the estimations from other approaches by 35 to 65 percent.
Starting Page	239
Ending Page	246
Page Count	8
File Format	PDF
ISBN	9781450313063
DOI	10.1145/2428736.2428774
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2012-12-03
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Estimation bias; Query-based sampling; Deep web; Stochastic simulation; Pool-based size estimation; Regression equations; Size estimation
Content Type	Text
Resource Type	Article
