Population estimation with performance guarantees (2007)
| Content Provider | CiteSeerX |
|---|---|
| Author | Orlitsky, A.; Santhanam, N. P.; Viswanathan, K. |
| Description | In Proceedings of the IEEE Symposium on Information Theory |
| Abstract | We estimate the population size by sampling uniformly from the population. Given the accuracy to which the population must be estimated and a pre-specified confidence, we provide a simple stopping rule for the sampling process. I. SUMMARY: Many applications, such as species estimation [1], database sampling [2], and epidemiologic studies [3], [4], [5], call for estimating a population size from a relatively small sample. We derive a simple, yet nearly optimal, stopping rule for sampling, together with a formula for estimating the alphabet size from uniform samples taken from the population. Later in this summary we consider an approach to the species estimation problem outlined by Good [6]. For a more complete survey of prior results on the species estimation problem, see [1]; for problems in database sampling, see [7], [2]. The results in this paper are also related to capture-recapture problems [3], [4], [5], where the unknown population size is estimated from the number of samples that are recaptured (repeated) when sampling randomly from the population. Here, we are interested in how many recaptures are needed to estimate the population to a given accuracy with a specified confidence. Intuitively, the more recaptures there are, the better the population size can be estimated. Formally, in an $n$-element sample, let $m$ denote the number of distinct elements and let $r = n - m$ denote the number of repeated elements. For example, the sample c, g, c, s, g, c, v has $n = 7$ elements, $m = 4$ distinct elements (c, g, s, and v), and $r = 7 - 4 = 3$ repeated elements (one g and two c's). In the following, $n$ independent samples are drawn uniformly from a $k$-element population, and $M_n^k$ and $R_n^k = n - M_n^k$ denote the random numbers of distinct and repeated elements observed. We drop the subscripts and superscripts when there is no ambiguity. A. Good's approach: By linearity of expectation, $E(M) = k\left(1 - \left(1 - \frac{1}{k}\right)^{n}\right)$. |
| Publication Date | 2007-01-01 |
| Access Restriction | Open |
| Subject Keyword | Capture-recapture Problem, Stopping Rule, Uniform Sample, Estimation Formula, Sampling Process, Random Number, Performance Guarantee, Unknown Population Size, Distinct Element, Simple Stopping Rule, N-element Sample, Prior Result, Species Estimation Problem, Complete Survey, Many Recapture, Alphabet Size, Population Size, Good Approach, Independent Sample, Population Estimation, K-element Population, Summary Many Application, Repeated Element, Pre-specified Confidence, Small Sample, Epidemiologic Study |
| Content Type | Text |
| Resource Type | Proceeding |
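
The abstract defines n, m, and r = n − m and illustrates them on the sample c, g, c, s, g, c, v. That example can be checked mechanically. The following is a minimal Python sketch. The function names `distinct_and_repeated` and `repeats_per_element` are illustrative only and do not come from the paper.

```python
from collections import Counter


def distinct_and_repeated(sample):
    """Return (n, m, r): sample size, distinct elements, repeated elements."""
    n = len(sample)
    m = len(set(sample))
    r = n - m  # every occurrence after an element's first one is a repeat
    return n, m, r


def repeats_per_element(sample):
    """Map each repeated element to its number of repeats (occurrences - 1)."""
    return {x: c - 1 for x, c in Counter(sample).items() if c > 1}


sample = ["c", "g", "c", "s", "g", "c", "v"]  # the abstract's example
print(distinct_and_repeated(sample))  # (7, 4, 3)
print(repeats_per_element(sample))    # {'c': 2, 'g': 1}
```

The per-element counts add up to r, which matches the abstract's "one g and two c's".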
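
The abstract excerpt stops at Good's approach, right after the linearity-of-expectation step. That step works as follows: each of the k elements is missed by all n uniform draws with probability (1 − 1/k)^n, so the expected number of distinct elements is E(M) = k(1 − (1 − 1/k)^n).

The Python sketch below does two things:

- It checks that formula against a Monte Carlo simulation.
- It inverts the formula by bisection to turn an observed m into an estimate of k.

The inversion is a generic method-of-moments estimator chosen for illustration. It is an assumption, not necessarily Good's estimator or the stopping rule and estimate derived in the paper. All function names are hypothetical.

```python
import math
import random


def expected_distinct(k, n):
    """E(M) for n independent uniform draws from a k-element population.

    By linearity of expectation, each of the k elements is missed by all n
    draws with probability (1 - 1/k)**n. Real k >= 1 is accepted so that
    the function can be inverted numerically below.
    """
    return k * (1.0 - (1.0 - 1.0 / k) ** n)


def simulate_distinct(k, n, trials, rng):
    """Monte Carlo average of M, the number of distinct elements seen."""
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(k) for _ in range(n)})
    return total / trials


def moment_estimate(n, m, tol=1e-10):
    """Hypothetical method-of-moments estimate: solve expected_distinct(k, n) == m.

    E(M) increases strictly with k (for n >= 2), from 1 at k = 1 toward n,
    so a unique real root exists whenever 1 <= m < n.
    """
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    if m == n:
        return math.inf  # no repeats: the sample gives no finite estimate
    lo, hi = 1.0, 2.0
    while expected_distinct(hi, n) < m:  # double until the root is bracketed
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * hi:  # bisection on the monotone function
        mid = 0.5 * (lo + hi)
        if expected_distinct(mid, n) < m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)
print(expected_distinct(1000, 200))             # about 181.35
print(simulate_distinct(1000, 200, 2000, rng))  # should be close to the exact value
# The abstract's example: n = 7 samples with m = 4 distinct elements.
print(moment_estimate(7, 4))  # between 5 and 6: E(M) is about 3.95 at k = 5 and 4.33 at k = 6
```

Bisection is safe here because E(M) strictly increases with k. When m = n (no repeats at all), no finite k matches, and the sketch returns infinity. This is consistent with the abstract's point that the estimate depends on observing recaptures.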