Systematic Sampling with Unequal Probabilities — up_systematic • sondage

Fast systematic sampling using a single random start. Sample size is fixed at round(sum(pik)).

up_systematic(pik, eps = 1e-06)

Arguments

pik: A numeric vector of inclusion probabilities. Should sum to an integer n (the sample size).
eps: Threshold for boundary cases. Default is 1e-06.

Value

An integer vector of selected indices.

Details

Systematic sampling is one of the fastest unequal probability methods: it runs in O(N) time for a population of N units. A single uniform random number determines the entire sample.
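As a rough illustration of how one random start fixes the whole sample, the classical cumulative-sum construction can be sketched as follows. This is a minimal sketch and not the sondage source code: the name systematic_sketch and its u argument are hypothetical, and the eps handling of boundary cases is left out.

```r
# Illustrative sketch only; systematic_sketch() is a hypothetical helper,
# not part of sondage, and it ignores the eps argument.
systematic_sketch <- function(pik, u = runif(1)) {
  n <- round(sum(pik))
  upper <- cumsum(pik)                 # right end of each unit's interval
  points <- u + seq_len(n) - 1         # u, u + 1, ..., u + n - 1
  # Unit k owns [upper[k] - pik[k], upper[k]); pick the unit hit by each point.
  idx <- findInterval(points, upper) + 1L
  pmin(idx, length(pik))               # guard against rounding at the last boundary
}

pik <- c(0.2, 0.3, 0.5, 0.4, 0.6)
systematic_sketch(pik, u = 0.6)        # points 0.6 and 1.6 fall in units 3 and 5
```

Because the points are spaced exactly one unit apart and each pik is at most 1, no unit can be hit twice, which is what keeps the sample size fixed.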

Properties:

Fixed sample size n = round(sum(pik))
Exact first-order inclusion probabilities: unit k is selected with probability pik[k]
Very fast: O(N) time
Some joint inclusion probabilities may be 0 (for example, two adjacent units whose inclusion probabilities sum to at most 1 can never be selected together)
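To see which pairs can never appear together for a given ordering, the joint inclusion probabilities can be approximated by sweeping the random start over a fine grid. The sketch below assumes the classical cumulative-sum construction rather than calling up_systematic() itself; the grid size M and the variable names are illustrative choices.

```r
# Sketch: approximate joint inclusion probabilities by sweeping the random
# start u over a grid. Assumes the cumulative-sum construction; this does
# not call sondage.
pik <- c(0.2, 0.3, 0.5, 0.4, 0.6)
N <- length(pik)
n <- round(sum(pik))
upper <- cumsum(pik)

M <- 1000                                   # number of grid points for u
joint <- matrix(0, N, N)
for (u in (seq_len(M) - 0.5) / M) {
  s <- pmin(findInterval(u + seq_len(n) - 1, upper) + 1L, N)
  joint[s, s] <- joint[s, s] + 1 / M        # diagonal accumulates pik itself
}

round(joint, 3)
```

For this pik and this ordering, only the pairs (1, 4), (2, 4), (2, 5) and (3, 5) receive a nonzero joint probability; every other pair, such as units 1 and 2, is never selected together.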

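The order of the units matters, because it determines which units can be selected together. For best results, sort the units by an auxiliary variable before sampling; this creates implicit stratification along that variable.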
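One way to apply this is to draw the sample on the sorted frame and then map the selected positions back to the original order. In the sketch below, the auxiliary variable x is made-up data, and the call is guarded so the snippet still runs when sondage is not installed.

```r
pik <- c(0.2, 0.3, 0.5, 0.4, 0.6)
x <- c(35, 12, 50, 27, 41)      # hypothetical auxiliary variable

ord <- order(x)                 # original positions, sorted by x
pik_sorted <- pik[ord]

if (requireNamespace("sondage", quietly = TRUE)) {
  idx_sorted <- sondage::up_systematic(pik_sorted)  # positions in the sorted frame
  idx <- ord[idx_sorted]                            # positions in the original frame
  idx
}
```

Indexing ord by the returned positions is the step that is easy to forget: without it, the indices refer to the sorted vector rather than to the original units.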

References

Madow, W.G. (1949). On the theory of systematic sampling, II. Annals of Mathematical Statistics, 20, 333-354.

See also

up_brewer() and up_maxent() for methods where all joint inclusion probabilities are > 0; systematic() for equal probability systematic sampling.

Examples

pik <- c(0.2, 0.3, 0.5, 0.4, 0.6)  # sum = 2

set.seed(2)
idx <- up_systematic(pik)
idx
#> [1] 1 4

# Monte Carlo check: selection frequencies should be close to pik
set.seed(123)
n_sim <- 10000
counts <- integer(5)
for (i in seq_len(n_sim)) {
  idx <- up_systematic(pik)
  counts[idx] <- counts[idx] + 1L
}

counts / n_sim
#> [1] 0.1998 0.3059 0.4943 0.3974 0.6026