Fast systematic sampling using a single random start. Sample size is fixed at round(sum(pik)).

up_systematic(pik, eps = 1e-06)

Arguments

pik

A numeric vector of inclusion probabilities. Should sum to an integer n (the sample size).

eps

Threshold for boundary cases. Default is 1e-06.

Value

An integer vector of selected indices.

Details

Systematic sampling is one of the fastest unequal probability sampling methods, running in O(N) time for a population of N units. A single uniform random number determines the entire sample.
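The single-random-start idea can be sketched as follows. This is a minimal illustration of the usual cumulative-sum construction, not the package's actual implementation: the name systematic_sketch and its argument u are hypothetical, and the eps boundary handling is left out.

```r
# Hypothetical reference sketch; up_systematic() itself may differ in details.
systematic_sketch <- function(pik, u = runif(1)) {
  n <- round(sum(pik))
  # Cumulative sums assign unit k the interval [V[k-1], V[k]), with V[0] = 0
  V <- cumsum(pik)
  # The single draw u fixes all n selection points: u, u + 1, ..., u + n - 1
  points <- u + seq_len(n) - 1
  # Unit k is selected when a point lands in its interval;
  # all.inside = TRUE guards against floating-point overshoot at the last unit
  findInterval(points, c(0, V), all.inside = TRUE)
}
```

With pik <- c(0.2, 0.3, 0.5, 0.4, 0.6), calling systematic_sketch(pik, u = 0.6) selects units 3 and 5, because 0.6 falls in [0.5, 1.0) and 1.6 falls in [1.4, 2.0).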

Properties:

  • Fixed sample size n = round(sum(pik))

  • Exact first-order inclusion probabilities: unit k is selected with probability pik[k]

  • Very fast: O(N) time

  • Some joint inclusion probabilities may be 0 (certain pairs of units can never be selected together)
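To see how zero joint inclusion probabilities arise, the following sketch approximates the joint inclusion matrix by sweeping the random start over a fine grid. The helper pick() is a hypothetical stand-in that assumes the standard cumulative-sum construction; it is not the package's code.

```r
pik <- c(0.2, 0.3, 0.5, 0.4, 0.6)
N <- length(pik)

# Hypothetical stand-in for the sampler, with the random start u exposed
pick <- function(u) {
  findInterval(u + seq_len(round(sum(pik))) - 1, c(0, cumsum(pik)),
               all.inside = TRUE)
}

# Sweep u over a grid of midpoints in (0, 1) and accumulate co-selection counts
u_grid <- (seq_len(1000) - 0.5) / 1000
joint <- matrix(0, N, N)
for (u in u_grid) {
  ind <- as.numeric(seq_len(N) %in% pick(u))
  joint <- joint + outer(ind, ind)
}
joint <- joint / length(u_grid)

# Diagonal approximates pik; off-diagonal entries are joint probabilities
round(joint, 2)
```

Under this construction only the pairs (1, 4), (2, 4), (2, 5) and (3, 5) can occur for this pik, so every other off-diagonal entry is 0. Such zeros are what make unbiased variance estimation difficult under systematic sampling.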

The order of units matters. For best results, sort the units by an auxiliary variable first; this creates implicit stratification.
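One way to apply the sorting advice is to permute pik by the auxiliary variable, draw the sample on the sorted vector, and map the result back to the original positions. The wrapper below is a hypothetical sketch: sample_sorted, x and sampler are illustrative names, and sampler is assumed to be any function that takes a pik vector and returns selected indices.

```r
# Hypothetical helper: sample after sorting units by auxiliary variable x
sample_sorted <- function(pik, x, sampler) {
  ord <- order(x)                 # positions of units in increasing order of x
  idx_sorted <- sampler(pik[ord]) # indices relative to the sorted vector
  sort(ord[idx_sorted])           # map back to original unit positions
}
```

In practice this would be called as sample_sorted(pik, x, up_systematic). Each unit keeps its own inclusion probability, because pik is permuted together with the units.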

References

Madow, W.G. (1949). On the theory of systematic sampling, II. Annals of Mathematical Statistics, 20, 333-354.

See also

up_brewer() and up_maxent() for methods where all joint inclusion probabilities are positive; systematic() for equal probability systematic sampling.

Examples

pik <- c(0.2, 0.3, 0.5, 0.4, 0.6)  # sum = 2

set.seed(42)
idx <- up_systematic(pik)
idx
#> [1] 3 5

# Verify inclusion probabilities
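# simplify = FALSE keeps one index vector per replicate (not a 2 x 10000 matrix)
samples <- replicate(10000, up_systematic(pik), simplify = FALSE)
# Convert each sample to a membership indicator (5 x 10000 matrix)
indicators <- sapply(samples, function(s) 1:5 %in% s)
rowMeans(indicators)  # Should be close to pik
#> [1] 0.2102 0.2899 0.4999 0.4007 0.5993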