Brewer's Method for Unequal Probability Sampling

Selects a fixed-size sample with unequal inclusion probabilities using Brewer's method. Implements Algorithm 6.10 from Tillé's "Sampling Algorithms".

up_brewer(pik, eps = 1e-06)

Arguments

pik: A numeric vector of inclusion probabilities. The sum should be an integer representing the desired sample size n.
eps: A small threshold value. Units with pik <= eps are excluded and units with pik >= 1-eps are always included (certainty selections). Default is 1e-06.
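As a rough illustration of the eps rule described above, the sketch below splits a vector of inclusion probabilities into excluded units, certainty selections, and the remaining units. The example vector and the variable names are made up for illustration; this is not the package's internal code.

```r
# Hypothetical example vector: one zero, two certainty units, sum = 3
pik <- c(0, 0.3, 0.7, 1, 1)
eps <- 1e-06

excluded  <- which(pik <= eps)                  # never sampled
certain   <- which(pik >= 1 - eps)              # always sampled
remaining <- which(pik > eps & pik < 1 - eps)   # left for the random draws

excluded
certain
remaining
```

In this made-up case, only the two units in remaining compete for the single non-certainty slot.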

Value

An integer vector containing the indices of the selected units, each between 1 and length(pik).

Details

Brewer's method is a draw-by-draw procedure that selects n units with prescribed inclusion probabilities. At draw i (i = 1, ..., n), a unit k that has not yet been selected is drawn with probability proportional to:

$$p_k \propto \pi_k \cdot \frac{(n - a) - \pi_k}{(n - a) - \pi_k (n - i + 1)}$$

where $a$ is the sum of $\pi_\ell$ over the units selected in the previous $i - 1$ draws, so $a = 0$ at the first draw.
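The draw probabilities above can be sketched in a few lines of R. The function name brewer_sketch is hypothetical and this is not the source of up_brewer: it assumes every pik lies strictly between 0 and 1 (no eps handling) and that sum(pik) is an integer, and it returns the indices in ascending order, which may differ from the order up_brewer uses.

```r
# Hypothetical sketch of the draw-by-draw procedure; not the package code.
# Assumes 0 < pik < 1 for every unit and an integer sum(pik).
brewer_sketch <- function(pik) {
  n <- round(sum(pik))
  selected <- logical(length(pik))
  for (i in seq_len(n)) {
    a <- sum(pik[selected])   # total pik of the units already drawn
    # Unnormalized draw probabilities from the formula above
    p <- pik * (n - a - pik) / (n - a - pik * (n - i + 1))
    p[selected] <- 0          # a unit cannot be drawn twice
    k <- sample.int(length(pik), size = 1, prob = p)
    selected[k] <- TRUE
  }
  which(selected)             # indices in ascending order
}

set.seed(1)
brewer_sketch(c(0.2, 0.4, 0.6, 0.8))
```

At the last draw (i = n) the ratio in the formula equals 1, so the remaining units are drawn with probability proportional to their pik.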

Properties:

Fixed sample size n = round(sum(pik))
Exact inclusion probabilities
All joint inclusion probabilities $\pi_{kl} > 0$
Order invariant (the sampling design doesn't depend on unit ordering)
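For a small population, the exactness and positivity properties can be checked without simulation by enumerating every possible sequence of draws. The helper brewer_exact_joint below is a hypothetical sketch, not part of the package. It assumes 0 < pik < 1 and an integer sum, and it is only practical for a handful of units because the number of sequences grows quickly.

```r
# Hypothetical helper: exact joint inclusion probabilities under the
# draw probabilities given in Details, by enumerating all draw sequences.
brewer_exact_joint <- function(pik) {
  n <- round(sum(pik))
  N <- length(pik)
  joint <- matrix(0, N, N)

  visit <- function(selected, i, prob) {
    if (i > n) {
      # Sample complete: add its probability to every pair it contains
      s <- as.numeric(selected)
      joint <<- joint + prob * outer(s, s)
      return(invisible(NULL))
    }
    a <- sum(pik[selected])
    p <- pik * (n - a - pik) / (n - a - pik * (n - i + 1))
    p[selected] <- 0
    p <- p / sum(p)           # normalize to proper draw probabilities
    for (k in which(p > 0)) {
      nxt <- selected
      nxt[k] <- TRUE
      visit(nxt, i + 1, prob * p[k])
    }
  }

  visit(logical(N), 1, 1)
  joint
}

pik <- c(0.2, 0.4, 0.6, 0.8)
J <- brewer_exact_joint(pik)
all.equal(diag(J), pik)     # diagonal holds the first-order probabilities
all(J[upper.tri(J)] > 0)    # off-diagonal entries are the pairwise ones
```

The diagonal of the returned matrix holds the first-order inclusion probabilities, so comparing it with pik checks exactness; the off-diagonal entries are the joint probabilities.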

References

Tillé, Y. (2006). Sampling Algorithms. Springer Series in Statistics.

Brewer, K.R.W. (1963). A model of systematic sampling with unequal probabilities. Australian Journal of Statistics, 5, 5-13.

Examples

pik <- c(0.2, 0.4, 0.6, 0.8)  # sum = 2

set.seed(42)
idx <- up_brewer(pik)
idx
#> [1] 4 3

# Select from data frame
df <- data.frame(id = 1:4, x = c(10, 20, 30, 40))
df[idx, ]
#>   id  x
#> 4  4 40
#> 3  3 30

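# Estimate inclusion probabilities by simulation
set.seed(12)
n_sim <- 10000
counts <- integer(4)
for (i in seq_len(n_sim)) {
  idx <- up_brewer(pik)
  counts[idx] <- counts[idx] + 1  # tally each selected unit
}
counts / n_sim  # Should be close to pik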
#> [1] 0.2036 0.3882 0.6017 0.8065

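# Verify inclusion probabilities
# simplify = FALSE keeps one index vector per sample; a simplified matrix
# would be traversed element by element and halve the estimates
samples <- replicate(5000, up_brewer(pik), simplify = FALSE)
indicators <- sapply(samples, function(s) 1:4 %in% s)
rowMeans(indicators)  # Should be close to pik
#> [1] 0.1972 0.4088 0.5962 0.7978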