Selects a fixed-size sample with unequal inclusion probabilities using Brewer's method. Implements Algorithm 6.10 from Tillé's "Sampling Algorithms".

up_brewer(pik, eps = 1e-06)

Arguments

pik

A numeric vector of inclusion probabilities. The sum should be an integer representing the desired sample size n.

eps

A small threshold value. Units with pik <= eps are excluded and units with pik >= 1-eps are always included (certainty selections). Default is 1e-06.
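
As a rough illustration of how the threshold partitions the units, the rule above can be sketched as follows. This is not the package's internal code, and the variable names (excluded, certain, active, n_active) are made up for the example.

```r
pik <- c(0, 0.3, 0.7, 1, 0.5, 0.5)  # sum = 3
eps <- 1e-06

excluded <- which(pik <= eps)                  # never selected
certain  <- which(pik >= 1 - eps)              # always selected
active   <- which(pik > eps & pik < 1 - eps)   # left for the random draws

# Sample size that remains to be drawn among the active units
n_active <- round(sum(pik[active]))
```

Here unit 1 is excluded, unit 4 is a certainty selection, and two units remain to be drawn from units 2, 3, 5 and 6.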

Value

An integer vector of length n containing the indices of the selected units, each between 1 and length(pik).

Details

Brewer's method is a draw-by-draw procedure that selects n units without replacement with prescribed inclusion probabilities. At draw i (i = 1, ..., n), each unit k not yet selected is chosen with probability proportional to:

$$p_k \propto \pi_k \cdot \frac{(n - a) - \pi_k}{(n - a) - \pi_k (n - i + 1)}$$

where \(a = \sum_\ell \pi_\ell\) is summed over the units already selected, so \(a = 0\) at the first draw.
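
The draw-by-draw rule can be sketched in a few lines of base R. This is a minimal illustration of the formula above, not the package's actual implementation: the function name brewer_sketch is hypothetical, it assumes every pik lies strictly between 0 and 1, and it ignores the eps handling.

```r
# Hypothetical sketch of the selection rule; assumes 0 < pik < 1 for all units.
brewer_sketch <- function(pik) {
  N <- length(pik)
  n <- round(sum(pik))
  selected <- logical(N)
  idx <- integer(n)                  # selected indices, in draw order
  for (i in seq_len(n)) {
    a <- sum(pik[selected])          # total pik of units already drawn
    rem <- n - a
    # Unnormalised selection weights for draw i
    w <- pik * (rem - pik) / (rem - pik * (n - i + 1))
    w[selected] <- 0                 # units already drawn cannot be redrawn
    k <- sample.int(N, 1, prob = w)  # sample.int() normalises the weights
    selected[k] <- TRUE
    idx[i] <- k
  }
  idx
}

set.seed(1)
brewer_sketch(c(0.2, 0.4, 0.6, 0.8))  # two distinct indices from 1:4
```

Setting the weight of already-selected units to zero is what makes the procedure sample without replacement.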

Properties:

  • Fixed sample size n = round(sum(pik))

  • Exact inclusion probabilities

  • All joint inclusion probabilities \(\pi_{kl} > 0\)

  • Order invariant (the sampling design does not depend on the ordering of the units)
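
These properties can be checked empirically by simulation. The helper below is a hypothetical sketch (empirical_joint is not part of the package): it estimates first-order and joint inclusion probabilities for any sampler passed as a zero-argument function. So that the block runs without the package, it is demonstrated with simple random sampling from base R as a stand-in.

```r
# Hypothetical helper: estimate inclusion probabilities by simulation.
# `sampler` is any zero-argument function returning selected indices in 1..N.
empirical_joint <- function(sampler, N, reps = 2000) {
  counts <- matrix(0, N, N)
  for (r in seq_len(reps)) {
    ind <- as.numeric(seq_len(N) %in% sampler())
    counts <- counts + tcrossprod(ind)  # adds 1 to entry (k, l) when both are drawn
  }
  counts / reps
}

# Stand-in sampler: simple random sampling of 2 units out of 4
set.seed(1)
pkl <- empirical_joint(function() sample.int(4, 2), N = 4)
diag(pkl)               # estimated first-order inclusion probabilities
sum(diag(pkl))          # sums to the fixed sample size, up to rounding
all(pkl[upper.tri(pkl)] > 0)  # every pair was observed at least once
```

To check Brewer's method itself, one would pass function() up_brewer(pik) as the sampler and compare diag(pkl) with pik.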

References

Tillé, Y. (2006). Sampling Algorithms. Springer Series in Statistics.

Brewer, K.R.W. (1963). A model of systematic sampling with unequal probabilities. Australian Journal of Statistics, 5, 5-13.

See also

up_maxent() for maximum entropy (conditional Poisson) sampling, up_systematic() for systematic PPS sampling, and inclusion_prob() for computing inclusion probabilities from size measures.

Examples

pik <- c(0.2, 0.4, 0.6, 0.8)  # sum = 2

set.seed(42)
idx <- up_brewer(pik)
idx
#> [1] 4 3

# Select from data frame
df <- data.frame(id = 1:4, x = c(10, 20, 30, 40))
df[idx, ]
#>   id  x
#> 4  4 40
#> 3  3 30

# Verify inclusion probabilities
# simplify = FALSE keeps one index vector per replicate instead of a matrix
samples <- replicate(5000, up_brewer(pik), simplify = FALSE)
indicators <- sapply(samples, function(s) 1:4 %in% s)
rowMeans(indicators)  # Should be close to pik
#> [1] 0.2144 0.3862 0.5984 0.8010