Draws samples using the maximum entropy design, also known as Conditional Poisson Sampling (CPS). Among all fixed-size designs with the given first-order inclusion probabilities, this is the unique design that maximizes entropy.
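For reference, the standard characterization of this design (see Tillé, 2006; Chen et al., 1994) can be written as below. The symbols \(\mathcal{S}_n\) (the set of all samples of size \(n\)), \(p(s)\), and \(\lambda_k\) are notation introduced here for illustration; they are not names used by the package.

```latex
% Entropy of a design p, maximized over designs supported on samples of size n
\[
\max_{p} \; H(p) = -\sum_{s \in \mathcal{S}_n} p(s) \log p(s)
\quad \text{subject to} \quad
\sum_{s \ni k} p(s) = \pi_k \ \text{ for all } k .
\]
% The maximizer has exponential-family form
\[
p(s) = \frac{\exp\left(\sum_{k \in s} \lambda_k\right)}
            {\sum_{s' \in \mathcal{S}_n} \exp\left(\sum_{k \in s'} \lambda_k\right)},
\qquad s \in \mathcal{S}_n .
\]
```

The parameters \(\lambda_k\) are determined (up to a common additive constant) by the constraint that the design reproduces the target \(\pi_k\).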
up_maxent(pik, nrep = 1L, eps = 1e-06)
If nrep = 1, an integer vector of selected indices.
If nrep > 1, an integer matrix with n = round(sum(pik)) rows and nrep columns,
where each column contains the indices for one replicate.
Maximum entropy sampling has several desirable properties:
Fixed sample size: exactly round(sum(pik)) units selected
Exact inclusion probabilities: \(E(I_k) = \pi_k\)
All joint inclusion probabilities are positive: \(\pi_{kl} > 0\)
Maximum entropy among all fixed-size designs with the given \(\pi_k\)
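The properties listed above can be checked by brute force on a tiny population. The following is a minimal base R sketch, not the algorithm up_maxent() uses: it enumerates every sample of size n, assumes the design has the form p(s) proportional to the product of per-unit weights, and fits those weights with a simple multiplicative update. The names S, member, design, and w are hypothetical helpers introduced only for this illustration, and enumeration is feasible only for very small populations.

```r
# Brute-force maximum entropy design for a tiny population (illustration only).
pik <- c(0.2, 0.4, 0.6, 0.8)
N <- length(pik)
n <- round(sum(pik))

# All samples of size n, one per column.
S <- combn(N, n)
# member[k, j] is 1 if unit k belongs to sample j, else 0.
member <- apply(S, 2, function(s) as.numeric(seq_len(N) %in% s))

# Design with p(s) proportional to prod(w[s]) over the enumerated samples.
design <- function(w) {
  p <- apply(S, 2, function(s) prod(w[s]))
  p / sum(p)
}

# Multiplicative update of the weights until the implied inclusion
# probabilities match pik (assumed scheme, chosen for simplicity).
w <- pik
for (iter in 1:1000) {
  p <- design(w)
  pi_now <- as.vector(member %*% p)
  if (max(abs(pi_now - pik)) < 1e-10) break
  w <- w * pik / pi_now
}

p <- design(w)
pi1 <- as.vector(member %*% p)        # first-order inclusion probabilities
pi2 <- member %*% (t(member) * p)     # joint inclusion probabilities (N x N)
entropy <- -sum(p * log(p))           # entropy of the fitted design

# Draw one sample from the enumerated design.
s <- S[, sample(ncol(S), 1, prob = p)]
```

Every column of S has exactly n units, pi1 should agree with pik up to the iteration tolerance, and all entries of pi2 should be strictly positive.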
For repeated sampling (simulations), pass nrep rather than calling
up_maxent() in a loop. The design is computed once and reused for all
replicates, so this is much faster.
Tillé, Y. (2006). Sampling Algorithms. Springer Series in Statistics.
Chen, S. X., Dempster, A. P., & Liu, J. S. (1994). Weighted finite population sampling to maximize entropy. Biometrika, 81(3), 457-469.
up_brewer() for Brewer's method (also has positive joint inclusion probabilities),
up_systematic() for systematic PPS (fastest, but some joint inclusion probabilities are 0),
inclusion_prob() for computing inclusion probabilities from size measures.
pik <- c(0.2, 0.4, 0.6, 0.8) # sum = 2
# Single sample
set.seed(42)
idx <- up_maxent(pik)
idx
#> [1] 3 4
# Select from data frame
df <- data.frame(id = 1:4, x = c(10, 20, 30, 40))
df[idx, ]
#> id x
#> 3 3 30
#> 4 4 40
# Multiple replicates for simulation
set.seed(42)
samples <- up_maxent(pik, nrep = 1000)
dim(samples) # 2 x 1000 (n rows, nrep columns)
#> [1] 2 1000
# Verify inclusion probabilities
indicators <- apply(samples, 2, function(s) 1:4 %in% s)
rowMeans(indicators) # Should be close to pik
#> [1] 0.206 0.398 0.622 0.774