Draws samples using the maximum entropy design, also known as Conditional Poisson Sampling (CPS). Among all fixed-size designs with the given inclusion probabilities, it is the unique design that maximizes entropy.

up_maxent(pik, nrep = 1L, eps = 1e-06)

Arguments

pik

A numeric vector of inclusion probabilities in [0, 1]. The sum should be (numerically) an integer, the desired sample size n.

nrep

Number of sample replicates to draw. Default is 1.

eps

A small tolerance for boundary cases: inclusion probabilities within eps of 0 or 1 are treated as certain exclusions or inclusions. Default is 1e-06.

Value

If nrep = 1, an integer vector of the n selected indices, where n = round(sum(pik)). If nrep > 1, an integer matrix with n rows and nrep columns, where each column contains the indices for one replicate.

Details

Maximum entropy sampling has several desirable properties:

  • Fixed sample size: exactly round(sum(pik)) units selected

  • Exact inclusion probabilities: \(E(I_k) = \pi_k\)

  • All joint inclusion probabilities are positive: \(\pi_{kl} > 0\)

  • Maximum entropy among all designs with fixed \(\pi_k\)
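The positivity of the joint inclusion probabilities can be illustrated with a hedged sketch: a maximum entropy (exponential-family) design assigns each size-n sample s a probability proportional to the product of positive unit weights w_k over k in s. The weights below are hypothetical, chosen for illustration rather than derived from any particular pik; enumerating all size-2 samples of N = 4 units shows every pair probability is strictly positive.

```r
# Exact enumeration of a maximum entropy design on N = 4 units, n = 2.
# p(s) is proportional to prod(w[s]) for hypothetical positive weights w.
w <- c(0.5, 1, 2, 4)          # hypothetical positive unit weights
S <- combn(4, 2)              # all 6 candidate size-2 samples (columns)
ps <- apply(S, 2, function(s) prod(w[s]))
ps <- ps / sum(ps)            # normalize to a probability distribution

# Accumulate joint inclusion probabilities pi_kl from the design.
pikl <- matrix(0, 4, 4)
for (j in seq_len(ncol(S))) {
  s <- S[, j]
  pikl[s[1], s[2]] <- pikl[s[1], s[2]] + ps[j]
}
all(pikl[upper.tri(pikl)] > 0)  # TRUE: every pair can co-occur
```

Because every w_k is strictly positive, every sample probability prod(w[s]) is positive, which is exactly why all pi_kl > 0 under this class of designs.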

The implementation uses the sequential algorithm of Chen, Dempster, and Liu (1994) as described in Tillé (2006). The q-values are computed on the fly, reducing memory usage from O(N*n) for the full q-matrix to O(N) working arrays.
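The "Conditional Poisson" name reflects an equivalent characterization: Poisson (independent Bernoulli) sampling with suitable working probabilities, conditioned on attaining the fixed sample size. The rejection sketch below is for exposition only and is not the sequential algorithm used here; note that using p directly as working probabilities is an assumption, since the achieved inclusion probabilities under conditioning differ from p and matching a target pik would require calibrated working probabilities (Tillé 2006, ch. 6).

```r
# Naive rejection sketch of conditional Poisson sampling: draw
# independent Bernoulli trials, keep only realizations of size n.
cps_reject <- function(p, n) {
  repeat {
    s <- which(runif(length(p)) < p)  # Poisson (independent) sample
    if (length(s) == n) return(s)     # condition on fixed size n
  }
}

set.seed(123)
s <- cps_reject(c(0.2, 0.4, 0.6, 0.8), n = 2)
length(s)  # always exactly 2
```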

For repeated sampling (simulations), use the nrep parameter instead of a loop for much better performance. The design is computed once and reused for all replicates.

References

Chen, S. X., Dempster, A. P., & Liu, J. S. (1994). Weighted finite population sampling to maximize entropy. Biometrika, 81(3), 457-469.

Tillé, Y. (2006). Sampling Algorithms. Springer Series in Statistics. Chapter 6.

See also

up_brewer() for Brewer's method; up_systematic() for systematic PPS sampling.

Examples

pik <- c(0.2, 0.4, 0.6, 0.8)  # sum = 2

# Single sample
set.seed(123)
idx <- up_maxent(pik)
idx
#> [1] 3 4

# Multiple replicates for simulation
samples <- up_maxent(pik, nrep = 1000)
dim(samples)  # 2 x 1000
#> [1]    2 1000

# Verify inclusion probabilities
rowMeans(apply(samples, 2, function(s) 1:4 %in% s))  # close to pik
#> [1] 0.210 0.394 0.591 0.805