Draws samples using the maximum entropy design, also known as Conditional Poisson Sampling (CPS). Among all fixed-size designs with the given first-order inclusion probabilities, this is the unique design that maximizes entropy.

up_maxent(pik, nrep = 1L, eps = 1e-06)

Arguments

pik

A numeric vector of inclusion probabilities. The sum should be an integer representing the desired sample size.

nrep

Number of sample replicates to draw. Default is 1.

eps

A small threshold value for boundary cases. Default is 1e-06.

Value

If nrep = 1, an integer vector of selected indices. If nrep > 1, an integer matrix with n rows and nrep columns, where n = round(sum(pik)) is the sample size and each column contains the indices for one replicate.
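The column-per-replicate layout makes it straightforward to estimate first-order and joint inclusion probabilities from a set of replicates. The following is a minimal base-R sketch; the `samples` matrix is a hand-made stand-in with the shape described above (N = 4, n = 2, five replicates), not real output of up_maxent(), and the names `ind`, `pi_hat` and `joint_hat` are illustrative only.

```r
# Hand-made stand-in for an n x nrep index matrix (NOT real up_maxent() output)
samples <- matrix(c(3L, 4L,  2L, 4L,  3L, 4L,  1L, 4L,  2L, 3L), nrow = 2)
N <- 4  # population size, assumed known

# N x nrep indicator matrix: ind[k, r] is 1 if unit k is in replicate r
ind <- apply(samples, 2, function(s) as.numeric(seq_len(N) %in% s))

# Empirical first-order inclusion probabilities
pi_hat <- rowMeans(ind)

# Empirical joint inclusion probabilities; the diagonal equals pi_hat
joint_hat <- tcrossprod(ind) / ncol(samples)
```

With real output and a large nrep, `pi_hat` should approach pik, and the off-diagonal entries of `joint_hat` estimate \(\pi_{kl}\).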

Details

Maximum entropy sampling has several desirable properties:

  • Fixed sample size: exactly round(sum(pik)) units selected

  • Exact inclusion probabilities: \(E(I_k) = \pi_k\)

  • All joint inclusion probabilities are positive: \(\pi_{kl} > 0\) for every pair \(k \neq l\), provided \(0 < \pi_k < 1\)

  • Maximum entropy among all fixed-size designs with the same \(\pi_k\)
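These properties can be checked by brute force on a tiny population. The sketch below is an illustration in base R only and is not meant to describe how up_maxent() works internally: it enumerates every sample of size n, builds the conditional Poisson design \(p(s) \propto \prod_{k \in s} w_k\), and calibrates the working weights w with a simple multiplicative fixed-point update until the implied inclusion probabilities match pik. All helper names (`design_probs`, `incl_probs`, `entropy`) are hypothetical, and the comparison design is a systematic PPS design worked out by hand for this particular pik.

```r
# Illustration only: full enumeration is feasible just for tiny populations
pik <- c(0.2, 0.4, 0.6, 0.8)
N <- length(pik)
n <- round(sum(pik))
S <- combn(N, n)  # all size-n samples, one per column

# Conditional Poisson design: p(s) proportional to the product of weights in s
design_probs <- function(w) {
  p <- apply(S, 2, function(s) prod(w[s]))
  p / sum(p)
}

# First-order inclusion probabilities implied by sample probabilities p
incl_probs <- function(p) {
  vapply(seq_len(N), function(k) sum(p[colSums(S == k) > 0]), numeric(1))
}

# Shannon entropy of a design, with 0 * log(0) treated as 0
entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))

# Fixed-point calibration of the weights, starting from the Poisson odds
w <- pik / (1 - pik)
for (iter in 1:1000) {
  pi_fit <- incl_probs(design_probs(w))
  if (max(abs(pi_fit - pik)) < 1e-10) break
  w <- w * pik / pi_fit
}
p_maxent <- design_probs(w)
pi_fit <- incl_probs(p_maxent)

# Competing fixed-size design with the same pik: systematic PPS in fixed order
# selects {1,3}, {2,4}, {3,4} with probabilities 0.2, 0.4, 0.4
p_sys <- c(0, 0.2, 0, 0, 0.4, 0.4)  # same column order as S

max(abs(pi_fit - pik))          # calibration error, close to 0
all(p_maxent > 0)               # every pair has positive probability
entropy(p_maxent) > entropy(p_sys)
```

Because n = 2 here, each sample is a pair, so the entries of `p_maxent` are also the joint inclusion probabilities. The systematic design reaches the same pik but gives zero probability to three of the six pairs, which is why its entropy is lower.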

For repeated sampling (for example, in simulations), use the nrep argument rather than calling up_maxent() in a loop. The design is computed once and reused for all replicates, which is much faster.

References

Tillé, Y. (2006). Sampling Algorithms. Springer Series in Statistics.

Chen, S. X., Dempster, A. P., & Liu, J. S. (1994). Weighted finite population sampling to maximize entropy. Biometrika, 81(3), 457-469.

See also

up_brewer() for Brewer's method (also has positive joint inclusion probabilities), up_systematic() for systematic PPS (fastest, but some joint inclusion probabilities are 0), inclusion_prob() for computing inclusion probabilities from size measures.

Examples

pik <- c(0.2, 0.4, 0.6, 0.8)  # sum = 2

# Single sample
set.seed(42)
idx <- up_maxent(pik)
idx
#> [1] 3 4

# Select from data frame
df <- data.frame(id = 1:4, x = c(10, 20, 30, 40))
df[idx, ]
#>   id  x
#> 3  3 30
#> 4  4 40

# Multiple replicates for simulation
set.seed(42)
samples <- up_maxent(pik, nrep = 1000)
dim(samples)  # 2 x 1000 (n rows, nrep columns)
#> [1]    2 1000

# Verify inclusion probabilities
indicators <- apply(samples, 2, function(s) 1:4 %in% s)
rowMeans(indicators)  # Should be close to pik
#> [1] 0.206 0.398 0.622 0.774