Draws a balanced sample using the cube method (Deville & Tille, 2004). A balanced sample satisfies (approximately) the balancing equations \(\sum_{k \in S} x_k / \pi_k \approx \sum_{k \in U} x_k\) for each auxiliary variable \(x\).
Arguments
- pik
A numeric vector of inclusion probabilities (length N).
sum(pik)must be close to an integer.- aux
An optional numeric matrix (N x p) of auxiliary balancing variables. Each column defines a balancing constraint. The sample size constraint is always included automatically;
auxspecifies additional variables to balance on. WhenNULL, only the sample size is balanced (equivalent to an unbalanced fixed-size design).- strata
An optional integer vector (length N) of stratum indicators (positive integers). When provided, uses the stratified cube method (Chauvet & Tille, 2006; Chauvet, 2009) which preserves within-stratum sample sizes while balancing on
aux. Exact preservation requiressum(pik)within each stratum to be close to an integer; otherwise sizes will vary around the target and a warning is issued and the design is marked as random-size (fixed_size = FALSE), so batch replicates are returned as a list.- method
The sampling method. Currently only
"cube".- nrep
Number of replicate samples (default 1). When
nrep > 1,$sampleholds a matrix (n x nrep) for fixed-size designs, or a list of integer vectors when within-stratum sizes are not exact.- ...
Additional arguments passed to methods:
epsBoundary tolerance for deciding 0/1 (default
1e-10).condition_auxLogical; if
TRUE, pre-conditionsauxby weighted centering/scaling and QR-pivot rank pruning to improve numerical stability with ill-conditioned or collinear auxiliary variables (defaultFALSE).qr_tolTolerance for QR rank detection when
condition_aux = TRUE(defaultsqrt(.Machine$double.eps)).
Value
An object of class c("unequal_prob", "wor", "sondage_sample").
When nrep = 1, $sample is an integer vector of selected unit
indices. When nrep > 1, $sample is a matrix (n x nrep) for
fixed-size designs, or a list of integer vectors when fixed_size
is FALSE (e.g., stratified with non-integer per-stratum sizes).
Details
The cube method proceeds in two phases:
- Flight phase
Probabilities are moved toward 0 or 1 while maintaining all balancing constraints. Each step resolves at least one unit. Terminates when fewer than p+1 undecided units remain.
- Landing phase
Remaining undecided units are resolved by progressively relaxing balancing constraints, starting from the last column of
aux. Users should order auxiliary variables by importance (most important first).
The sample size constraint is always placed first (never relaxed during landing). For stratified designs, within-stratum size constraints are also placed first.
Joint inclusion probabilities are approximated via the high-entropy approximation (Brewer & Donadio, 2003), which is appropriate since the cube produces a near-maximum-entropy design.
References
Deville, J.C. and Tille, Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91(4), 893-912.
Chauvet, G. and Tille, Y. (2006). A fast algorithm for balanced sampling. Computational Statistics, 21(1), 53-62.
Chauvet, G. (2009). Stratified balanced sampling. Survey Methodology, 35, 115-119.
See also
unequal_prob_wor() for unbalanced designs,
inclusion_prob() to compute inclusion probabilities from size measures.
Examples
# Unequal probability balanced sample
pik <- c(0.3, 0.6, 0.2, 0.4, 0.5)
x <- matrix(c(10, 20, 15, 25, 30))
set.seed(1)
s <- balanced_wor(pik, aux = x)
s$sample
#> [1] 2 5
# Check balancing: HT estimate of aux totals vs population totals
colSums(x[s$sample, , drop = FALSE] / pik[s$sample]) - colSums(x)
#> [1] -6.666667
# Stratified balanced sample
N <- 20
pik <- rep(0.4, N)
x <- matrix(as.double(1:N), ncol = 1)
strata <- rep(1:4, each = 5)
set.seed(1)
s <- balanced_wor(pik, aux = x, strata = strata)
s$sample
#> [1] 1 2 6 8 13 15 16 19