Skip to contents

Draws a balanced sample using the cube method (Deville & Tille, 2004). A balanced sample satisfies (approximately) the balancing equations \(\sum_{k \in S} x_k / \pi_k \approx \sum_{k \in U} x_k\) for each auxiliary variable \(x\).

Usage

balanced_wor(pik, aux = NULL, strata = NULL, method = "cube", nrep = 1L, ...)

Arguments

pik

A numeric vector of inclusion probabilities (length N). sum(pik) must be close to an integer.

aux

An optional numeric matrix (N x p) of auxiliary balancing variables. Each column defines a balancing constraint. The sample size constraint is always included automatically; aux specifies additional variables to balance on. When NULL, only the sample size is balanced (equivalent to an unbalanced fixed-size design).

strata

An optional integer vector (length N) of stratum indicators (positive integers). When provided, uses the stratified cube method (Chauvet & Tille, 2006; Chauvet, 2009) which preserves within-stratum sample sizes while balancing on aux. Exact preservation requires sum(pik) within each stratum to be close to an integer; otherwise sizes will vary around the target and a warning is issued and the design is marked as random-size (fixed_size = FALSE), so batch replicates are returned as a list.

method

The sampling method. Currently only "cube".

nrep

Number of replicate samples (default 1). When nrep > 1, $sample holds a matrix (n x nrep) for fixed-size designs, or a list of integer vectors when within-stratum sizes are not exact.

...

Additional arguments passed to methods:

eps

Boundary tolerance for deciding 0/1 (default 1e-10).

condition_aux

Logical; if TRUE, pre-conditions aux by weighted centering/scaling and QR-pivot rank pruning to improve numerical stability with ill-conditioned or collinear auxiliary variables (default FALSE).

qr_tol

Tolerance for QR rank detection when condition_aux = TRUE (default sqrt(.Machine$double.eps)).

Value

An object of class c("unequal_prob", "wor", "sondage_sample"). When nrep = 1, $sample is an integer vector of selected unit indices. When nrep > 1, $sample is a matrix (n x nrep) for fixed-size designs, or a list of integer vectors when fixed_size is FALSE (e.g., stratified with non-integer per-stratum sizes).

Details

The cube method proceeds in two phases:

Flight phase

Probabilities are moved toward 0 or 1 while maintaining all balancing constraints. Each step resolves at least one unit. Terminates when fewer than p+1 undecided units remain.

Landing phase

Remaining undecided units are resolved by progressively relaxing balancing constraints, starting from the last column of aux. Users should order auxiliary variables by importance (most important first).

The sample size constraint is always placed first (never relaxed during landing). For stratified designs, within-stratum size constraints are also placed first.

Joint inclusion probabilities are approximated via the high-entropy approximation (Brewer & Donadio, 2003), which is appropriate since the cube produces a near-maximum-entropy design.

References

Deville, J.C. and Tille, Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91(4), 893-912.

Chauvet, G. and Tille, Y. (2006). A fast algorithm for balanced sampling. Computational Statistics, 21(1), 53-62.

Chauvet, G. (2009). Stratified balanced sampling. Survey Methodology, 35, 115-119.

See also

unequal_prob_wor() for unbalanced designs, inclusion_prob() to compute inclusion probabilities from size measures.

Examples

# Unequal probability balanced sample
pik <- c(0.3, 0.6, 0.2, 0.4, 0.5)
x <- matrix(c(10, 20, 15, 25, 30))
set.seed(1)
s <- balanced_wor(pik, aux = x)
s$sample
#> [1] 2 5

# Check balancing: HT estimate of aux totals vs population totals
colSums(x[s$sample, , drop = FALSE] / pik[s$sample]) - colSums(x)
#> [1] -6.666667

# Stratified balanced sample
N <- 20
pik <- rep(0.4, N)
x <- matrix(as.double(1:N), ncol = 1)
strata <- rep(1:4, each = 5)
set.seed(1)
s <- balanced_wor(pik, aux = x, strata = strata)
s$sample
#> [1]  1  2  6  8 13 15 16 19