Changelog
Source:NEWS.md
sondage 0.7.0
Initial CRAN release.
Sampling dispatchers
Five dispatchers returning S3 design objects with class c("<prob_class>", "<replacement>", "sondage_sample"):
-
equal_prob_wor(N, n, method=)- Simple random sampling, systematic, Bernoulli (random size). -
equal_prob_wr(N, n, method=)- Simple random sampling with replacement. -
unequal_prob_wor(pik, method=)- Conditional Poisson / maximum entropy (CPS), Brewer, systematic PPS, Poisson (random size), Sequential Poisson (SPS), Pareto. -
unequal_prob_wr(hits, method=)- Chromy’s minimum replacement, multinomial PPS. -
balanced_wor(pik, aux, strata, method=)- Cube method for balanced sampling (Deville & Tille, 2004), with optional stratification (Chauvet & Tille, 2006; Chauvet, 2009).
Design queries
Five generics for variance estimation quantities:
-
inclusion_prob()- Compute from size measures or extract from a WOR design. -
expected_hits()- Compute from size measures or extract from a WR design. -
joint_inclusion_prob()- N x N matrix of joint inclusion probabilities (WOR). Exact for CPS, systematic, Poisson, SRS, Bernoulli; high-entropy approximation (Brewer & Donadio, 2003) for Brewer, SPS, Pareto, cube.sampled_only = TRUEreturns only the n x n submatrix for sampled units. -
joint_expected_hits()- N x N matrix of pairwise expectations E(n_i n_j) (WR). Exact analytic for multinomial and SRS and simulation-based for Chromy (nsimparameter).sampled_only = TRUEreturns only the submatrix for selected units. -
sampling_cov()- Sampling covariance matrix.weighted = TRUEreturns Sen-Yates-Grundy check quantities.sampled_only = TRUEcomputes only the submatrix for sampled units.
Sampling methods (13 total)
| Method | Dispatcher | Fixed size | PRN | Joint probs |
|---|---|---|---|---|
| srs | equal_prob_wor | yes | no | Exact |
| systematic | equal_prob_wor | yes | no | Exact (some = 0) |
| bernoulli | equal_prob_wor | no | yes | Exact (independent) |
| srs | equal_prob_wr | yes | no | Exact |
| cps | unequal_prob_wor | yes | no | Exact |
| brewer | unequal_prob_wor | yes | no | HE approximation |
| systematic | unequal_prob_wor | yes | no | Exact (some = 0) |
| poisson | unequal_prob_wor | no | yes | Exact (independent) |
| sps | unequal_prob_wor | yes | yes | HE approximation |
| pareto | unequal_prob_wor | yes | yes | HE approximation |
| chromy | unequal_prob_wr | yes | no | Simulated |
| multinomial | unequal_prob_wr | yes | no | Exact |
| cube | balanced_wor | yes | no | HE approximation |
Features
- All functions return design objects usable with
s$sample,s$pik,s$hits, etc. - Batch sampling via
nrepargument for Monte Carlo simulations. Fixed-size designs return a matrix; random-size designs return a list. - Permanent random numbers (
prn) for sample coordination (Bernoulli, Poisson, SPS, Pareto). - C implementations for CPS calibration and sampling, Brewer’s draw-by-draw, Chromy’s sequential PPS, systematic PPS joint probabilities, high-entropy joint probabilities, cube flight/landing phases, and inclusion probability capping.
- CPS batch optimisation: the O(N^2) design matrix is computed once, then each replicate is a cheap sequential draw.
- Cube batch optimisation: C-level batch entry points reuse the workspace allocation and the A = X/pi matrix across replicates, with precomputed X/pi helpers for stratified designs.
- Cube auxiliary conditioning:
condition_aux = TRUEpre-conditionsauxby weighted centering/scaling and QR-pivot rank pruning, improving numerical stability with ill-conditioned or collinear auxiliary variables. - Stratified cube: within-stratum size constraints are preserved exactly when per-stratum
sum(pik)is close to an integer; otherwise a warning is issued and the design is marked random-size. - High-entropy joint inclusion probabilities handle certainty units (pi_k = 1) correctly and clamp values to valid bounds. A warning is issued when the marginal defect exceeds 5% of n.
- N-size guard:
joint_inclusion_prob()andjoint_expected_hits()refuse N > 10,000 to prevent accidental allocation of multi-GB dense matrices.sampled_only = TRUEbypasses this limit: all methods compute the n x n submatrix directly without allocating an N x N intermediate. -
sampling_cov(weighted = TRUE)returnsNA(notNaN) for undefined entries (zero joint probabilities or zero selection probabilities), with an informative warning. - All long-running C loops call
R_CheckUserInterrupt()so Ctrl-C works. - CPS calibration warns on non-convergence.
- Brewer’s denominator is clamped (not zeroed) for numerical safety with extreme inclusion probabilities.