Changelog
Source: NEWS.md
samplyr 0.5.9999
Initial release.
Core grammar
- Frame-independent design specification with five verbs and one modifier: `sampling_design()`, `add_stage()`, `stratify_by()`, `cluster_by()`, `draw()`, and `execute()`.
- Designs are reusable across different frames.
Sampling methods
- 13 methods in three families:
  - Equal probability: `srswor`, `srswr`, `systematic`, `bernoulli`.
  - PPS without replacement: `pps_systematic`, `pps_brewer`, `pps_cps` (maximum entropy), `pps_poisson`, `pps_sps`, `pps_pareto`.
  - PPS with replacement / PMR: `pps_multinomial`, `pps_chromy`.
- Balanced sampling via the cube method (`method = "balanced"`) with optional auxiliary balancing variables and measure of size. Stratified designs use the stratified cube algorithm. Supported for up to 2 stages.
- Permanent random numbers (PRN) for sample coordination: `bernoulli`, `pps_poisson`, `pps_sps`, `pps_pareto`.
- Random-size methods (`bernoulli`, `pps_poisson`) accept `n` (expected size) or `frac` (sampling fraction).
Stratification and allocation
- Five allocation methods via `stratify_by(..., alloc =)`: proportional, equal, Neyman, optimal, and power.
- Custom allocation via named vectors or data frames.
- Minimum and maximum sample size constraints per stratum (`min_n`, `max_n`).
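As a rough illustration of what proportional and Neyman allocation compute, the base R sketch below works on made-up stratum sizes and standard deviations. It does not call samplyr: the vectors `N_h` and `S_h` and the helper `allocate()` are hypothetical, and the package's own rounding and `min_n`/`max_n` handling may differ.

```r
# Hypothetical strata: population sizes and standard deviations (made-up numbers).
N_h <- c(urban = 5000, periurban = 3000, rural = 2000)
S_h <- c(urban = 10, periurban = 20, rural = 40)
n <- 500  # total sample size to allocate

# Allocate n in proportion to a non-negative score per stratum (left unrounded).
allocate <- function(score, n) n * score / sum(score)

n_prop <- allocate(N_h, n)          # proportional: n_h proportional to N_h
n_neyman <- allocate(N_h * S_h, n)  # Neyman: n_h proportional to N_h * S_h

print(rbind(proportional = n_prop, neyman = n_neyman))
```

Both allocations sum to `n`. Neyman allocation shifts sample toward strata with larger variability, which is why it needs per-stratum variance information in addition to stratum sizes.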
Multi-stage and multi-phase
- Multi-stage sampling with `add_stage()`. Weights compound automatically across stages.
- Partial execution via `execute(..., stages = 1)` for operational workflows.
- Two-phase sampling by piping a `tbl_sample` into `execute()`.
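The compounding of weights across stages can be sketched in base R as follows. This shows the arithmetic only, under the usual assumption that the stage-2 probabilities are conditional on stage-1 selection. The probabilities are made up and no samplyr functions are called.

```r
# Hypothetical two-stage sample of three final units (made-up probabilities).
pi_stage1 <- c(0.10, 0.10, 0.25)  # inclusion probability of each unit's cluster
pi_stage2 <- c(0.20, 0.50, 0.20)  # conditional inclusion probability within the cluster

# The overall inclusion probability is the product of the stage probabilities,
# so the final weight is the product of the stage weights.
w_stage1 <- 1 / pi_stage1
w_stage2 <- 1 / pi_stage2
w_final <- w_stage1 * w_stage2

print(w_final)
```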
Certainty selection
- PPS WOR methods support certainty selection via absolute (`certainty_size`) or proportional (`certainty_prop`) thresholds, including iterative identification for proportional thresholds.
- `certainty_overflow = "allow"` returns all certainty units when they exceed `n`.
- Stratum-specific thresholds via data frames.
Panel partitioning
- `execute(..., panels = k)` assigns units to `k` panels via systematic within-stratum interleaving.
- Multi-stage designs assign panels at PSU level and propagate to all units.
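One plausible reading of systematic within-stratum interleaving is dealing units out to panels in rotation, stratum by stratum. The base R sketch below illustrates that idea on made-up data. It is an assumption about the general technique, not a description of samplyr's internals, which may for instance use a random start.

```r
# Hypothetical sample: 7 units in stratum "A" and 5 in stratum "B", in selection order.
stratum <- rep(c("A", "B"), times = c(7, 5))
k <- 3  # number of panels

# Within each stratum, assign panels 1, 2, ..., k, 1, 2, ... in order.
panel <- ave(seq_along(stratum), stratum,
             FUN = function(i) (seq_along(i) - 1) %% k + 1)

print(table(stratum, panel))
```

With this scheme, panel sizes within a stratum differ by at most one unit, so every panel is spread across all strata.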
Control sorting
- `control = c(var1, var2)` for nested sorting.
- `control = serp(var1, var2)` for serpentine (alternating direction) sorting.
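The difference between nested and serpentine ordering can be seen in a small base R sketch. The data frame and its columns `region` and `size` are hypothetical, and the code handles only two numeric control variables. It shows the general technique rather than how `serp()` is implemented.

```r
# Hypothetical frame with two control variables (made-up values).
frame <- data.frame(
  region = c(2, 1, 1, 2, 3, 3, 1),
  size   = c(5, 9, 2, 8, 1, 7, 4)
)

# Nested sort: ascending in region, then ascending in size within each region.
nested <- frame[order(frame$region, frame$size), ]

# Serpentine sort: ascending in region, with the direction of size
# alternating from one region to the next (up, down, up, ...).
region_rank <- match(frame$region, sort(unique(frame$region)))
size_key <- ifelse(region_rank %% 2 == 1, frame$size, -frame$size)
serpentine <- frame[order(region_rank, size_key), ]

print(nested$size)
print(serpentine$size)
```

Reversing direction in every other group keeps neighbouring units similar across group boundaries, which is the usual motivation for serpentine ordering before systematic selection.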
Survey export
- `as_svydesign()` converts `tbl_sample` to `survey::svydesign()` with correct strata, cluster IDs, weights, and finite population corrections. Handles PPS WOR (Brewer approximation or exact `ppsmat`), WR/PMR (`Inf` FPC, Hansen-Hurwitz), certainty strata, balanced sampling, and two-phase designs.
- `as_svrepdesign()` converts to replicate-weight designs. For PPS and balanced designs, `"subbootstrap"` and `"mrbbootstrap"` are supported.
- `as_survey_design()` and `as_survey_rep()` for direct conversion to srvyr `tbl_svy` objects.
- `joint_expectation()` computes pairwise joint inclusion probabilities (WOR) or joint expected hits (WR/PMR) for exact variance estimation.
Survey planning
- `design_effect()` and `effective_n()` with `tbl_sample` methods. Five methods: Kish, Henry, Spencer, Chen-Rust, and cluster planning. Auto-extraction of strata, clusters, and selection probabilities from the stored design.
- `draw()` accepts svyplan sample size objects (`svyplan_n`, `svyplan_power`, `svyplan_cluster`) directly.
- Precision analysis (`prec_prop()`, `prec_mean()`, `prec_cluster()`, `prec_multi()`), sensitivity analysis (`predict()`), response rate adjustment (`resp_rate`), and confidence intervals (`confint()`) on all planning objects.
Diagnostics
- `summary()` shows per-stage stratum allocation tables with N_h, n_h, f_h, and weight diagnostics (Kish DEFF, n_eff, CV).
- `validate_frame()` checks for missing variables, NA values in key columns, and MOS/PRN/auxiliary variable issues before execution.
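For readers unfamiliar with the weight diagnostics named above, the base R sketch below computes Kish's design effect, the corresponding effective sample size, and the coefficient of variation of the weights from a made-up weight vector. It uses the textbook formulas only. How samplyr computes these internally (for example, which variance denominator it uses for the CV) is not stated here and may differ.

```r
# Hypothetical final sampling weights for six sampled units (made-up values).
w <- c(10, 10, 20, 20, 40, 40)
n <- length(w)

# Kish's approximate design effect due to unequal weighting.
deff_kish <- n * sum(w^2) / sum(w)^2

# Effective sample size under that approximation.
n_eff <- n / deff_kish

# Coefficient of variation of the weights, using the population-style
# variance (divide by n), for which deff_kish equals 1 + cv^2.
cv <- sqrt(mean((w - mean(w))^2)) / mean(w)

print(c(deff = deff_kish, n_eff = n_eff, cv = cv))
```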
Datasets
- `bfa_eas`: 14,900 enumeration areas from Burkina Faso (LSMS/HBS style). Companion tables `bfa_eas_variance` and `bfa_eas_cost` for Neyman and optimal allocation.
- `zwe_eas` and `zwe_households`: DHS-style two-stage cluster frame from Zimbabwe (22,600 EAs and 379,326 households).
- `ken_enterprises`: 6,823 establishments from Kenya for enterprise surveys, panel partitioning, and PRN coordination examples.
Vignettes
- Introduction: full tutorial covering SRS through multi-stage PPS designs.
- Design semantics: assumptions, weight formulas, and method properties.
- Survey analysis: export to survey/srvyr, joint probabilities, two-phase.
- Sampling coordination: PRN workflows, positive/negative coordination.
- Survey planning: svyplan integration, sample size, precision, design effects.
- Validation: deterministic invariants and Monte Carlo coverage checks on synthetic populations.