Creates a survey::svydesign() object from a tbl_sample, using the sampling design metadata (strata, clusters, weights, and finite population corrections) captured during execute().

Usage

as_svydesign(x, ...)

# S3 method for class 'tbl_sample'
as_svydesign(x, ..., nest = TRUE, method = NULL)

Arguments

x

A tbl_sample object produced by execute().

...

Additional arguments passed to survey::svydesign(). In particular, you can pass pps = survey::ppsmat(joint_matrix) to supply exact joint inclusion probabilities instead of the default Brewer approximation (see Details).

nest

If TRUE, relabel cluster ids to enforce nesting within strata. Passed to survey::svydesign(). Default is TRUE, which is appropriate for most complex survey designs.

method

For two-phase samples, the variance method passed to survey::twophase(). One of "full", "approx", or "simple". This argument is ignored for single-phase samples.

Value

A survey.design2 object from the survey package.

Details

The conversion maps samplyr's design specification to the arguments expected by survey::svydesign():

  • Cluster ids (ids): extracted from cluster_by() variables at each stage, assembled into a multi-level formula (e.g., ~ ea_id + hh_id). For WR/PMR stages, the .draw_k column is used as the sampling unit identifier instead (each draw is treated as an independent unit for Hansen–Hurwitz variance estimation).

  • Strata (strata): extracted from stratify_by() variables at the first executed stage only (see Multi-stage designs below).

  • Weights (weights): the .weight column – the compound weight across all stages, i.e., the product of the per-stage weights, \(w_i = \prod_k w_{k,i} = \prod_k 1/\pi_{k,i}\). This is the inverse of the overall inclusion probability and is the correct weight for design-based point estimation (\(\hat{Y} = \sum_i w_i y_i\)).

  • FPC (fpc): a per-stage formula assembled from .fpc_k columns. The encoding depends on the method:

    • Equal-probability WOR: .fpc_k (the stratum population count \(N_h\)) is passed directly. The survey package derives the sampling fraction as \(f_h = n_h / N_h\).

    • PPS WOR: .fpc_k holds \(N_h\), but it is not passed directly. Instead, a derived column containing the stage-\(k\) inclusion probability \(\pi_k = 1 / w_k\) is created and passed, because survey::svydesign() interprets FPC values in \((0, 1)\) as inclusion probabilities.

    • WR / PMR: a synthetic column filled with Inf is passed. The survey package interprets this as no finite population correction, giving Hansen–Hurwitz variance.
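The weight and FPC mapping above can be sketched in base R. This is a minimal illustration, not samplyr's internal code: the toy data, the columns pi_1 and pi_2, and the helper encode_fpc() with its method labels are all assumptions made up for this example.

```r
# Toy two-stage sample. Values and the pi_1 / pi_2 columns are illustrative only.
toy <- data.frame(
  pi_1   = c(0.10, 0.25, 0.10),   # stage-1 inclusion probabilities
  pi_2   = c(0.50, 0.20, 0.25),   # stage-2 inclusion probabilities
  .fpc_2 = c(40, 60, 48)          # stage-2 population counts N_h
)

# Compound weight: product of the per-stage weights 1 / pi_k
toy$.weight <- (1 / toy$pi_1) * (1 / toy$pi_2)

# Hypothetical encoder for one stage's FPC column (not part of samplyr's API)
encode_fpc <- function(method, N_h, w_k) {
  switch(method,
    srs_wor = N_h,                     # equal-probability WOR: pass N_h as-is
    pps_wor = 1 / w_k,                 # PPS WOR: pass pi_k = 1 / w_k
    wr      = rep(Inf, length(w_k)),   # WR / PMR: Inf means no FPC
    stop("unknown method: ", method)
  )
}

fpc_stage1 <- encode_fpc("pps_wor", N_h = NA, w_k = 1 / toy$pi_1)
fpc_stage2 <- encode_fpc("srs_wor", N_h = toy$.fpc_2, w_k = 1 / toy$pi_2)
```

Here the PPS stage yields values at or below 1 (inclusion probabilities), while the equal-probability stage yields population counts, matching the two encodings described in the list.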

Multi-stage designs

For multi-stage designs, as_svydesign() maps each stage's cluster variable to a level of the ids formula and provides per-stage finite population corrections. Strata are exported from the first stage only, which is consistent with the standard "with-replacement at stage 1" variance approximation used by survey::svydesign() (Cochran 1977, ch. 11). Under this approximation, second-stage and deeper stratification affects weights (which are correctly compounded) but does not need to appear in the design object for variance estimation – the contribution of later stages is captured through the variability of first-stage unit totals.

Concretely, for a two-stage stratified-cluster design, the exported call is equivalent to:


survey::svydesign(
  ids     = ~ ea_id,         # stage-1 clusters
  strata  = ~ region,        # stage-1 strata only
  weights = ~ .weight,       # product of stage-1 and stage-2 weights
  fpc     = ~ .fpc_pi_1 + .fpc_2,  # pi_i for PPS stage, N_h for SRS stage
  data    = sample,
  nest    = TRUE
)
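To see why later-stage strata are not needed under the with-replacement approximation, the following base R sketch computes the textbook "ultimate cluster" variance estimator from weighted first-stage totals only. The toy data and column names are invented for illustration, and the sketch deliberately omits any finite population correction, so it will not reproduce the numbers survey::svydesign() gives when an fpc is supplied.

```r
# Toy two-stage sample: 2 strata, 2 PSUs per stratum, 2 units per PSU (illustrative values)
toy <- data.frame(
  region = c("A", "A", "A", "A", "B", "B", "B", "B"),
  ea_id  = c(1, 1, 2, 2, 3, 3, 4, 4),
  w      = c(10, 10, 12, 12, 8, 8, 9, 9),   # compound weights
  y      = c(3, 5, 4, 6, 7, 9, 2, 4)
)

# Weighted contribution of each unit, then weighted totals per first-stage unit
toy$wy <- toy$w * toy$y
psu <- aggregate(wy ~ region + ea_id, data = toy, FUN = sum)

# Point estimate of the population total
y_hat <- sum(psu$wy)

# Per stratum: n_h / (n_h - 1) * sum((z - mean(z))^2), which equals n_h * var(z)
v_h <- tapply(psu$wy, psu$region, function(z) length(z) * var(z))
v_hat <- sum(v_h)
```

Only the stage-1 strata, the stage-1 cluster ids, and the compound weights enter the computation; whatever happened within each cluster is already reflected in its weighted total.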

Variance estimation for PPS designs

For stages using PPS without replacement methods (pps_brewer, pps_systematic, pps_cps, pps_poisson, pps_sps, pps_pareto), variance is estimated by default using Brewer's approximation (pps = "brewer" in survey's terminology), which approximates the joint inclusion probabilities from the marginal inclusion probabilities. This is the approximation described by Berger (2004) and works well for most PPS designs regardless of the sampling algorithm used.

For exact variance estimation, you can compute joint inclusion probabilities using joint_expectation() and pass them via pps = survey::ppsmat(joint_matrix).
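As background, the standard single-stage Horvitz–Thompson variance estimator that a full joint-probability matrix makes available is shown below in its textbook form. It is given for orientation only and is not a statement about how the survey package computes it internally; \(s\) denotes the sample and \(\pi_{ij}\) the joint inclusion probability of units \(i\) and \(j\).

\[
\hat{V}(\hat{Y}) = \sum_{i \in s} \sum_{j \in s} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \, \frac{y_i}{\pi_i} \, \frac{y_j}{\pi_j}, \qquad \pi_{ii} = \pi_i .
\]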

Chromy's sequential PPS method (PMR)

pps_chromy is classified as a Probability Minimum Replacement (PMR) method – neither with-replacement nor without-replacement. Each unit receives exactly \(\lfloor E(n_i) \rfloor\) or \(\lfloor E(n_i) \rfloor + 1\) hits, where \(E(n_i) = n \cdot \textrm{mos}_i / \sum \textrm{mos}\). When all expected hit counts are below 1, this reduces to WOR; otherwise large units receive multiple hits.
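The hit-count bounds can be checked with a few lines of base R. The measures of size and the sample size below are made-up values chosen so that one unit has an expected hit count above 1.

```r
# Illustrative measures of size and target sample size (made-up values)
mos <- c(50, 30, 15, 5)
n   <- 4

# Expected hit count: E(n_i) = n * mos_i / sum(mos)
expected_hits <- n * mos / sum(mos)

# Each unit receives at least floor(E(n_i)) hits
min_hits <- floor(expected_hits)

# A fractional E(n_i) allows one extra hit; an integer E(n_i) is hit exactly that often
max_hits <- ceiling(expected_hits)

data.frame(mos, expected_hits, min_hits, max_hits)
```

In this toy frame the first two units have expected hit counts of at least 1, so the design does not reduce to without-replacement sampling.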

For variance estimation, Chromy (2009) recommends the Hansen–Hurwitz (with-replacement) approximation rather than exact pairwise expectations, which he found "quite variable." Chauvet (2019) confirmed this in simulation. Accordingly, as_svydesign() treats pps_chromy stages like with-replacement stages (no FPC, no pps argument).

Note that survey::ppsmat() is not valid for the general PMR case. The survey package reads \(\pi_i\) from the diagonal of the joint matrix, but for PMR the diagonal contains \(E(n_i^2)\), which differs from \(E(n_i)\) when units receive multiple hits. The generalized Sen-Yates-Grundy variance requires \(E(n_i) E(n_j) - E(n_i n_j)\) as the pairwise weight (Chromy 2009, eq. 5), not \(E(n_i^2) E(n_j^2) - E(n_i n_j)\).
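A small worked case shows the size of the discrepancy. Assume, purely for illustration, a unit with \(E(n_i) = 1.2\). Under PMR it receives either 1 or 2 hits, so \(P(n_i = 1) = 0.8\) and \(P(n_i = 2) = 0.2\), and:

\[
E(n_i) = 1 \cdot 0.8 + 2 \cdot 0.2 = 1.2, \qquad E(n_i^2) = 1^2 \cdot 0.8 + 2^2 \cdot 0.2 = 1.6 .
\]

A diagonal entry of 1.6 would then be read as if it were the first-order expectation 1.2. The two quantities coincide only for units that can be hit at most once, since \(n_i^2 = n_i\) when \(n_i \in \{0, 1\}\).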

Certainty stratum (take-all units)

For PPS without-replacement stages that use certainty selection (certainty_size or certainty_prop), units with inclusion probability \(\pi_i = 1\) are placed in a separate take-all stratum. This follows the standard practice from Cochran (1977, ch. 11) and Särndal et al. (1992, ch. 3.5): the take-all stratum contributes zero variance (it is a census) and does not inflate the degrees of freedom for the probability stratum.

For stages using with-replacement methods (srswr, pps_multinomial), the finite population correction is omitted and the .draw_k column (sequential draw index) is used as the sampling unit identifier for Hansen–Hurwitz variance estimation.

The survey package is required but not imported – it must be installed to use this function.

References

Berger, Y.G. (2004). A Simple Variance Estimator for Unequal Probability Sampling Without Replacement. Journal of Applied Statistics, 31, 305-315.

Brewer, K.R.W. (2002). Combined Survey Sampling Inference (Weighing Basu's Elephants). Chapter 9.

Chauvet, G. (2019). Properties of Chromy's sampling procedure. arXiv:1912.10896.

Chromy, J.R. (2009). Some Generalizations of the Horvitz-Thompson Estimator. JSM Proceedings, Survey Research Methods Section.

Cochran, W.G. (1977). Sampling Techniques. 3rd edition. Wiley.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer.

See also

execute() for producing tbl_sample objects; survey::svydesign() for the underlying function; as_survey_design.tbl_sample for converting directly to a srvyr tbl_svy; as_svrepdesign() for replicate-weight export.

Examples

# Stratified sample -> survey design
sample <- sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 300) |>
  execute(bfa_eas, seed = 42)

svy <- as_svydesign(sample)
survey::svymean(~households, svy)
#>              mean     SE
#> households 206.87 4.6854

# Two-stage cluster sample with PPS first stage
sample <- sampling_design() |>
  add_stage() |>
    stratify_by(region) |>
    cluster_by(ea_id) |>
    draw(n = 5, method = "pps_brewer", mos = households) |>
  add_stage() |>
    draw(n = 12) |>
  execute(bfa_eas, seed = 2025)

# Default: Brewer variance approximation
svy <- as_svydesign(sample)

# Exact: compute joint probabilities from frame
jip <- joint_expectation(sample, bfa_eas, stage = 1)
svy_exact <- as_svydesign(sample, pps = survey::ppsmat(jip[[1]]))