Creates a survey::svydesign() object from a tbl_sample, using
the sampling design metadata (strata, clusters, weights, and
finite population corrections) captured during execute().
Usage
as_svydesign(x, ...)
# S3 method for class 'tbl_sample'
as_svydesign(x, ..., nest = TRUE, method = NULL)
Arguments
- x
  A tbl_sample object produced by execute().
- ...
  Additional arguments passed to survey::svydesign(). In particular, you can pass pps = survey::ppsmat(joint_matrix) to supply exact joint inclusion probabilities instead of the default Brewer approximation (see Details).
- nest
  If TRUE, relabel cluster ids to enforce nesting within strata. Passed to survey::svydesign(). Default is TRUE, which is appropriate for most complex survey designs.
- method
  For two-phase samples, the variance method passed to survey::twophase(). One of "full", "approx", or "simple". This argument is ignored for single-phase samples.
Details
The conversion maps samplyr's design specification to the arguments
expected by survey::svydesign():
- Cluster ids (ids): extracted from cluster_by() variables at each stage, assembled into a multi-level formula (e.g., ~ ea_id + hh_id). For WR/PMR stages, the .draw_k column is used as the sampling unit identifier instead (each draw is treated as an independent unit for Hansen-Hurwitz variance estimation).
- Strata (strata): extracted from stratify_by() variables at the first executed stage only (see Multi-stage designs below).
- Weights (weights): the .weight column, which holds the compound weight across all stages (i.e., the product of per-stage weights \(w = \prod w_k = \prod 1/\pi_k\)). This is the inverse of the overall inclusion probability and is the correct weight for design-based point estimation (\(\hat{Y} = \sum w_i y_i\)).
- FPC (fpc): a per-stage formula assembled from .fpc_k columns. The encoding depends on the method:
  - Equal-probability WOR: .fpc_k (the stratum population count \(N_h\)) is passed directly. The survey package derives the sampling fraction as \(f_h = n_h / N_h\).
  - PPS WOR: .fpc_k is \(N_h\), but this is not passed directly. Instead, a derived column \(1 / w_k = \pi_i\) is created and passed, because survey::svydesign() interprets FPC values in \((0, 1)\) as inclusion probabilities.
  - WR / PMR: a synthetic column filled with Inf is passed. The survey package interprets this as no finite population correction, giving Hansen-Hurwitz variance.
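As a rough illustration of the weight and FPC encodings listed above, the following base-R sketch builds them by hand for a toy two-stage sample. The column names (pi_1, pi_2, N_h) and all values are invented for the example; samplyr produces the corresponding .weight and .fpc_k columns itself, and nothing here is taken from its implementation.

```r
# Toy two-stage sample (hypothetical values and column names)
toy <- data.frame(
  y    = c(12, 7, 30, 18),
  pi_1 = c(0.20, 0.20, 0.50, 0.50),  # stage-1 inclusion probability
  pi_2 = c(0.10, 0.10, 0.25, 0.25),  # stage-2 conditional inclusion probability
  N_h  = c(40, 40, 16, 16)           # population count for an equal-probability stage
)

# Compound weight: product of per-stage weights, w = prod(1 / pi_k)
toy$weight <- (1 / toy$pi_1) * (1 / toy$pi_2)

# Design-based estimate of the population total: sum(w_i * y_i)
y_hat <- sum(toy$weight * toy$y)

# The three FPC encodings described above
fpc_srs <- toy$N_h               # equal-probability WOR: population count
fpc_pps <- toy$pi_1              # PPS WOR: 1 / w_k, a value in (0, 1)
fpc_wr  <- rep(Inf, nrow(toy))   # WR / PMR: no finite population correction

y_hat
```

With these values the weights are 50, 50, 8, and 8, so the estimated total is 1334.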
Multi-stage designs
For multi-stage designs, as_svydesign() maps each stage's
cluster variable to a level of the ids formula and provides
per-stage finite population corrections. Strata are exported from
the first stage only, which is consistent with the standard
"with-replacement at stage 1" variance approximation used by
survey::svydesign() (Cochran 1977, ch. 11). Under this
approximation, second-stage and deeper stratification affects
weights (which are correctly compounded) but does not need to
appear in the design object for variance estimation – the
contribution of later stages is captured through the variability
of first-stage unit totals.
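The idea that later stages enter only through first-stage unit totals can be sketched in base R. The example below is a minimal illustration of the with-replacement (ultimate cluster) variance of an estimated total, computed from weighted PSU totals within first-stage strata. The data are invented, finite population corrections are left out for simplicity, and the code is not taken from samplyr or survey.

```r
# Toy weighted sample: PSUs (ea_id) nested in first-stage strata (region)
toy <- data.frame(
  region = c("A", "A", "A", "A", "B", "B", "B", "B"),
  ea_id  = c(1, 1, 2, 2, 3, 3, 4, 4),
  weight = c(10, 10, 12, 12, 8, 8, 9, 9),   # compound weights
  y      = c(3, 5, 4, 6, 7, 2, 5, 5)
)

# Weighted total contributed by each first-stage unit
toy$wy <- toy$weight * toy$y
psu <- aggregate(wy ~ region + ea_id, data = toy, FUN = sum)

# Per stratum: n_h / (n_h - 1) * sum((z_hi - mean(z_h))^2)
v_h <- tapply(psu$wy, psu$region, function(z) {
  n_h <- length(z)
  n_h / (n_h - 1) * sum((z - mean(z))^2)
})

total_hat <- sum(toy$wy)   # estimated population total
var_hat   <- sum(v_h)      # variance, no FPC applied
c(total = total_hat, se = sqrt(var_hat))
```

Any second-stage stratification would change the weights, and therefore the PSU totals, but it never appears as a separate term in the variance formula.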
Concretely, for a two-stage stratified-cluster design, the exported call is equivalent to:
survey::svydesign(
  ids = ~ ea_id,               # stage-1 clusters
  strata = ~ region,           # stage-1 strata only
  weights = ~ .weight,         # product of stage-1 and stage-2 weights
  fpc = ~ .fpc_pi_1 + .fpc_2,  # pi_i for PPS stage, N_h for SRS stage
  data = sample,
  nest = TRUE
)
Variance estimation for PPS designs
For stages using PPS without replacement methods (pps_brewer,
pps_systematic, pps_cps, pps_poisson, pps_sps, pps_pareto), variance is
estimated by default using Brewer's approximation (pps = "brewer"
in survey's terminology), which approximates the joint inclusion
probabilities from the marginal inclusion probabilities. This is
the approximation described by Berger (2004) and works well for
most PPS designs regardless of the sampling algorithm used.
For exact variance estimation, you can compute joint inclusion
probabilities using joint_expectation() and pass them via
pps = survey::ppsmat(joint_matrix).
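To show what a joint inclusion probability matrix contributes, here is a minimal base-R sketch of the Sen-Yates-Grundy variance estimator for a total. It is checked against simple random sampling without replacement, where the joint probabilities have a closed form. The function name syg_variance() is hypothetical, and this is not the code path used by survey, which offers both Horvitz-Thompson and Yates-Grundy forms through the variance argument of svydesign().

```r
# Sen-Yates-Grundy variance estimator for a total, given a joint inclusion
# probability matrix whose diagonal holds the marginal pi_i.
syg_variance <- function(y, joint) {
  pi_i <- diag(joint)
  z <- y / pi_i
  n <- length(y)
  v <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      v <- v + (pi_i[i] * pi_i[j] - joint[i, j]) / joint[i, j] * (z[i] - z[j])^2
    }
  }
  v
}

# Check: SRS without replacement, n = 4 from N = 20,
# where pi_i = n / N and pi_ij = n (n - 1) / (N (N - 1)).
N <- 20
n <- 4
y <- c(3, 8, 5, 10)
joint <- matrix(n * (n - 1) / (N * (N - 1)), nrow = n, ncol = n)
diag(joint) <- n / N

v_syg <- syg_variance(y, joint)
v_srs <- N^2 * (1 - n / N) * var(y) / n   # textbook SRS variance of a total
all.equal(v_syg, v_srs)
```

For unequal-probability designs the same formula applies, with the matrix built from the actual sampling algorithm rather than the SRS closed form.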
Chromy's sequential PPS method (PMR)
pps_chromy is classified as a Probability Minimum Replacement
(PMR) method, which is neither with-replacement nor without-replacement.
Each unit receives either \(\lfloor E(n_i) \rfloor\) or
\(\lfloor E(n_i) \rfloor + 1\) hits, where
\(E(n_i) = n \cdot \textrm{mos}_i / \sum \textrm{mos}\) is the unit's expected number of hits.
When all expected hit counts are below 1, the method reduces to without-replacement (WOR) sampling;
otherwise, units with \(E(n_i) > 1\) can receive more than one hit.
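The hit-count bounds can be worked out directly from the measures of size. The sketch below uses invented mos values and only computes the bounds and the probability of the extra hit; it does not implement Chromy's selection algorithm.

```r
mos <- c(50, 120, 300, 900, 1630)   # hypothetical measures of size
n   <- 5                            # number of draws

expected_hits <- n * mos / sum(mos)                 # E(n_i)
min_hits <- floor(expected_hits)                    # guaranteed number of hits
max_hits <- min_hits + (expected_hits > min_hits)   # at most one extra hit
p_extra  <- expected_hits - min_hits                # P(n_i = min_hits + 1)

data.frame(mos, expected_hits, min_hits, max_hits, p_extra)
```

Here the two largest units have expected hit counts of 1.5 and about 2.72, so they are selected at least once and twice respectively, while the three smallest units behave as in a without-replacement design.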
For variance estimation, Chromy (2009) recommends the
Hansen-Hurwitz (with-replacement) approximation rather than
exact pairwise expectations, which he found "quite variable."
Chauvet (2019) confirmed this in simulation. Accordingly,
as_svydesign() treats pps_chromy stages like
with-replacement stages (no FPC, no pps argument).
Note that survey::ppsmat() is not valid for the general
PMR case. The survey package reads \(\pi_i\) from the diagonal
of the joint matrix, but for PMR the diagonal contains
\(E(n_i^2)\), which differs from \(E(n_i)\) when units
receive multiple hits. The generalized Sen-Yates-Grundy variance
requires \(E(n_i) E(n_j) - E(n_i n_j)\) as the pairwise
weight (Chromy 2009, eq. 5), not \(E(n_i^2) E(n_j^2) - E(n_i n_j)\).
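A short calculation shows where the diagonal goes wrong. Because each unit's hit count takes only two adjacent values, its second moment follows from \(E(n_i)\) alone. The expected hit counts below are invented for illustration.

```r
expected_hits <- c(0.3, 0.8, 1.4, 2.5)   # hypothetical E(n_i), summing to n = 5

lo      <- floor(expected_hits)
p_extra <- expected_hits - lo            # P(n_i = lo + 1)

# E(n_i^2) for a variable equal to lo w.p. (1 - p_extra) and lo + 1 w.p. p_extra
second_moment <- (1 - p_extra) * lo^2 + p_extra * (lo + 1)^2

data.frame(
  expected_hits,
  second_moment,
  diagonal_matches = abs(second_moment - expected_hits) < 1e-12
)
```

The diagonal agrees with \(E(n_i)\) only for the two units whose expected hit count is below 1; for the others it is 2.2 instead of 1.4 and 6.5 instead of 2.5.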
Certainty stratum (take-all units)
For PPS without-replacement stages that use certainty selection
(certainty_size or certainty_prop), units with inclusion
probability \(\pi_i = 1\) are placed in a separate
take-all stratum. This follows the standard practice from
Cochran (1977, ch. 11) and Sarndal et al. (1992, ch. 3.5):
the take-all stratum contributes zero variance (it is a census)
and does not inflate the degrees of freedom for the probability
stratum.
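One generic way to identify take-all units is sketched below: a unit is moved to the certainty stratum when its PPS inclusion probability would reach 1, and the probabilities of the remaining units are then recomputed. This rule and the function name split_certainty() are assumptions made for illustration; certainty_size and certainty_prop use explicit thresholds, and their exact logic is not reproduced here.

```r
# Split units into a take-all stratum and a probability stratum.
# Assumed rule: take a unit with certainty when n_left * mos / sum(mos) >= 1,
# then recompute the probabilities for the units that remain.
split_certainty <- function(mos, n) {
  certain <- rep(FALSE, length(mos))
  repeat {
    n_left <- n - sum(certain)
    pi_i <- ifelse(certain, 1, n_left * mos / sum(mos[!certain]))
    newly <- !certain & pi_i >= 1
    if (!any(newly)) break
    certain <- certain | newly
  }
  data.frame(mos, pi_i, stratum = ifelse(certain, "take-all", "probability"))
}

res <- split_certainty(mos = c(500, 120, 80, 60, 40), n = 3)
res
```

In this toy frame the first unit is taken with certainty, and the other four share the remaining two selections with probabilities proportional to size.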
For stages using with-replacement methods (srswr,
pps_multinomial), the finite population correction is omitted
and the .draw_k column (sequential draw index) is used as the
sampling unit identifier for Hansen-Hurwitz variance estimation.
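For reference, the Hansen-Hurwitz estimator treats every draw as an independent estimate of the population total. The following base-R sketch uses invented values and per-draw selection probabilities, with one unit drawn twice to show why draws rather than distinct units serve as the sampling unit identifier.

```r
# Hypothetical with-replacement PPS sample of n = 4 draws
y <- c(120, 45, 300, 45)         # observed values (draws 2 and 4 hit the same unit)
p <- c(0.10, 0.04, 0.25, 0.04)   # per-draw selection probabilities

z <- y / p                       # each draw yields an estimate of the total
n <- length(z)

total_hat <- mean(z)             # Hansen-Hurwitz estimator of the total
var_hat   <- var(z) / n          # sum((z - mean(z))^2) / (n * (n - 1))

c(total = total_hat, se = sqrt(var_hat))
```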
The survey package is required but not imported – it must be
installed to use this function.
References
Berger, Y.G. (2004). A Simple Variance Estimator for Unequal Probability Sampling Without Replacement. Journal of Applied Statistics, 31, 305-315.
Brewer, K.R.W. (2002). Combined Survey Sampling Inference (Weighing Basu's Elephants). Chapter 9.
Chauvet, G. (2019). Properties of Chromy's sampling procedure. arXiv:1912.10896.
Chromy, J.R. (2009). Some Generalizations of the Horvitz-Thompson Estimator. JSM Proceedings, Survey Research Methods Section.
Cochran, W.G. (1977). Sampling Techniques. 3rd edition. Wiley.
Sarndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer.
See also
execute() for producing tbl_sample objects,
survey::svydesign() for the underlying function,
as_survey_design.tbl_sample for converting directly to a srvyr tbl_svy,
as_svrepdesign() for replicate-weight export
Examples
# Stratified sample -> survey design
sample <- sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 300) |>
  execute(bfa_eas, seed = 42)
svy <- as_svydesign(sample)
survey::svymean(~households, svy)
#>              mean     SE
#> households 206.87 4.6854
# Two-stage cluster sample with PPS first stage
sample <- sampling_design() |>
  add_stage() |>
  stratify_by(region) |>
  cluster_by(ea_id) |>
  draw(n = 5, method = "pps_brewer", mos = households) |>
  add_stage() |>
  draw(n = 12) |>
  execute(bfa_eas, seed = 2025)
# Default: Brewer variance approximation
svy <- as_svydesign(sample)
# Exact: compute joint probabilities from frame
jip <- joint_expectation(sample, bfa_eas, stage = 1)
svy_exact <- as_svydesign(sample, pps = survey::ppsmat(jip[[1]]))