Creates a survey::svydesign() object from a tbl_sample, using
the sampling design metadata (strata, clusters, weights, and
finite population corrections) captured during execute().
Usage
as_svydesign(x, ...)
# S3 method for class 'tbl_sample'
as_svydesign(x, ..., nest = TRUE, method = NULL)
Arguments
- x
A tbl_sample object produced by execute().
- ...
Additional arguments passed to survey::svydesign(). In particular, you can pass pps = survey::ppsmat(joint_matrix) to supply exact joint inclusion probabilities instead of the default Brewer approximation (see Details).
- nest
If TRUE, relabel cluster ids to enforce nesting within strata. Passed to survey::svydesign(). Default is TRUE, which is appropriate for most complex survey designs.
- method
For two-phase samples, the variance method passed to survey::twophase(). One of "full", "approx", or "simple". This argument is ignored for single-phase samples.
Details
The conversion maps samplyr's design specification to the arguments
expected by survey::svydesign():
- Cluster ids (ids): extracted from cluster_by() variables at each stage, assembled into a multi-level formula (e.g., ~ ea_id + hh_id). For WR/PMR stages, the .draw_k column is used as the sampling unit identifier instead (each draw is treated as an independent unit for Hansen-Hurwitz variance estimation).
- Strata (strata): extracted from stratify_by() variables at the first executed stage only (see Multi-stage designs below).
- Weights (weights): the .weight column, which holds the compound weight across all stages (i.e., the product of per-stage weights \(w = \prod_k w_k = \prod_k 1/\pi_k\)). This is the inverse of the overall inclusion probability and is the correct weight for design-based point estimation (\(\hat{Y} = \sum w_i y_i\)).
- FPC (fpc): a per-stage formula assembled from .fpc_k columns. The encoding depends on the method:
  - Equal-probability WOR: .fpc_k (the stratum population count \(N_h\)) is passed directly. The survey package derives the sampling fraction as \(f_h = n_h / N_h\).
  - PPS WOR: .fpc_k is \(N_h\), but this is not passed directly. Instead, a derived column \(1 / w_k = \pi_i\) is created and passed, because survey::svydesign() interprets FPC values in \((0, 1)\) as inclusion probabilities.
  - WR / PMR: a synthetic column filled with Inf is passed. The survey package interprets this as no finite population correction, giving Hansen-Hurwitz variance.
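The weight and FPC encodings listed above can be illustrated with a small base-R sketch. The data frame, its values, and the column names w1, w2 and N2 are made up for illustration; they are not output of execute(), and the sketch does not call samplyr or survey.

```r
# Hypothetical two-stage sample: PPS WOR at stage 1, SRS WOR at stage 2.
# Rows 1-2 belong to one sampled cluster, rows 3-4 to another.
toy <- data.frame(
  y  = c(10, 12, 8, 15),    # study variable
  w1 = c(4, 4, 2.5, 2.5),   # stage-1 weight, 1 / pi_1
  w2 = c(5, 5, 5, 5),       # stage-2 weight, N_2 / n_2 = 10 / 2
  N2 = c(10, 10, 10, 10)    # stage-2 population count
)

# Compound weight: product of the per-stage weights.
toy$weight <- toy$w1 * toy$w2

# PPS WOR stage: the value handed to fpc is the inclusion probability 1 / w_k.
toy$fpc_pi_1 <- 1 / toy$w1

# Equal-probability WOR stage: the population count is handed over unchanged.
toy$fpc_2 <- toy$N2

# Design-based estimate of the total: sum of weight times value.
y_hat <- sum(toy$weight * toy$y)
```

In this toy case the stage-1 FPC values (0.25 and 0.4) fall in \((0, 1)\) and the stage-2 values are population counts, matching the two WOR encodings described above.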
Multi-stage designs
For multi-stage designs, as_svydesign() maps each stage's
cluster variable to a level of the ids formula and provides
per-stage finite population corrections. Strata are exported from
the first stage only, which is consistent with the standard
"with-replacement at stage 1" variance approximation used by
survey::svydesign() (Cochran 1977, ch. 11). Under this
approximation, second-stage and deeper stratification affects
weights (which are correctly compounded) but does not need to
appear in the design object for variance estimation – the
contribution of later stages is captured through the variability
of first-stage unit totals.
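As a rough illustration of how variability between first-stage unit totals drives the variance, the following base-R sketch computes the textbook with-replacement ("ultimate cluster") estimator for a single stratum. The data are made up, and the sketch is an assumption-laden illustration of the idea rather than the code path used by survey::svydesign().

```r
# Made-up element-level sample from three first-stage clusters in one stratum.
toy <- data.frame(
  psu    = c(1, 1, 2, 2, 3, 3),
  weight = c(10, 10, 8, 8, 12, 12),  # compound weights across stages
  y      = c(5, 7, 6, 4, 9, 3)
)

# Weighted total contributed by each first-stage unit.
psu_totals <- tapply(toy$weight * toy$y, toy$psu, sum)
n_psu <- length(psu_totals)

# Estimated population total.
y_hat <- sum(psu_totals)

# With-replacement variance estimate of the total, based only on the
# spread of the first-stage unit totals (no finite population correction).
v_hat <- n_psu / (n_psu - 1) * sum((psu_totals - mean(psu_totals))^2)
```

Later-stage sampling enters only through the weights that build each cluster total, which is why second-stage strata do not need to appear in this calculation.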
Concretely, for a two-stage stratified-cluster design, the exported call is equivalent to:
survey::svydesign(
  ids     = ~ ea_id,               # stage-1 clusters
  strata  = ~ region,              # stage-1 strata only
  weights = ~ .weight,             # product of stage-1 and stage-2 weights
  fpc     = ~ .fpc_pi_1 + .fpc_2,  # pi_i for PPS stage, N_h for SRS stage
  data    = sample,
  nest    = TRUE
)
Variance estimation for PPS designs
For fixed-size PPS without-replacement stages (pps_brewer,
pps_systematic, pps_cps, pps_sps, pps_pareto), variance is
estimated by default using Brewer's approximation (pps = "brewer"
in survey's terminology), which approximates the joint inclusion
probabilities from the marginal inclusion probabilities. This is
the approximation described by Berger (2004) and works well for
most PPS designs regardless of the sampling algorithm used.
For exact variance estimation, you can compute joint inclusion
probabilities using joint_expectation() and pass them via
pps = survey::ppsmat(joint_matrix).
Random-size Poisson methods
Methods bernoulli and pps_poisson select units independently
with known marginal inclusion probabilities, so the realized
sample size is random. The standard SRSWOR variance estimator
is not appropriate, and Brewer's approximation (designed for
fixed-size PPS) understates the variance. Instead, these
methods are exported with pps = survey::poisson_sampling(pi),
which produces the Horvitz-Thompson Poisson variance estimator
\(\hat V = \sum_{i \in S} (1 - \pi_i) / \pi_i^2 \cdot y_i^2\)
described in Sarndal, Swensson and Wretman (1992), section 2.8.
How the export is handled depends on the design:
- Single-stage designs (no cluster_by(), or cluster_by() with one row per sampled cluster) are exported with poisson_sampling() and produce the exact Horvitz-Thompson Poisson variance.
- Multi-stage designs with a random-size Poisson method at stage k > 1 omit the finite-population correction at the Poisson stage (the same handling used for with-replacement methods). The Poisson stage's contribution to variance is captured through the variability of the previous stage's cluster totals, under the with-replacement at stage 1 approximation.
- Multi-stage designs with a random-size Poisson method at stage 1 are not supported by survey::svydesign(), which rejects multi-stage designs when the pps argument is set. Such designs raise an error suggesting as_svrepdesign(type = "subbootstrap").
- Single-stage designs that use cluster_by() with multiple rows per sampled cluster (for example a household listing within sampled EAs) raise an error. survey::poisson_sampling() treats rows as independent and does not honour within-cluster correlation. Use as_svrepdesign(type = "subbootstrap") for these designs.
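For the single-stage case, the Horvitz-Thompson Poisson variance formula given above can be evaluated by hand. The sketch below uses made-up values and inclusion probabilities (the names y and pik are illustrative) and plain base R; it does not call survey::poisson_sampling().

```r
# Toy Poisson sample: observed values and their known inclusion probabilities.
y   <- c(10, 20, 30)
pik <- c(0.5, 0.25, 0.8)

# Horvitz-Thompson estimate of the population total.
y_hat <- sum(y / pik)

# Poisson variance estimator: sum over the sample of (1 - pi_i) / pi_i^2 * y_i^2.
v_hat  <- sum((1 - pik) / pik^2 * y^2)
se_hat <- sqrt(v_hat)
```

Because units are selected independently, no joint inclusion probabilities are needed: each sampled unit contributes its own term.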
Chromy's sequential PPS method (PMR)
pps_chromy is classified as a Probability Minimum Replacement
(PMR) method – neither with-replacement nor without-replacement.
Each unit receives exactly \(\lfloor E(n_i) \rfloor\) or
\(\lfloor E(n_i) \rfloor + 1\) hits, where
\(E(n_i) = n \cdot \textrm{mos}_i / \sum \textrm{mos}\).
When all expected hit counts are below 1, this reduces to WOR;
otherwise large units receive multiple hits.
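The hit-count rule can be worked through with a short base-R sketch. The measure-of-size values and the target n are made up; the code only evaluates the formula for \(E(n_i)\) and does not run Chromy's selection algorithm.

```r
# Made-up measure of size for five units and a target of n = 4 draws.
mos <- c(45, 20, 20, 10, 5)
n   <- 4

# Expected number of hits per unit: n * mos_i / sum(mos).
e_hits <- n * mos / sum(mos)

# Each unit's realized hit count is one of these two values.
min_hits <- floor(e_hits)
max_hits <- min_hits + 1

# Probability of the larger value is the fractional part of the expectation,
# which is what makes the two-point distribution average to e_hits.
p_upper <- e_hits - min_hits

# Units that can be hit more than once.
multi_hit <- e_hits > 1
```

Here only the first unit has an expected hit count above 1 (1.8), so it receives one or two hits; the remaining units receive zero or one, as under WOR sampling.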
For variance estimation, Chromy (2009) recommends the
Hansen-Hurwitz (with-replacement) approximation rather than
exact pairwise expectations, which he found "quite variable."
Chauvet (2019) confirmed this in simulation. Accordingly,
as_svydesign() treats pps_chromy stages like
with-replacement stages (no FPC, no pps argument).
Note that survey::ppsmat() is not valid for the general
PMR case. The survey package reads \(\pi_i\) from the diagonal
of the joint matrix, but for PMR the diagonal contains
\(E(n_i^2)\), which differs from \(E(n_i)\) when units
receive multiple hits. The generalized Sen-Yates-Grundy variance
requires \(E(n_i) E(n_j) - E(n_i n_j)\) as the pairwise
weight (Chromy 2009, eq. 5), whereas reading \(\pi_i\) from the diagonal
would give \(E(n_i^2) E(n_j^2) - E(n_i n_j)\).
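A small numeric check shows why the diagonal is the problem. The helper pmr_moments() below is hypothetical (not part of samplyr or survey); it assumes only the two-point hit distribution described above, with a unit receiving \(\lfloor E(n_i) \rfloor\) or \(\lfloor E(n_i) \rfloor + 1\) hits.

```r
# First and second moments of the hit count for one unit under PMR,
# given its expected hit count e_n.
pmr_moments <- function(e_n) {
  lo   <- floor(e_n)   # smaller possible hit count
  p_hi <- e_n - lo     # probability of the larger hit count, lo + 1
  c(
    first  = lo * (1 - p_hi) + (lo + 1) * p_hi,      # E(n_i)
    second = lo^2 * (1 - p_hi) + (lo + 1)^2 * p_hi   # E(n_i^2)
  )
}

small <- pmr_moments(0.6)  # zero or one hit: both moments coincide
large <- pmr_moments(1.8)  # one or two hits: second moment is larger
```

For the unit with expected hit count 0.6 both moments equal 0.6, so a diagonal entry would be a valid \(\pi_i\). For the unit with expected hit count 1.8 the second moment is 3.4, which is not an inclusion probability.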
Certainty stratum (take-all units)
For PPS without-replacement stages that use certainty selection
(certainty_size or certainty_prop), units with inclusion
probability \(\pi_i = 1\) are placed in a separate
take-all stratum. This follows the standard practice from
Cochran (1977, ch. 11) and Sarndal et al. (1992, ch. 3.5):
the take-all stratum contributes zero variance (it is a census)
and does not inflate the degrees of freedom for the probability
stratum.
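The zero-variance property of a take-all stratum can be seen in a minimal base-R sketch. For simplicity it uses the equal-probability (SRS without replacement) stratum formula rather than a PPS estimator, and all numbers are made up; it illustrates the census argument and is not how as_svydesign() computes variance.

```r
# Two strata: a take-all stratum (n_h = N_h) and a sampled stratum.
strata <- data.frame(
  stratum = c("certainty", "probability"),
  N_h  = c(3, 100),    # population units per stratum
  n_h  = c(3, 10),     # sampled units per stratum
  s2_h = c(250, 40)    # within-stratum sample variance of y
)

# Variance of the estimated stratum total under SRS without replacement:
# N_h^2 * (1 - n_h / N_h) * s2_h / n_h
strata$v_total <- with(strata, N_h^2 * (1 - n_h / N_h) * s2_h / n_h)
```

The sampling fraction in the certainty stratum is 1, so its term vanishes regardless of how variable its units are; all of the estimated variance comes from the probability stratum.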
For stages using with-replacement methods (srswr,
pps_multinomial), the finite population correction is omitted
and the .draw_k column (sequential draw index) is used as the
sampling unit identifier for Hansen-Hurwitz variance estimation.
The survey package is required but not imported – it must be
installed to use this function.
References
Berger, Y.G. (2004). A Simple Variance Estimator for Unequal Probability Sampling Without Replacement. Journal of Applied Statistics, 31, 305-315.
Brewer, K.R.W. (2002). Combined Survey Sampling Inference (Weighing Basu's Elephants). Chapter 9.
Chauvet, G. (2019). Properties of Chromy's sampling procedure. arXiv:1912.10896.
Chromy, J.R. (2009). Some Generalizations of the Horvitz-Thompson Estimator. JSM Proceedings, Survey Research Methods Section.
Cochran, W.G. (1977). Sampling Techniques. 3rd edition. Wiley.
Sarndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer.
See also
execute() for producing tbl_sample objects,
survey::svydesign() for the underlying function,
as_survey_design.tbl_sample for converting directly to a srvyr tbl_svy,
as_svrepdesign() for replicate-weight export
Examples
# Stratified sample -> survey design
sample <- sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 300) |>
  execute(bfa_eas, seed = 42)
svy <- as_svydesign(sample)
survey::svymean(~households, svy)
#>              mean    SE
#> households 71.979 3.679

# Two-stage cluster sample with PPS first stage
sample <- sampling_design() |>
  add_stage() |>
  stratify_by(region) |>
  cluster_by(ea_id) |>
  draw(n = 5, method = "pps_brewer", mos = households) |>
  add_stage() |>
  draw(n = 12) |>
  execute(bfa_eas, seed = 2025)

# Default: Brewer variance approximation
svy <- as_svydesign(sample)

# Exact: compute joint probabilities from frame
jip <- joint_expectation(sample, bfa_eas, stage = 1)
svy_exact <- as_svydesign(sample, pps = survey::ppsmat(jip[[1]]))