execute() runs a sampling design against one or more data frames,
producing a sampled dataset with appropriate weights and metadata.
Arguments
- .data
A sampling_design object, or a tbl_sample object for continuation (multi-phase or multi-stage with separate frames).
- ...
Data frame(s) to sample from. For single-stage designs, provide one frame. For multi-stage designs with separate frames, provide frames in stage order.
- stages
Integer vector specifying which stage(s) to execute. Default (NULL) executes all remaining stages.
- seed
Integer random seed for reproducibility.
- panels
Integer number of rotation groups (panels) to partition the sample into. Each panel is a representative subsample created by systematic interleaving within strata. The output includes a .panel column with values 1 through panels. Default NULL means no panel partitioning.
Value
A tbl_sample object (a data frame subclass with sampling
metadata). Contains the selected units plus:
- .sample_id: Unique identifier for each sampled unit
- .weight: Sampling weight (1/probability)
- .weight_1, .weight_2, ...: Per-stage sampling weights (\(1/\pi_i^{(k)}\)). The product of all per-stage weights equals .weight.
- .fpc_1, .fpc_2, ...: Per-stage finite population correction values. The meaning depends on the method and context:
  - Equal-probability WOR (srswor, systematic): \(N_h\) (stratum population size), or \(N\) if unstratified. The sampling fraction \(f = n / N\) is derived from this at variance-estimation time.
  - PPS WOR (pps_brewer, pps_cps, etc.): \(N_h\) (stratum population size), converted to \(\pi_i = 1/w_i\) at survey export, because survey::svydesign() expects inclusion probabilities for unequal-probability stages.
  - Clustered stages: the number of clusters in the stratum/group, not the number of ultimate units.
  - WR / PMR (srswr, pps_multinomial, pps_chromy): \(\infty\). With-replacement designs have no finite population correction; variance is estimated via the Hansen–Hurwitz formula.
  In a multi-stage design, each stage has its own .fpc_k. At survey export (as_svydesign()), these are assembled into a multi-level FPC formula (e.g., ~ .fpc_1 + .fpc_2).
- .draw_1, .draw_2, ...: Draw index per stage (WR/PMR methods only). Each row represents one independent draw; the draw index identifies which with-replacement selection the row came from.
- .certainty_1, .certainty_2, ...: Whether each unit was a certainty selection (PPS methods with certainty thresholds only)
- .panel: Panel assignment (only when panels is specified)
- Stage and stratum identifiers as appropriate
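As a rough illustration of how a stored FPC value turns into a sampling fraction, the base-R sketch below applies the textbook variance of a sample mean under simple random sampling. The sample values and N_h are hypothetical, and this is not the code that execute() or as_svydesign() runs.

```r
# Illustrative only: hypothetical values, textbook SRS formulas.
y   <- c(12, 15, 9, 14, 10)  # hypothetical sample values from one stratum
n_h <- length(y)             # stratum sample size
N_h <- 50                    # the kind of value .fpc_1 would hold for srswor

f       <- n_h / N_h                  # sampling fraction f = n / N
var_wor <- (1 - f) * var(y) / n_h     # variance of the mean, without replacement

# With-replacement methods store Inf, so f = 0 and the correction vanishes.
f_wr   <- n_h / Inf
var_wr <- (1 - f_wr) * var(y) / n_h
```

With these numbers the without-replacement variance is 10% smaller than the with-replacement one, which is exactly the sampling fraction.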
Details
Execution Patterns
Single-Stage Execution
design |> execute(frame, seed = 1)
Multi-Stage with Single Frame
For hierarchical data where all stages are in one frame:
design |> execute(frame, seed = 2025)
The frame must contain all clustering variables and respect nesting.
Multi-Stage with Multiple Frames
When each stage has its own frame:
design |> execute(frame1, frame2, frame3, seed = 424)
Frames are matched to stages by position.
Weight Calculation
The .weight column is the inverse of the inclusion probability (for WR / PMR methods, the inverse of the expected number of selections).
For all methods the per-stage weight is \(w_i^{(k)} = 1 / \pi_i^{(k)}\):
- SRS: \(w_i = N / n\), constant for all units.
- Stratified SRS: \(w_i = N_h / n_h\) within stratum \(h\).
- PPS WOR: \(w_i = 1 / \pi_i\) where \(\pi_i\) is computed from the measure of size by sondage::inclusion_prob(). Varies across units.
- WR / PMR: \(w_i = 1 / E(n_i)\) where \(E(n_i) = n \cdot p_i\) is the expected number of selections. Each draw is one row; a unit selected \(k\) times appears \(k\) times, each with the same weight.
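The formulas above can be checked with a few lines of base R. This is a sketch of the arithmetic only, not the package's internals: the stratum sizes and measures of size are made up, and the PPS inclusion probabilities use the simple proportional rule \(\pi_i = n \cdot x_i / \sum x\), which is valid only when no unit reaches probability 1 (sondage::inclusion_prob() is not called here).

```r
# Illustrative arithmetic with hypothetical numbers.

# SRS: N chosen so that N / n matches the weight of 149 printed
# in the first example of the Examples section.
N <- 14900
n <- 100
w_srs <- N / n

# Stratified SRS: N_h / n_h within each stratum.
N_h <- c(A = 600, B = 400)
n_h <- c(A = 30,  B = 10)
w_strat <- N_h / n_h

# PPS WOR: pi_i proportional to size (assumes no certainty units).
mos   <- c(10, 20, 30, 40)
n_pps <- 2
pi_i  <- n_pps * mos / sum(mos)
w_pps <- 1 / pi_i

# WR / PMR: weight is 1 / E(n_i), with E(n_i) = n * p_i.
p_i  <- mos / sum(mos)
w_wr <- 1 / (n_pps * p_i)
```

In this small case the PPS WOR and WR weights coincide because no inclusion probability had to be capped at 1; with very large units the two would differ.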
Multi-stage weight compounding
In a \(K\)-stage design, the overall weight for unit \(i\) is the
product of per-stage weights:
$$w_i = \prod_{k=1}^{K} w_i^{(k)} = \prod_{k=1}^{K} \frac{1}{\pi_i^{(k \mid S^{(k-1)})}}$$
where \(\pi_i^{(k \mid S^{(k-1)})}\) is the
conditional inclusion probability at stage \(k\), given the set of
clusters selected at all prior stages. For example, in a two-stage design
where 5 of 30 EAs are selected in a region (stage 1) and 12 of 50
households are selected within each selected EA (stage 2):
$$w_i = \frac{30}{5} \times \frac{50}{12} = 6 \times \frac{25}{6} = 25$$
The .weight column always equals the product of .weight_1, .weight_2,
etc. Per-stage weights are preserved for diagnostics and for survey
export.
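The two-stage arithmetic can be reproduced in base R. The sketch assumes every EA in the region holds exactly 50 households, so that the region total is known and the weights can be checked against it.

```r
# Two-stage weight compounding with the numbers from the text.
w_1 <- 30 / 5    # stage 1: 5 of 30 EAs selected
w_2 <- 50 / 12   # stage 2: 12 of 50 households per selected EA
w   <- w_1 * w_2 # overall weight per sampled household

# Sanity check: the weighted sample size recovers the region's household count
# (assumption: all 30 EAs have exactly 50 households).
n_sampled   <- 5 * 12
n_households <- 30 * 50
weighted_n  <- n_sampled * w
```

Here the 60 sampled households each carry a weight of 25 and together represent the 1,500 households in the region.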
Multi-phase weight compounding
When .data is itself a tbl_sample (two-phase sampling), the
phase-1 inclusion probability is already reflected in the input weights.
The final .weight is the product of phase-1 and phase-2 weights:
$$w_i = w_i^{(\text{phase 1})} \times w_i^{(\text{phase 2} \mid \text{phase 1})}$$
This ensures the Horvitz–Thompson estimator
\(\hat{Y} = \sum_S w_i \, y_i\) is unbiased
for the population total.
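Unbiasedness can be verified by brute force on a tiny population. The sketch below assumes simple random sampling without replacement at both phases and hypothetical values of y; it enumerates every possible phase-1 sample and every phase-2 subsample, then averages the Horvitz–Thompson estimates.

```r
# Hypothetical population of four units.
y  <- c(3, 7, 8, 12)
N  <- length(y)
n1 <- 3   # phase-1 sample size
n2 <- 2   # phase-2 sample size, drawn from the phase-1 sample

# Final weight = phase-1 weight x conditional phase-2 weight.
w <- (N / n1) * (n1 / n2)

# Enumerate all equally likely (phase 1, phase 2) outcomes.
phase1 <- combn(N, n1, simplify = FALSE)
estimates <- unlist(lapply(phase1, function(s1) {
  phase2 <- combn(s1, n2, simplify = FALSE)
  vapply(phase2, function(s2) sum(w * y[s2]), numeric(1))
}))

ht_mean <- mean(estimates)  # expectation of the estimator over all outcomes
```

The average of the 12 estimates equals sum(y), the population total, up to floating-point error.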
Panel Partitioning
When panels is specified, the sample is partitioned into non-overlapping
rotation groups suitable for rotating panel surveys. Each panel is a
representative subsample created by systematic interleaving within strata.
Assignment is deterministic (not random): within each stratum, units are
assigned round-robin to panels 1, 2, ..., k. This ensures each panel has
approximately equal representation from every stratum. The quality of panel
balance benefits from control sorting in draw(), which determines the
order of units before interleaving.
For multi-stage designs, panels are assigned at stage 1 (PSU level). All units within a PSU inherit the PSU's panel assignment.
Weights are not adjusted for panel membership. They reflect the full-sample
inclusion probability. When analysing a single panel, multiply weights by
panels to obtain per-panel weights.
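The round-robin rule is easy to mimic in base R. This sketch uses a hypothetical sample of 16 units in two strata, assumed to be already in draw order; it illustrates the assignment pattern and the per-panel weight adjustment, not the package's own implementation.

```r
panels  <- 4
stratum <- rep(c("A", "B"), times = c(10, 6))  # hypothetical sampled units
weight  <- rep(c(20, 40),   times = c(10, 6))  # full-sample weights

# Within each stratum assign panels 1, 2, ..., panels, 1, 2, ...
panel <- ave(seq_along(stratum), stratum,
             FUN = function(i) (seq_along(i) - 1) %% panels + 1)

# Panel sizes within a stratum differ by at most one unit.
counts <- table(stratum, panel)

# Weights for analysing a single panel on its own.
panel_weight <- weight * panels
```

Because assignment follows the unit order within each stratum, any sorting applied before interleaving carries through to the composition of each panel.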
See also
sampling_design() for creating designs,
is_tbl_sample() for testing results,
get_design() for extracting metadata
Examples
# Basic SRS execution
sample <- sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1234)
sample
#> # A tbl_sample: 100 × 17
#> # Weights: 149 [149, 149]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00365 Centre-… Ziro     Bakata  Rural             1393        249    22.5
#>  2 EA_01028 Centre-… Zoundwe… Bere    Rural             1043        173    22.3
#>  3 EA_01086 Centre-… Sissili  Bieha   Rural             1208        188    10.1
#>  4 EA_04686 Centre-… Zoundwe… Gogo    Rural             1151        220    14.8
#>  5 EA_07319 Centre-… Sissili  Leo     Rural              857        122     9.04
#>  6 EA_04550 Est      Komandj… Gayeri  Rural             1057        114    35.1
#>  7 EA_03597 Boucle … Kossi    Djibas… Rural             1615        192    11.0
#>  8 EA_07072 Hauts-B… Kenedou… Kourou… Rural             1373        163    38.7
#>  9 EA_03213 Boucle … Mouhoun  Dedoug… Rural              821        146     0.25
#> 10 EA_08023 Plateau… Ganzour… Mogtedo Rural              816         94     8.11
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified execution with proportional allocation
sample <- sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 300) |>
  execute(bfa_eas, seed = 5789)
table(sample$region)
#>
#> Boucle du Mouhoun          Cascades            Centre        Centre-Est
#>                30                14                31                25
#>       Centre-Nord      Centre-Ouest        Centre-Sud               Est
#>                28                26                12                32
#>     Hauts-Bassins              Nord   Plateau-Central             Sahel
#>                30                24                15                18
#>         Sud-Ouest
#>                15
# Two-stage cluster sample execution
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)
sample <- sampling_design() |>
  add_stage(label = "Districts") |>
  cluster_by(district) |>
  draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
  draw(n = 10) |>
  execute(zwe_frame, seed = 3)
length(unique(sample$district)) # 20 districts selected
#> [1] 20
# Partial execution: stage 1 only
design <- sampling_design() |>
  add_stage(label = "EAs") |>
  stratify_by(region) |>
  cluster_by(ea_id) |>
  draw(n = 5, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
  draw(n = 12)
# Execute only stage 1 to get selected EAs
selected_eas <- execute(design, bfa_eas, stages = 1, seed = 2)
nrow(selected_eas) # Number of selected EAs
#> [1] 65
# Rotating panel: 4 rotation groups
sample <- sampling_design() |>
  stratify_by(region) |>
  draw(n = 200) |>
  execute(bfa_eas, seed = 1, panels = 4)
table(sample$.panel) # 650 per panel (2,600 sampled EAs / 4 panels)
#>
#>   1   2   3   4
#> 650 650 650 650