execute() runs a sampling design against one or more data frames,
producing a sampled dataset with appropriate weights and metadata.
Arguments
- .data
A sampling_design object, or a tbl_sample object for continuation (multi-phase or multi-stage with separate frames).
- ...
Data frame(s) to sample from. For single-stage designs, provide one frame. For multi-stage designs with separate frames, provide frames in stage order.
- stages
Integer vector specifying which stage(s) to execute. Default (NULL) executes all remaining stages.
- seed
Integer random seed for reproducibility.
- panels
Integer number of rotation groups (panels) to partition the sample into. Each panel is a representative subsample created by systematic interleaving within strata. The output includes a .panel column with values 1 through panels. Default NULL means no panel partitioning. Cannot be used together with reps.
- reps
Integer number of independent replicate samples to draw (>= 2), or NULL (default) for a single sample. When specified, execute() draws reps independent samples from the same frame under the same design and returns a single stacked tbl_sample with a .replicate column (integer 1 through reps). Replicate r uses the seed seed + r - 1. Cannot be combined with panels or with stages that use permanent random numbers.
This is repeated sample realization (drawing multiple independent samples), not replicate-weight variance estimation. For the latter, see as_svrepdesign().
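As a small illustration of the seed rule for reps, the following base R sketch lists the seed each replicate would use. It only restates the formula seed + r - 1 from the description above and does not call samplyr; the variable names are illustrative.

```r
seed <- 42
reps <- 5

# Replicate r uses seed + r - 1, so replicate 1 reuses the base seed.
replicate_seeds <- seed + seq_len(reps) - 1
replicate_seeds
#> [1] 42 43 44 45 46
```

One consequence of this rule is that a call with seed = 42 and reps = 5 shares four of its five seeds with a call using seed = 43, so distinct runs are better separated by seeds at least reps apart.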
Value
A tbl_sample object (a data frame subclass with sampling metadata). Contains the selected units plus:
- .sample_id: Unique identifier for each sampled unit
- .weight: Sampling weight (1/probability)
- .weight_1, .weight_2, ...: Per-stage sampling weights (\(1/\pi_i^{(k)}\)). The product of all per-stage weights equals .weight.
- .fpc_1, .fpc_2, ...: Per-stage finite population correction values. The meaning depends on the method and context:
  - Equal-probability WOR (srswor, systematic): \(N_h\) (stratum population size), or \(N\) if unstratified. The sampling fraction \(f = n / N\) is derived from this at variance-estimation time.
  - PPS WOR (pps_brewer, pps_cps, etc.): \(N_h\) (stratum population size), converted to \(\pi_i = 1/w_i\) at survey export, because survey::svydesign() expects inclusion probabilities for unequal-probability stages.
  - Clustered stages: the number of clusters in the stratum/group, not the number of ultimate units.
  - WR / PMR (srswr, pps_multinomial, pps_chromy): \(\infty\). With-replacement designs have no finite population correction; variance is estimated via the Hansen–Hurwitz formula.
  In a multi-stage design, each stage has its own .fpc_k. At survey export (as_svydesign()), these are assembled into a multi-level FPC formula (e.g., ~ .fpc_1 + .fpc_2).
- .draw_1, .draw_2, ...: Draw index per stage (WR/PMR methods only). Each row represents one independent draw; the draw index identifies which with-replacement selection the row came from.
- .certainty_1, .certainty_2, ...: Whether each unit was a certainty selection (PPS methods with certainty thresholds only)
- .replicate: Replicate identifier (only when reps is specified)
- .panel: Panel assignment (only when panels is specified)
- Stage and stratum identifiers as appropriate
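To make the role of the .fpc_k values concrete, here is a base R sketch of how a stored population size turns into a sampling fraction and a finite population correction for the variance of a sample mean under SRSWOR. The numbers and variable names are hypothetical, and this uses the textbook formula rather than anything taken from samplyr or the survey package.

```r
N <- 500                                # stratum population size, as a .fpc_k column would store it
y <- c(12, 15, 9, 20, 14, 11, 17, 13)   # hypothetical values observed on the sampled units
n <- length(y)

f <- n / N                              # sampling fraction derived from the stored N
var_mean_wor <- (1 - f) * var(y) / n    # SRSWOR variance of the sample mean, with FPC
var_mean_wr  <- var(y) / n              # with replacement: no correction applied

# An FPC value of Inf (WR / PMR methods) gives a sampling fraction of zero,
# so the corrected and uncorrected formulas coincide.
f_wr <- n / Inf
```

Storing the population size rather than the fraction keeps the column meaningful on its own and lets the fraction be recomputed if the sample is later subset.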
Details
Execution Patterns
Single-Stage Execution
design |> execute(frame, seed = 1)
Multi-Stage with Single Frame
For hierarchical data where all stages are in one frame:
design |> execute(frame, seed = 2025)
The frame must contain all clustering variables and represent the stage
hierarchy correctly. Lower-stage IDs may repeat across different parents;
samplyr resolves them using the full ancestry from earlier stages.
Multi-Stage with Multiple Frames
When each stage has its own frame:
design |> execute(frame1, frame2, frame3, seed = 424)
Frames are matched to stages by position.
Weight Calculation
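The .weight column is always the inverse of the inclusion probability (or, for with-replacement methods, of the expected number of selections).
For all methods the per-stage weight is \(w_i^{(k)} = 1 / \pi_i^{(k)}\), with \(\pi_i^{(k)}\) read as the expected number of selections for with-replacement methods: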
- SRS: \(w_i = N / n\), constant for all units.
- Stratified SRS: \(w_i = N_h / n_h\) within stratum \(h\).
- PPS WOR: \(w_i = 1 / \pi_i\) where \(\pi_i\) is computed from the measure of size by sondage::inclusion_prob(). Varies across units.
- WR / PMR: \(w_i = 1 / E(n_i)\) where \(E(n_i) = n \cdot p_i\) is the expected number of selections. Each draw is one row; a unit selected \(k\) times appears \(k\) times, each with the same weight.
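The formulas above can be checked by hand. The base R sketch below computes SRS, stratified SRS and with-replacement PPS weights for a made-up frame. All sizes and names are hypothetical, and PPS WOR is left out because its inclusion probabilities come from sondage::inclusion_prob(), which is not reproduced here.

```r
# Illustrative frame: three strata with known sizes (made-up numbers).
frame <- data.frame(
  unit_id = 1:600,
  stratum = rep(c("A", "B", "C"), times = c(300, 200, 100))
)

# SRS without replacement: every unit gets the same weight N / n.
N <- nrow(frame)
n <- 60
srs_weight <- N / n

# Stratified SRS: the weight is N_h / n_h within each stratum.
N_h <- table(frame$stratum)              # stratum population sizes
n_h <- c(A = 30, B = 20, C = 10)         # hypothetical stratum sample sizes
strat_weight <- as.numeric(N_h) / n_h[names(N_h)]

# Sanity check: weights times sample sizes recover the population size.
stopifnot(isTRUE(all.equal(sum(strat_weight * n_h), N)))

# With replacement PPS: weight is 1 / (n * p_i), with p_i the size share.
mos <- c(120, 80, 50, 250)               # hypothetical measures of size
p <- mos / sum(mos)
n_draws <- 2
wr_weight <- 1 / (n_draws * p)
```

In this toy frame the stratum sample sizes are proportional to the stratum sizes, so the stratified weights all equal the SRS weight of 10; any other allocation would make them differ by stratum.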
Multi-stage weight compounding
In a \(K\)-stage design, the overall weight for unit \(i\) is the
product of per-stage weights:
$$w_i = \prod_{k=1}^{K} w_i^{(k)} = \prod_{k=1}^{K} \frac{1}{\pi_i^{(k \mid S^{(k-1)})}}$$
where \(\pi_i^{(k \mid S^{(k-1)})}\) is the
conditional inclusion probability at stage \(k\), given the set of
clusters selected at all prior stages. For example, in a two-stage design
where 5 of 30 EAs are selected in a region (stage 1) and 12 of 50
households are sampled within each selected EA (stage 2):
$$w_i = \frac{30}{5} \times \frac{50}{12} = 6 \times \frac{25}{6} = 25$$
The .weight column always equals the product of .weight_1, .weight_2,
etc. Per-stage weights are preserved for diagnostics and for survey
export.
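The two-stage example above can be reproduced in a few lines of base R. This is only the arithmetic of the worked example; the variable names mirror the .weight_1 and .weight_2 columns but are plain numbers here, not samplyr output.

```r
# Stage 1: 5 of 30 EAs selected; stage 2: 12 of 50 households per selected EA.
weight_1 <- 30 / 5        # per-stage weight at stage 1
weight_2 <- 50 / 12       # conditional per-stage weight at stage 2

# The overall weight is the product of the per-stage weights.
weight <- weight_1 * weight_2
weight
#> [1] 25
```

On an actual two-stage result the same identity could be checked by comparing the .weight column with the product of .weight_1 and .weight_2, using all.equal() rather than == to allow for floating-point rounding.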
Multi-phase weight compounding
When .data is itself a tbl_sample (two-phase sampling), the
phase-1 inclusion probability is already reflected in the input weights.
The final .weight is the product of phase-1 and phase-2 weights:
$$w_i = w_i^{(\text{phase 1})} \times w_i^{(\text{phase 2} \mid \text{phase 1})}$$
This ensures the Horvitz–Thompson estimator
\(\hat{Y} = \sum_S w_i \, y_i\) is unbiased
for the population total.
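A minimal base R sketch of the two-phase case, assuming simple random sampling at both phases with hypothetical sizes, shows how the phase weights compound and feed the Horvitz–Thompson total. Nothing here calls samplyr; the y values are invented for illustration.

```r
N  <- 1000    # population size
n1 <- 200     # phase-1 sample size
n2 <- 50      # phase-2 subsample size, drawn from the phase-1 sample

w_phase1 <- N / n1                  # phase-1 weight
w_phase2 <- n1 / n2                 # phase-2 weight, conditional on phase 1
w_final  <- w_phase1 * w_phase2     # compounded weight, equal to N / n2 here

# Horvitz-Thompson estimate of the total from hypothetical phase-2 values.
y <- rep(c(2, 4), length.out = n2)
ht_total <- sum(w_final * y)
```

With equal-probability sampling at both phases the compounded weight collapses to N / n2, which is a convenient check when validating a two-phase result.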
Panel Partitioning
When panels is specified, the sample is partitioned into non-overlapping
rotation groups suitable for rotating panel surveys. Each panel is a
representative subsample created by systematic interleaving within strata.
Assignment is deterministic (not random): within each stratum, units are
assigned round-robin to panels 1, 2, ..., k. This ensures each panel has
approximately equal representation from every stratum. The quality of panel
balance benefits from control sorting in draw(), which determines the
order of units before interleaving.
For multi-stage designs, panels are assigned at stage 1 (PSU level). All units within a PSU inherit the PSU's panel assignment.
Weights are not adjusted for panel membership. They reflect the full-sample
inclusion probability. When analysing a single panel, multiply weights by
panels to obtain per-panel weights.
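The round-robin rule can be sketched in base R as follows. This assumes that the first unit in each stratum goes to panel 1 and that the rows are already in their selection order; the exact ordering and starting point used by samplyr may differ, and the data frame and column names are hypothetical.

```r
# Toy sample: 10 units in two strata, already in selection order.
smp <- data.frame(
  unit_id = 1:10,
  stratum = rep(c("North", "South"), times = c(6, 4)),
  weight  = rep(c(20, 15), times = c(6, 4))
)
panels <- 3

# Position of each unit within its stratum: 1, 2, 3, ...
pos <- ave(seq_len(nrow(smp)), smp$stratum, FUN = seq_along)

# Round-robin assignment: positions cycle through panels 1, 2, ..., panels.
smp$panel <- (pos - 1L) %% panels + 1L

# Weight to use when analysing a single panel on its own.
smp$panel_weight <- smp$weight * panels

table(smp$stratum, smp$panel)
```

Because assignment depends only on position, sorting units by a control variable before this step spreads that variable evenly across panels, which is why control sorting in draw() improves panel balance.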
See also
sampling_design() for creating designs,
is_tbl_sample() for testing results,
get_design() for extracting metadata
Examples
# Basic SRS execution
sample <- sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1234)
sample
#> # A tbl_sample: 100 × 17
#> # Weights: 149.34 [149.34, 149.34]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_13652 Centre-… Sissili To Rural 1127 205 38.8
#> 2 EA_14760 Centre-… Nahouri Ziou Rural 2523 503 47.1
#> 3 EA_14555 Centre-… Sanguie Zawara Rural 1071 199 23.2
#> 4 EA_01262 Centre-… Zoundwe… Binde Rural 871 148 10.2
#> 5 EA_07319 Centre-… Sissili Leo Rural 1061 171 14.3
#> 6 EA_00896 Est Komandj… Bartie… Rural 1153 136 131.
#> 7 EA_03624 Boucle … Kossi Djibas… Rural 1615 206 11.0
#> 8 EA_07065 Hauts-B… Kenedou… Kourin… Rural 820 98 50.6
#> 9 EA_03213 Boucle … Mouhoun Dedoug… Rural 821 146 0.25
#> 10 EA_08003 Plateau… Ganzour… Megue Rural 1012 125 7.58
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified execution with proportional allocation
sample <- sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 300) |>
  execute(bfa_eas, seed = 5789)
table(sample$region)
#>
#> Boucle du Mouhoun Cascades Centre Centre-Est
#> 30 14 31 25
#> Centre-Nord Centre-Ouest Centre-Sud Est
#> 28 26 12 32
#> Hauts-Bassins Nord Plateau-Central Sahel
#> 30 24 15 18
#> Sud-Ouest
#> 15
# Two-stage cluster sample execution
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)
sample <- sampling_design() |>
  add_stage(label = "Districts") |>
  cluster_by(district) |>
  draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
  draw(n = 10) |>
  execute(zwe_frame, seed = 3)
length(unique(sample$district)) # 20 districts selected
#> [1] 20
# Partial execution: stage 1 only
design <- sampling_design() |>
  add_stage(label = "EAs") |>
  stratify_by(region) |>
  cluster_by(ea_id) |>
  draw(n = 5, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
  draw(n = 12)
# Execute only stage 1 to get selected EAs
selected_eas <- execute(design, bfa_eas, stages = 1, seed = 2)
nrow(selected_eas) # Number of selected EAs
#> [1] 65
# Replicated sampling: 5 independent draws
sample <- sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 42, reps = 5)
table(sample$.replicate) # 100 per replicate
#>
#> 1 2 3 4 5
#> 100 100 100 100 100
# Rotating panel: 4 rotation groups
sample <- sampling_design() |>
  stratify_by(region) |>
  draw(n = 200) |>
  execute(bfa_eas, seed = 1, panels = 4)
table(sample$.panel) # 650 per panel: 2,600 sampled units split four ways
#>
#> 1 2 3 4
#> 650 650 650 650