Execute a Sampling Design

execute() runs a sampling design against one or more data frames, producing a sampled dataset with appropriate weights and metadata.

Usage

execute(.data, ..., stages = NULL, seed = NULL, panels = NULL)

Arguments

.data: A sampling_design object, or a tbl_sample object for continuation (multi-phase or multi-stage with separate frames).
...: Data frame(s) to sample from. For single-stage designs, provide one frame. For multi-stage designs with separate frames, provide frames in stage order.
stages: Integer vector specifying which stage(s) to execute. Default (NULL) executes all remaining stages.
seed: Integer random seed for reproducibility.
panels: Integer number of rotation groups (panels) to partition the sample into. Each panel is a representative subsample created by systematic interleaving within strata. The output includes a .panel column with values 1 through panels. Default NULL means no panel partitioning.

Value

A tbl_sample object (a data frame subclass with sampling metadata). Contains the selected units plus:

.sample_id: Unique identifier for each sampled unit
.weight: Sampling weight (1/probability)
.weight_1, .weight_2, ...: Per-stage sampling weights ($1/\pi_i^{(k)}$). The product of all per-stage weights equals .weight.
.fpc_1, .fpc_2, ...: Per-stage finite population correction values. In a multi-stage design each stage has its own .fpc_k; at survey export (as_svydesign()) these are assembled into a multi-level FPC formula (e.g., ~ .fpc_1 + .fpc_2). The meaning of each value depends on the method and context:
- Equal-probability WOR (srswor, systematic): $N_h$ (stratum population size), or $N$ if unstratified. The sampling fraction $f_h = n_h / N_h$ (or $f = n / N$) is derived from this at variance-estimation time.
- PPS WOR (pps_brewer, pps_cps, etc.): $N_h$ (stratum population size), converted to $\pi_i = 1/w_i$ at survey export, because survey::svydesign() expects inclusion probabilities for unequal-probability stages.
- Clustered stages: the number of clusters in the stratum or group, not the number of ultimate units.
- WR / PMR (srswr, pps_multinomial, pps_chromy): $\infty$. With-replacement designs have no finite population correction; variance is estimated with the Hansen–Hurwitz formula.
.draw_1, .draw_2, ...: Draw index per stage (WR/PMR methods only). Each row represents one draw; the draw index identifies which with-replacement selection the row came from, so a unit selected more than once appears in several rows.
.certainty_1, .certainty_2, ...: Whether each unit was a certainty selection (PPS methods with certainty thresholds only)
.panel: Panel assignment (only when panels is specified)
Stage and stratum identifiers as appropriate
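
To make the relationship between the per-stage columns concrete, here is a minimal base-R sketch. The data frame `toy` and its values are invented for illustration and only mimic the column names listed above. It shows how the sampling fraction $f_h = n_h / N_h$ of an equal-probability WOR stage can be recovered from .fpc_1, what an infinite FPC implies, and the shape of a multi-level FPC formula. It is not the package's export code.

```r
# Hypothetical stage-1 rows: srswor within strata of size 40 (A) and 60 (B)
toy <- data.frame(
  stratum   = c("A", "A", "B"),
  .weight_1 = c(20, 20, 60),  # N_h / n_h
  .fpc_1    = c(40, 40, 60)   # N_h, stored as a population count
)

# n_h: number of sampled rows in each stratum
n_h <- ave(rep(1, nrow(toy)), toy$stratum, FUN = sum)

# Sampling fraction f_h = n_h / N_h, derived at variance-estimation time
toy$f <- n_h / toy$.fpc_1
all.equal(toy$f, 1 / toy$.weight_1)  # TRUE for equal-probability WOR

# A with-replacement stage stores Inf, which gives a zero sampling fraction
2 / Inf  # 0

# Shape of a multi-level FPC formula for a two-stage design
fpc_formula <- reformulate(paste0(".fpc_", 1:2))
fpc_formula  # ~.fpc_1 + .fpc_2
```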

Details

Execution Patterns

Single-Stage Execution


design |> execute(frame, seed = 1)

Multi-Stage with Single Frame

For hierarchical data where all stages are in one frame:


design |> execute(frame, seed = 2025)

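The frame must contain the clustering variables for every stage, and clusters must be properly nested: each lower-stage cluster belongs to exactly one higher-stage cluster.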
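
Nesting can be checked before execution. The sketch below uses a hypothetical helper, `check_nesting()`, which is not part of the package: a child cluster (here `ea_id`) is properly nested if it maps to exactly one parent (here `district`). The frame is a made-up example.

```r
# Hypothetical frame: EAs nested in districts
frame <- data.frame(
  district = c("D1", "D1", "D2", "D2", "D3"),
  ea_id    = c("E1", "E2", "E3", "E4", "E5")
)

# Hypothetical helper: TRUE if every child id has exactly one parent id
check_nesting <- function(frame, parent, child) {
  parents_per_child <- tapply(frame[[parent]], frame[[child]],
                              function(p) length(unique(p)))
  all(parents_per_child == 1)
}

check_nesting(frame, "district", "ea_id")  # TRUE

# Reusing the id "E1" in a second district breaks nesting
bad <- rbind(frame, data.frame(district = "D3", ea_id = "E1"))
check_nesting(bad, "district", "ea_id")    # FALSE
```

If ids are only unique within a parent, a common fix is to build a composite id (for example with paste(district, ea_id)) before executing the design.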

Multi-Stage with Multiple Frames

When each stage has its own frame:


design |> execute(frame1, frame2, frame3, seed = 424)

Frames are matched to stages by position.

Partial Execution (Operational Sampling)

Execute only specific stages:


selected_eas <- design |> execute(ea_frame, stages = 1, seed = 42)
# ... fieldwork: listing in selected EAs ...
sample <- selected_eas |> execute(listing_frame, seed = 43)

Multi-Phase (Continuation)

When .data is a tbl_sample, sampling continues from that sample:


phase1 <- design1 |> execute(frame, seed = 42)
# ... add screening data to phase1, giving phase1_updated ...
phase2 <- design2 |> execute(phase1_updated, seed = 123)

Weights compound automatically in multi-phase designs.

Weight Calculation

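The .weight column is always the inverse of the unit's selection probability: the inclusion probability for without-replacement methods, or the expected number of selections for with-replacement methods. The per-stage weight is $w_i^{(k)} = 1 / \pi_i^{(k)}$, where for WR/PMR methods $\pi_i^{(k)}$ stands for $E(n_i)$: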

SRS: $w_i = N / n$, constant for all units.
Stratified SRS: $w_i = N_h / n_h$ within stratum $h$.
PPS WOR: $w_i = 1 / \pi_i$ where $\pi_i$ is computed from the measure of size by sondage::inclusion_prob(). Varies across units.
WR / PMR: $w_i = 1 / E(n_i)$ where $E(n_i) = n \cdot p_i$ is the expected number of selections. Each draw is one row; a unit selected $m$ times appears in $m$ rows, each with the same weight.
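
The weight formulas above can be reproduced in a few lines of base R. This is a sketch with made-up sizes. The PPS line uses the textbook $\pi_i = n x_i / \sum_j x_j$, which is only valid when no unit would get $\pi_i > 1$; the package computes $\pi_i$ with sondage::inclusion_prob() instead.

```r
set.seed(1)

# SRS: n = 4 from N = 20, same weight for every unit
N <- 20; n <- 4
w_srs <- N / n                          # 5

# Stratified SRS: N_h / n_h within each stratum
N_h <- c(A = 12, B = 8)
n_h <- c(A = 3,  B = 1)
w_strat <- N_h / n_h                    # A = 4, B = 8

# PPS WOR (assumes no certainty units): pi_i = n * x_i / sum(x)
x <- c(10, 20, 30, 40, 50, 60)          # measure of size
n_pps <- 2
pik <- n_pps * x / sum(x)               # inclusion probabilities, sum to n
w_pps <- 1 / pik

# WR (multinomial): one row per draw, weight = 1 / E(n_i) = 1 / (n * p_i)
p <- x / sum(x)
draws <- sample(seq_along(x), size = n_pps, replace = TRUE, prob = p)
w_wr <- 1 / (n_pps * p[draws])          # a repeated unit repeats its weight
```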

Multi-stage weight compounding

In a $K$-stage design, the overall weight for unit $i$ is the product of the per-stage weights:
$$w_i = \prod_{k=1}^{K} w_i^{(k)} = \prod_{k=1}^{K} \frac{1}{\pi_i^{(k \mid S^{(k-1)})}}$$
where $\pi_i^{(k \mid S^{(k-1)})}$ is the conditional inclusion probability at stage $k$, given the set $S^{(k-1)}$ of clusters selected at all prior stages. For example, in a two-stage design where 5 of 30 EAs are selected in a region (stage 1) and 12 of the 50 listed households are selected in each selected EA (stage 2):
$$w_i = \frac{30}{5} \times \frac{50}{12} = 6 \times \frac{25}{6} = 25$$
The .weight column always equals the product of .weight_1, .weight_2, etc. Per-stage weights are preserved for diagnostics and for survey export.
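
The two-stage worked example can be checked directly in base R. The sketch also confirms that the compounded weight is the inverse of the overall inclusion probability $(5/30) \times (12/50) = 0.04$.

```r
# Stage 1: 5 of 30 EAs selected in the region (equal probability)
w1 <- 30 / 5                    # 6
# Stage 2: 12 of the 50 listed households selected in each selected EA
w2 <- 50 / 12                   # 25/6, about 4.17

w <- w1 * w2
w                               # 25

# Same result from the overall inclusion probability
pi_overall <- (5 / 30) * (12 / 50)
1 / pi_overall                  # 25
```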

Multi-phase weight compounding

When .data is itself a tbl_sample (two-phase sampling), the phase-1 inclusion probability is already reflected in the input weights. The final .weight is the product of the phase-1 and phase-2 weights:
$$w_i = w_i^{(\text{phase 1})} \times w_i^{(\text{phase 2} \mid \text{phase 1})}$$
This ensures that the Horvitz–Thompson estimator $\hat{Y} = \sum_{i \in S} w_i \, y_i$ is unbiased for the population total.
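
Unbiasedness of the compounded weights can be verified by brute force on a tiny population. This sketch assumes SRSWOR in both phases (3 of 5 units, then 2 of those 3), enumerates all 30 equally likely outcomes with combn(), and shows that the average of $\sum_{i \in S} w_i y_i$ equals the true total. The population values are invented.

```r
# Toy population of N = 5 units
y  <- c(3, 7, 1, 9, 5)
N  <- length(y)
n1 <- 3   # phase-1 sample size (SRSWOR from the population)
n2 <- 2   # phase-2 sample size (SRSWOR from the phase-1 sample)

w1 <- N / n1    # phase-1 weight
w2 <- n1 / n2   # conditional phase-2 weight

# Enumerate every equally likely (phase 1, phase 2) outcome
estimates <- numeric(0)
for (s1 in combn(N, n1, simplify = FALSE)) {
  for (idx in combn(n1, n2, simplify = FALSE)) {
    s2 <- s1[idx]                                  # units kept in phase 2
    estimates <- c(estimates, sum(w1 * w2 * y[s2]))
  }
}

mean(estimates)  # equals sum(y) = 25
```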

Panel Partitioning

When panels is specified, the sample is partitioned into non-overlapping rotation groups suitable for rotating panel surveys. Each panel is a representative subsample created by systematic interleaving within strata.

Assignment is deterministic (not random): within each stratum, units are assigned round-robin to panels 1, 2, ..., panels. This gives each panel approximately equal representation from every stratum. Panel balance improves when units are control-sorted in draw(), because the sort order determines the order in which units are interleaved.
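
The round-robin rule can be sketched in base R. `assign_panels()` is a hypothetical helper, not the package's function; it assumes the rows are already in their (control-sorted) order and that the cycle restarts at panel 1 in each stratum.

```r
# Hypothetical helper: round-robin panel assignment within strata
assign_panels <- function(stratum, panels) {
  # Position of each unit within its stratum, in current row order
  pos <- ave(seq_along(stratum), stratum, FUN = seq_along)
  # Cycle through 1, 2, ..., panels, then start again
  (pos - 1) %% panels + 1
}

stratum <- c(rep("A", 6), rep("B", 5))
panel <- assign_panels(stratum, panels = 4)
table(stratum, panel)
```

Each stratum spreads its units as evenly as possible across the four panels; a stratum whose size is not a multiple of `panels` leaves its extra units in the lowest-numbered panels.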

For multi-stage designs, panels are assigned at stage 1 (PSU level). All units within a PSU inherit the PSU's panel assignment.
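
PSU-level inheritance amounts to assigning panels to the distinct PSUs and then looking each household's PSU up. The sketch below uses a made-up household table and the same round-robin rule as above; it illustrates the idea and is not the package's implementation.

```r
# Hypothetical multi-stage result: households within selected EAs (PSUs)
hh <- data.frame(
  ea_id  = c("E1", "E1", "E2", "E2", "E2", "E3", "E4"),
  region = c("R1", "R1", "R1", "R1", "R1", "R2", "R2")
)
panels <- 2

# One row per PSU, in order of first appearance
psu <- unique(hh[c("region", "ea_id")])

# Round-robin within strata at the PSU level
pos <- ave(seq_len(nrow(psu)), psu$region, FUN = seq_along)
psu$.panel <- (pos - 1) %% panels + 1

# Every household inherits its PSU's panel
hh$.panel <- psu$.panel[match(hh$ea_id, psu$ea_id)]
hh
```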

Weights are not adjusted for panel membership. They reflect the full-sample inclusion probability. When analysing a single panel, multiply weights by panels to obtain per-panel weights.
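
A short numeric check of the scaling rule, using an invented sample of 8 units with weight 10 each (so the full sample estimates $N = 80$) split into 4 panels:

```r
# Hypothetical full sample: 8 units, weight 10 each, 4 panels
smp <- data.frame(.weight = rep(10, 8), .panel = rep(1:4, times = 2))
panels <- 4

one_panel <- smp[smp$.panel == 1, ]

# Full-sample weights understate totals when only one panel is used
sum(one_panel$.weight)             # 20

# Multiplying by the number of panels restores the population count
sum(one_panel$.weight * panels)    # 80
```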

Examples

# Basic SRS execution
sample <- sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1234)
sample
#> # A tbl_sample: 100 × 17
#> # Weights:      149 [149, 149]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00365 Centre-… Ziro     Bakata  Rural             1393        249    22.5 
#>  2 EA_01028 Centre-… Zoundwe… Bere    Rural             1043        173    22.3 
#>  3 EA_01086 Centre-… Sissili  Bieha   Rural             1208        188    10.1 
#>  4 EA_04686 Centre-… Zoundwe… Gogo    Rural             1151        220    14.8 
#>  5 EA_07319 Centre-… Sissili  Leo     Rural              857        122     9.04
#>  6 EA_04550 Est      Komandj… Gayeri  Rural             1057        114    35.1 
#>  7 EA_03597 Boucle … Kossi    Djibas… Rural             1615        192    11.0 
#>  8 EA_07072 Hauts-B… Kenedou… Kourou… Rural             1373        163    38.7 
#>  9 EA_03213 Boucle … Mouhoun  Dedoug… Rural              821        146     0.25
#> 10 EA_08023 Plateau… Ganzour… Mogtedo Rural              816         94     8.11
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified execution with proportional allocation
sample <- sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 300) |>
  execute(bfa_eas, seed = 5789)
table(sample$region)
#> 
#> Boucle du Mouhoun          Cascades            Centre        Centre-Est 
#>                30                14                31                25 
#>       Centre-Nord      Centre-Ouest        Centre-Sud               Est 
#>                28                26                12                32 
#>     Hauts-Bassins              Nord   Plateau-Central             Sahel 
#>                30                24                15                18 
#>         Sud-Ouest 
#>                15 

# Two-stage cluster sample execution
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)

sample <- sampling_design() |>
  add_stage(label = "Districts") |>
    cluster_by(district) |>
    draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 10) |>
  execute(zwe_frame, seed = 3)
length(unique(sample$district))  # 20 districts selected
#> [1] 20

# Partial execution: stage 1 only
design <- sampling_design() |>
  add_stage(label = "EAs") |>
    stratify_by(region) |>
    cluster_by(ea_id) |>
    draw(n = 5, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 12)

# Execute only stage 1 to get selected EAs
selected_eas <- execute(design, bfa_eas, stages = 1, seed = 2)
nrow(selected_eas)  # Number of selected EAs
#> [1] 65

# Rotating panel: 4 rotation groups
sample <- sampling_design() |>
  stratify_by(region) |>
  draw(n = 200) |>
  execute(bfa_eas, seed = 1, panels = 4)
table(sample$.panel)  # n = 200 in each of 13 regions = 2600 units, 650 per panel
#> 
#>   1   2   3   4 
#> 650 650 650 650