execute() runs a sampling design against one or more data frames, producing a sampled dataset with appropriate weights and metadata.

Usage

execute(.data, ..., stages = NULL, seed = NULL, panels = NULL, reps = NULL)

Arguments

.data

A sampling_design object, or a tbl_sample object for continuation (multi-phase or multi-stage with separate frames).

...

Data frame(s) to sample from. For single-stage designs, provide one frame. For multi-stage designs with separate frames, provide frames in stage order.

stages

Integer vector specifying which stage(s) to execute. Default (NULL) executes all remaining stages.

seed

Integer random seed for reproducibility.

panels

Integer number of rotation groups (panels) to partition the sample into. Each panel is a representative subsample created by systematic interleaving within strata. The output includes a .panel column with values 1 through panels. Default NULL means no panel partitioning. Cannot be used together with reps.

reps

Integer number of independent replicate samples to draw (>= 2), or NULL (default) for a single sample. When specified, execute() draws reps independent samples from the same frame under the same design and returns a single stacked tbl_sample with a .replicate column (integer 1 through reps). Replicate r uses seed seed + r - 1. Cannot be combined with panels or with stages that use permanent random numbers.

This is repeated sample realization (drawing multiple independent samples), not replicate-weight variance estimation. For the latter, see as_svrepdesign().
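The seed rule for reps can be illustrated with base R alone. The sketch below is not samplyr's implementation: the toy frame, the sample size of 4, and the names replicate_seeds and stacked are assumptions made for illustration. It only shows how seed + r - 1 gives each replicate its own reproducible seed and how the replicates stack under a .replicate column.

```r
# Hypothetical illustration of the documented seed rule; not samplyr's code.
seed <- 42
reps <- 5

# Replicate r uses seed + r - 1
replicate_seeds <- seed + seq_len(reps) - 1
replicate_seeds
#> [1] 42 43 44 45 46

# Emulate stacked replicates with base R on a toy frame of 20 unit ids
frame_ids <- 1:20
draws <- lapply(seq_len(reps), function(r) {
  set.seed(replicate_seeds[r])
  data.frame(.replicate = r, id = sample(frame_ids, size = 4))
})
stacked <- do.call(rbind, draws)

# Each replicate contributes the same number of rows
table(stacked$.replicate)
```

Because every replicate has its own seed, any single replicate can be reproduced without redrawing the others.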

Value

A tbl_sample object (a data frame subclass with sampling metadata). Contains the selected units plus:

  • .sample_id: Unique identifier for each sampled unit

  • .weight: Sampling weight (1/probability)

  • .weight_1, .weight_2, ...: Per-stage sampling weights (\(1/\pi_i^{(k)}\)). The product of all per-stage weights equals .weight.

  • .fpc_1, .fpc_2, ...: Per-stage finite population correction values. The meaning depends on the method and context:

    • Equal-probability WOR (srswor, systematic): \(N_h\) (stratum population size), or \(N\) if unstratified. The sampling fraction \(f = n / N\) is derived from this at variance-estimation time.

    • PPS WOR (pps_brewer, pps_cps, etc.): \(N_h\) (stratum population size), converted to \(\pi_i = 1/w_i\) at survey export, because survey::svydesign() expects inclusion probabilities for unequal-probability stages.

    • Clustered stages: the number of clusters in the stratum/group, not the number of ultimate units.

    • WR / PMR (srswr, pps_multinomial, pps_chromy): \(\infty\). With-replacement designs have no finite population correction; variance is estimated via the Hansen–Hurwitz formula.

    In a multi-stage design, each stage has its own .fpc_k. At survey export (as_svydesign()), these are assembled into a multi-level FPC formula (e.g., ~ .fpc_1 + .fpc_2).

  • .draw_1, .draw_2, ...: Draw index per stage (WR/PMR methods only). Each row represents one independent draw; the draw index identifies which with-replacement selection the row came from.

  • .certainty_1, .certainty_2, ...: Whether each unit was a certainty selection (PPS methods with certainty thresholds only)

  • .replicate: Replicate identifier (only when reps is specified)

  • .panel: Panel assignment (only when panels is specified)

  • Stage and stratum identifiers as appropriate
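To make the role of the .fpc_k columns concrete, here is a minimal base R sketch of how a stored population size turns into a sampling fraction at variance-estimation time. The data values are invented, and the variance expression is the textbook SRSWOR formula for a sample mean, not code taken from samplyr or survey.

```r
# Toy values, chosen for illustration only
y   <- c(12, 15, 9, 20, 14)   # observed values in an SRSWOR sample
n   <- length(y)              # sample size
fpc <- 50                     # .fpc_1 for an equal-probability WOR stage: N (or N_h)

f <- n / fpc                           # sampling fraction f = n / N
var_mean_wor <- (1 - f) * var(y) / n   # SRSWOR variance of the sample mean

# With-replacement stages store Inf, so f is 0 and no correction applies
f_wr <- n / Inf
var_mean_wr <- (1 - f_wr) * var(y) / n

# A multi-level FPC formula of the kind described above
fpc_formula <- reformulate(c(".fpc_1", ".fpc_2"))
fpc_formula
```

Storing \(N_h\) rather than \(f\) keeps the column meaningful even if the sample is later subset, since the fraction is recomputed from the rows actually present.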

Details

Execution Patterns

Single-Stage Execution


design |> execute(frame, seed = 1)

Multi-Stage with Single Frame

For hierarchical data where all stages are in one frame:


design |> execute(frame, seed = 2025)

The frame must contain all clustering variables and represent the stage hierarchy correctly. Lower-stage IDs may repeat across different parents; samplyr resolves them using the full ancestry from earlier stages.
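The point about repeated lower-stage IDs is easiest to see on a toy hierarchy. The following base R sketch uses made-up district and EA codes and a hypothetical ea_key column; it shows the general idea of disambiguating a child ID by its parent, and is not a description of how samplyr does this internally.

```r
# Toy hierarchy: EA codes restart at 1 within every district (made-up data)
frame <- data.frame(
  district = c("A", "A", "B", "B", "B"),
  ea       = c(1, 2, 1, 2, 3)
)

# The EA code alone is ambiguous: EA 1 and EA 2 exist in both districts
ea_is_ambiguous <- anyDuplicated(frame$ea) > 0

# Combining the code with its parent yields a unique key per EA
frame$ea_key <- paste(frame$district, frame$ea, sep = "/")
key_is_unique <- anyDuplicated(frame$ea_key) == 0

frame
```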

Multi-Stage with Multiple Frames

When each stage has its own frame:


design |> execute(frame1, frame2, frame3, seed = 424)

Frames are matched to stages by position.

Partial Execution (Operational Sampling)

Execute only specific stages:


selected_eas <- design |> execute(ea_frame, stages = 1, seed = 42)
# ... fieldwork: listing in selected EAs ...
sample <- selected_eas |> execute(listing_frame, seed = 43)

Multi-Phase (Continuation)

When .data is a tbl_sample, sampling continues from that sample:


phase1 <- design1 |> execute(frame, seed = 42)
# ... add screening data to phase1 ...
phase2 <- design2 |> execute(phase1_updated, seed = 123)

Weights compound automatically in multi-phase designs.

Weight Calculation

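The .weight column is always the inverse of the inclusion probability or, for with-replacement (WR / PMR) methods, of the expected number of selections. For all methods the per-stage weight is \(w_i^{(k)} = 1 / \pi_i^{(k)}\), with \(\pi_i^{(k)}\) read as the expected number of selections for WR / PMR methods: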

  • SRS: \(w_i = N / n\), constant for all units.

  • Stratified SRS: \(w_i = N_h / n_h\) within stratum \(h\).

  • PPS WOR: \(w_i = 1 / \pi_i\) where \(\pi_i\) is computed from the measure of size by sondage::inclusion_prob(). Varies across units.

  • WR / PMR: \(w_i = 1 / E(n_i)\) where \(E(n_i) = n \cdot p_i\) is the expected number of selections. Each draw is one row; a unit selected \(k\) times appears \(k\) times, each with the same weight.
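The four cases above can be reproduced by hand on a toy frame. In this base R sketch all data are invented, and the PPS line uses the simple proportional-to-size rule \(\pi_i = n \cdot x_i / \sum x\) as a stand-in that assumes no certainty units; sondage::inclusion_prob() is deliberately not called here.

```r
# Toy frame: 6 units in two strata with a measure of size (made-up values)
frame <- data.frame(
  id      = 1:6,
  stratum = c("a", "a", "a", "a", "b", "b"),
  mos     = c(10, 20, 30, 40, 50, 50)
)
N <- nrow(frame)
n <- 3

# SRS: constant weight N / n
w_srs <- N / n

# Stratified SRS: N_h / n_h within each stratum
N_h <- tapply(frame$id, frame$stratum, length)   # a = 4, b = 2
n_h <- c(a = 2, b = 2)                           # assumed per-stratum sample sizes
w_h <- N_h / n_h[names(N_h)]                     # a = 2, b = 1

# PPS WOR (simplified, assuming no certainty units): pi_i = n * x_i / sum(x)
pi_pps <- n * frame$mos / sum(frame$mos)
w_pps  <- 1 / pi_pps

# WR / PMR: weight is 1 / E(n_i) with E(n_i) = n * p_i
p    <- frame$mos / sum(frame$mos)
w_wr <- 1 / (n * p)

round(cbind(pi_pps, w_pps, w_wr), 3)
```

In this toy case the PPS and WR weights coincide numerically because no unit's \(n \cdot p_i\) reaches 1; they differ in interpretation (inclusion probability versus expected number of draws).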

Multi-stage weight compounding

In a \(K\)-stage design, the overall weight for unit \(i\) is the product of per-stage weights:

$$w_i = \prod_{k=1}^{K} w_i^{(k)} = \prod_{k=1}^{K} \frac{1}{\pi_i^{(k \mid S^{(k-1)})}}$$

where \(\pi_i^{(k \mid S^{(k-1)})}\) is the conditional inclusion probability at stage \(k\), given the set of clusters selected at all prior stages. For example, in a two-stage design where 5 of 30 EAs are selected in a region (stage 1) and 12 of 50 listed households are selected within each selected EA (stage 2):

$$w_i = \frac{30}{5} \times \frac{50}{12} = 6 \times \frac{25}{6} = 25$$

The .weight column always equals the product of .weight_1, .weight_2, etc. Per-stage weights are preserved for diagnostics and for survey export.
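The product rule is simple to verify on any table that carries per-stage weight columns. The sketch below builds a made-up three-row table using the documented column names and multiplies whatever .weight_k columns it finds; the numbers are illustrative and the code does not depend on samplyr.

```r
# Made-up per-stage weights for three sampled households
s <- data.frame(
  .weight_1 = c(30 / 5, 30 / 5, 40 / 5),     # stage 1: EAs within region
  .weight_2 = c(50 / 12, 50 / 12, 36 / 12)   # stage 2: households within EA
)

# Find every per-stage weight column and multiply them row by row
stage_cols <- grep("^\\.weight_[0-9]+$", names(s), value = TRUE)
s$.weight  <- Reduce(`*`, s[stage_cols])

s$.weight
#> [1] 25 25 24
```

Using a pattern match rather than hard-coded names means the same check works for any number of stages.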

Multi-phase weight compounding

When .data is itself a tbl_sample (two-phase sampling), the phase-1 inclusion probability is already reflected in the input weights. The final .weight is the product of phase-1 and phase-2 weights:

$$w_i = w_i^{(\text{phase 1})} \times w_i^{(\text{phase 2} \mid \text{phase 1})}$$

This ensures the Horvitz–Thompson estimator \(\hat{Y} = \sum_S w_i \, y_i\) is unbiased for the population total.
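As a small numeric illustration of the two-phase rule, the base R sketch below compounds invented phase weights and evaluates the Horvitz–Thompson total. The column names w_phase1 and w_phase2 are hypothetical and exist only for this example; a tbl_sample exposes the compounded value as .weight.

```r
# Made-up two-phase sample: 4 phase-2 units with their phase weights
d <- data.frame(
  y        = c(3, 5, 2, 4),
  w_phase1 = c(10, 10, 20, 20),   # 1 / phase-1 inclusion probability
  w_phase2 = c(2, 2, 4, 4)        # 1 / conditional phase-2 probability
)

# Final weight: product of the phase weights
d$w <- d$w_phase1 * d$w_phase2

# Horvitz-Thompson estimate of the population total
y_hat <- sum(d$w * d$y)
y_hat
#> [1] 640
```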

Panel Partitioning

When panels is specified, the sample is partitioned into non-overlapping rotation groups suitable for rotating panel surveys. Each panel is a representative subsample created by systematic interleaving within strata.

Assignment is deterministic (not random): within each stratum, units are assigned round-robin to panels 1, 2, ..., k. This ensures each panel has approximately equal representation from every stratum. The quality of panel balance benefits from control sorting in draw(), which determines the order of units before interleaving.
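Round-robin assignment within strata can be sketched in a few lines of base R. This is an illustration of the idea only, with a made-up sample and a local variable k for the number of panels; it assumes the rows are already in selection order and makes no claim about samplyr's internal code.

```r
# Made-up sample of 10 units in two strata, already in selection order
s <- data.frame(
  id      = 1:10,
  stratum = rep(c("north", "south"), times = c(6, 4))
)
k <- 3   # number of panels

# Within each stratum, cycle through panels 1, 2, ..., k
s$.panel <- ave(
  seq_len(nrow(s)), s$stratum,
  FUN = function(i) (seq_along(i) - 1) %% k + 1
)

# Every panel receives units from every stratum
table(s$stratum, s$.panel)
```

Because the cycle restarts in each stratum, panel sizes within a stratum differ by at most one unit.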

For multi-stage designs, panels are assigned at stage 1 (PSU level). All units within a PSU inherit the PSU's panel assignment.

Weights are not adjusted for panel membership; they reflect the full-sample inclusion probability. When analysing a single panel on its own, multiply the weights by the number of panels (the value of panels) to obtain per-panel weights.
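The per-panel adjustment amounts to one multiplication. In the sketch below the data are invented and .weight_panel is a hypothetical column name introduced for the example; the check at the end shows that a single panel, once rescaled, represents the same population total as the full sample.

```r
# Made-up sample: 8 units, full-sample weight 25 each, split into 4 panels
s <- data.frame(.weight = rep(25, 8), .panel = rep(1:4, times = 2))
panels <- 4

# Analyse panel 1 alone: scale its weights by the number of panels
panel_1 <- s[s$.panel == 1, ]
panel_1$.weight_panel <- panel_1$.weight * panels

sum(panel_1$.weight_panel)
#> [1] 200
sum(s$.weight)
#> [1] 200
```

With unequal panel sizes the two sums agree only approximately, which is why the factor is the number of panels rather than an exact size ratio.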

See also

sampling_design() for creating designs, is_tbl_sample() for testing results, get_design() for extracting metadata

Examples

# Basic SRS execution
sample <- sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1234)
sample
#> # A tbl_sample: 100 × 17
#> # Weights:      149.34 [149.34, 149.34]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_13652 Centre-… Sissili  To      Rural             1127        205    38.8 
#>  2 EA_14760 Centre-… Nahouri  Ziou    Rural             2523        503    47.1 
#>  3 EA_14555 Centre-… Sanguie  Zawara  Rural             1071        199    23.2 
#>  4 EA_01262 Centre-… Zoundwe… Binde   Rural              871        148    10.2 
#>  5 EA_07319 Centre-… Sissili  Leo     Rural             1061        171    14.3 
#>  6 EA_00896 Est      Komandj… Bartie… Rural             1153        136   131.  
#>  7 EA_03624 Boucle … Kossi    Djibas… Rural             1615        206    11.0 
#>  8 EA_07065 Hauts-B… Kenedou… Kourin… Rural              820         98    50.6 
#>  9 EA_03213 Boucle … Mouhoun  Dedoug… Rural              821        146     0.25
#> 10 EA_08003 Plateau… Ganzour… Megue   Rural             1012        125     7.58
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified execution with proportional allocation
sample <- sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 300) |>
  execute(bfa_eas, seed = 5789)
table(sample$region)
#> 
#> Boucle du Mouhoun          Cascades            Centre        Centre-Est 
#>                30                14                31                25 
#>       Centre-Nord      Centre-Ouest        Centre-Sud               Est 
#>                28                26                12                32 
#>     Hauts-Bassins              Nord   Plateau-Central             Sahel 
#>                30                24                15                18 
#>         Sud-Ouest 
#>                15 

# Two-stage cluster sample execution
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)

sample <- sampling_design() |>
  add_stage(label = "Districts") |>
    cluster_by(district) |>
    draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 10) |>
  execute(zwe_frame, seed = 3)
length(unique(sample$district))  # 20 districts selected
#> [1] 20

# Partial execution: stage 1 only
design <- sampling_design() |>
  add_stage(label = "EAs") |>
    stratify_by(region) |>
    cluster_by(ea_id) |>
    draw(n = 5, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 12)

# Execute only stage 1 to get selected EAs
selected_eas <- execute(design, bfa_eas, stages = 1, seed = 2)
nrow(selected_eas)  # Number of selected EAs
#> [1] 65

# Replicated sampling: 5 independent draws
sample <- sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 42, reps = 5)
table(sample$.replicate)  # 100 per replicate
#> 
#>   1   2   3   4   5 
#> 100 100 100 100 100 

# Rotating panel: 4 rotation groups
sample <- sampling_design() |>
  stratify_by(region) |>
  draw(n = 200) |>
  execute(bfa_eas, seed = 1, panels = 4)
table(sample$.panel)  # 650 per panel: n = 200 is drawn in each of the 13 regions
#> 
#>   1   2   3   4 
#> 650 650 650 650