
execute() runs a sampling design against one or more data frames, producing a sampled dataset with appropriate weights and metadata.

Usage

execute(.data, ..., stages = NULL, seed = NULL, panels = NULL, reps = NULL)

Arguments

.data

A sampling_design object, or a tbl_sample object for continuation (multi-phase or multi-stage with separate frames).

...

Data frame(s) to sample from. For single-stage designs, provide one frame. For multi-stage designs with separate frames, provide frames in stage order.

stages

Integer vector specifying which stage(s) to execute. Default (NULL) executes all remaining stages.

seed

Integer random seed for reproducibility.

panels

Integer number of rotation groups (panels) to partition the sample into. Each panel is a representative subsample created by systematic interleaving within strata. The output includes a .panel column with values 1 through panels. Default NULL means no panel partitioning. Cannot be used together with reps.

reps

Integer number of independent replicate samples to draw (>= 2), or NULL (default) for a single sample. When specified, execute() draws reps independent samples from the same frame under the same design and returns a single stacked tbl_sample with a .replicate column (integer 1 through reps). Replicate r uses seed seed + r - 1. Cannot be combined with panels or with stages that use permanent random numbers.

This is repeated sample realization (drawing multiple independent samples), not replicate-weight variance estimation. For the latter, see as_svrepdesign().
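The seed rule for replicates can be sketched in base R. This is only an illustration of the arithmetic seed + r - 1, not samplyr's implementation: draw_one() is a hypothetical stand-in for a single sample draw, and the frame size of 1000 is made up.

```r
# Hypothetical stand-in for one sample draw: an SRS of n row indices from N units.
draw_one <- function(N, n, seed) {
  set.seed(seed)
  sort(sample.int(N, n))
}

seed <- 42L
reps <- 5L

# Replicate r uses seed + r - 1, so the seeds here are 42, 43, ..., 46.
rep_seeds <- seed + seq_len(reps) - 1L

# One draw per replicate, stacked with a .replicate column.
stacked <- do.call(rbind, lapply(seq_len(reps), function(r) {
  data.frame(
    .replicate = r,
    unit = draw_one(N = 1000L, n = 100L, seed = rep_seeds[r])
  )
}))

table(stacked$.replicate)
```

Because each replicate has its own seed, any one of them can be regenerated on its own in this sketch by calling draw_one() with the matching entry of rep_seeds.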

Value

A tbl_sample object (a data frame subclass with sampling metadata). Contains the selected units plus:

  • .sample_id: Unique identifier for each sampled unit

  • .weight: Sampling weight (1/probability)

  • .weight_1, .weight_2, ...: Per-stage sampling weights (\(1/\pi_i^{(k)}\)). The product of all per-stage weights equals .weight.

  • .fpc_1, .fpc_2, ...: Per-stage finite population correction values. The meaning depends on the method and context:

    • Equal-probability WOR (srswor, systematic): \(N_h\) (stratum population size), or \(N\) if unstratified. The sampling fraction \(f = n / N\) is derived from this at variance-estimation time.

    • PPS WOR (pps_brewer, pps_cps, etc.): \(N_h\) (stratum population size), converted to \(\pi_i = 1/w_i\) at survey export, because survey::svydesign() expects inclusion probabilities for unequal-probability stages.

    • Clustered stages: the number of clusters in the stratum/group, not the number of ultimate units.

    • WR / PMR (srswr, pps_multinomial, pps_chromy): \(\infty\). With-replacement designs have no finite population correction; variance is estimated via the Hansen–Hurwitz formula.

    In a multi-stage design, each stage has its own .fpc_k. At survey export (as_svydesign()), these are assembled into a multi-level FPC formula (e.g., ~ .fpc_1 + .fpc_2).

  • .draw_1, .draw_2, ...: Draw index per stage (WR/PMR methods only). Each row represents one independent draw; the draw index identifies which with-replacement selection the row came from.

  • .certainty_1, .certainty_2, ...: Whether each unit was a certainty selection (PPS methods with certainty thresholds only)

  • .replicate: Replicate identifier (only when reps is specified)

  • .panel: Panel assignment (only when panels is specified)

  • Stage and stratum identifiers as appropriate

Details

Execution Patterns

Single-Stage Execution


design |> execute(frame, seed = 1)

Multi-Stage with Single Frame

For hierarchical data where all stages are in one frame:


design |> execute(frame, seed = 2025)

The frame must contain all clustering variables and represent the stage hierarchy correctly. Lower-stage IDs may repeat across different parents; samplyr resolves them using the full ancestry from earlier stages.

Multi-Stage with Multiple Frames

When each stage has its own frame:


design |> execute(frame1, frame2, frame3, seed = 424)

Frames are matched to stages by position.

Partial Execution (Operational Sampling)

Execute only specific stages:


selected_eas <- design |> execute(ea_frame, stages = 1, seed = 42)
# ... fieldwork: listing in selected EAs ...
sample <- selected_eas |> execute(listing_frame, seed = 43)

When the listing frame is derived from a tbl_sample (e.g. via tidyr::uncount() or dplyr::slice()), it may carry internal columns (.weight, .fpc_1, etc.) from the earlier stage. These are automatically stripped before sampling so they do not collide with the metadata carried by the stage-1 result.

Multi-Phase (Continuation)

When .data is a tbl_sample, sampling continues from that sample:


phase1 <- design1 |> execute(frame, seed = 42)
# ... add screening data to phase1 ...
phase2 <- design2 |> execute(phase1_updated, seed = 123)

Weights compound automatically in multi-phase designs.

Weight Calculation

The .weight column is the inverse of the inclusion probability or, for with-replacement methods, of the expected number of selections. At each stage the per-stage weight is \(w_i^{(k)} = 1 / \pi_i^{(k)}\):

  • SRS: \(w_i = N / n\), constant for all units.

  • Stratified SRS: \(w_i = N_h / n_h\) within stratum \(h\).

  • PPS WOR: \(w_i = 1 / \pi_i\) where \(\pi_i\) is computed from the measure of size by sondage::inclusion_prob(). Varies across units.

  • WR / PMR: \(w_i = 1 / E(n_i)\) where \(E(n_i) = n \cdot p_i\) is the expected number of selections. Each draw is one row; a unit selected \(k\) times appears \(k\) times, each with the same weight.
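The formulas above can be checked by hand with a few lines of base R. This is a sketch with made-up frame sizes and measures of size, not output from samplyr. The PPS WOR case is left out because it depends on sondage::inclusion_prob().

```r
# SRS without replacement: every unit gets N / n.
N <- 1000
n <- 100
w_srs <- rep(N / n, n)

# Stratified SRS: N_h / n_h within each stratum h (hypothetical strata).
N_h <- c(urban = 400, rural = 600)
n_h <- c(urban = 20, rural = 60)
w_strat <- N_h / n_h

# With replacement, proportional to size: weight is 1 / E(n_i), E(n_i) = n * p_i.
mos <- c(50, 30, 20)        # hypothetical measures of size
p <- mos / sum(mos)         # per-draw selection probabilities
n_draws <- 2
w_wr <- 1 / (n_draws * p)

# Sanity check: SRS weights sum to the frame size N.
sum(w_srs)
w_strat
w_wr
```

In the stratified case the weights differ between strata but are constant within each one, which is why a single N_h / n_h value per stratum is enough.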

Multi-stage weight compounding

In a \(K\)-stage design, the overall weight for unit \(i\) is the product of per-stage weights: $$w_i = \prod_{k=1}^{K} w_i^{(k)} = \prod_{k=1}^{K} \frac{1}{\pi_i^{(k \mid S^{(k-1)})}}$$ where \(\pi_i^{(k \mid S^{(k-1)})}\) is the conditional inclusion probability at stage \(k\), given the set of clusters selected at all prior stages.

For example, in a two-stage design where 5 of 30 EAs are selected in a region (stage 1) and 12 of 50 households are selected within each selected EA (stage 2): $$w_i = \frac{30}{5} \times \frac{50}{12} = 6 \times \frac{25}{6} = 25$$

The .weight column always equals the product of .weight_1, .weight_2, etc. Per-stage weights are preserved for diagnostics and for survey export.
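The two-stage example can be reproduced in base R. The sketch assumes, for simplicity, that every EA in the region holds exactly 50 households. The variable names mirror the output columns but are plain numbers here, not samplyr output.

```r
# Stage 1: 5 of 30 EAs selected. Stage 2: 12 of 50 households per selected EA.
weight_1 <- 30 / 5
weight_2 <- 50 / 12
weight <- weight_1 * weight_2
weight

# Every sampled household carries the same weight here, so the weights of the
# 5 * 12 sampled households sum to the 30 * 50 households in the region.
n_sampled <- 5 * 12
n_sampled * weight
```

The second check is a quick way to spot a mis-specified stage: if the weights of the sampled households do not add up to roughly the number of households in the frame, one of the per-stage weights is off.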

Multi-phase weight compounding

When .data is itself a tbl_sample (two-phase sampling), the phase-1 inclusion probability is already reflected in the input weights. The final .weight is the product of phase-1 and phase-2 weights: $$w_i = w_i^{(\text{phase 1})} \times w_i^{(\text{phase 2} \mid \text{phase 1})}$$ This ensures the Horvitz–Thompson estimator \(\hat{Y} = \sum_S w_i \, y_i\) is unbiased for the population total.
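A minimal base R sketch of the two-phase product and the resulting Horvitz–Thompson total, assuming simple random sampling at both phases. The sample sizes and the outcome y are hypothetical.

```r
# Phase 1: SRS of 200 from N = 1000. Phase 2: SRS of 50 from those 200.
N <- 1000
n1 <- 200
n2 <- 50
w_phase1 <- N / n1           # 5
w_phase2 <- n1 / n2          # 4, conditional on phase 1
w <- w_phase1 * w_phase2     # 20

# Horvitz-Thompson total for a hypothetical binary outcome on the phase-2 sample.
set.seed(1)
y <- rbinom(n2, size = 1, prob = 0.3)
y_hat <- sum(w * y)

# With a constant weight, the estimate is N times the sample mean.
all.equal(y_hat, N * mean(y))
```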

Panel Partitioning

When panels is specified, the sample is partitioned into non-overlapping rotation groups suitable for rotating panel surveys. Each panel is a representative subsample created by systematic interleaving within strata.

Assignment is deterministic (not random): within each stratum, units are assigned round-robin to panels 1, 2, ..., k. This ensures each panel has approximately equal representation from every stratum. The quality of panel balance benefits from control sorting in draw(), which determines the order of units before interleaving.
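The round-robin rule can be illustrated in base R. This is a sketch of the idea, not samplyr's internal code: it assumes the sample is already in selection order and that numbering restarts at panel 1 within each stratum. The data frame smp is hypothetical.

```r
# A hypothetical sample in selection order, with a stratum label.
smp <- data.frame(
  unit = 1:10,
  stratum = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B")
)
panels <- 3

# Within each stratum, deal units out to panels 1, 2, ..., k in turn.
smp$.panel <- ave(
  seq_len(nrow(smp)), smp$stratum,
  FUN = function(i) (seq_along(i) - 1) %% panels + 1
)

table(smp$stratum, smp$.panel)
```

Because assignment follows the order of the rows, sorting units by a control variable before this step spreads that variable evenly across panels.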

For multi-stage designs, panels are assigned at stage 1 (PSU level). All units within a PSU inherit the PSU's panel assignment.

Weights are not adjusted for panel membership. They reflect the full-sample inclusion probability. When analysing a single panel, multiply weights by panels to obtain per-panel weights.
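The per-panel adjustment is a single multiplication. The base R sketch below uses a hypothetical equal-probability sample of 8 units from a frame of 80, split evenly into 4 panels.

```r
# Full-sample weights for a hypothetical SRS of 8 units from N = 80, in 4 panels.
panels <- 4
weight <- rep(80 / 8, 8)
panel <- rep(1:panels, length.out = 8)

# Analysing panel 1 alone: scale the full-sample weights by the number of panels.
in_p1 <- panel == 1
panel_weight <- weight[in_p1] * panels

sum(weight)        # the full sample represents the frame
sum(panel_weight)  # panel 1 alone, after scaling, represents it too
```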

See also

sampling_design() for creating designs, is_tbl_sample() for testing results, and get_design() for extracting metadata.

Examples

# Basic SRS execution
sample <- sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1234)
sample
#> # A tbl_sample: 100 × 17
#> # Weights:      445.7 [445.7, 445.7]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1  6473 Sahel       Soum     Koutou… Rural              133         17    19.4 
#>  2  6543 Sahel       Soum     Koutou… Rural              183         23     6.15
#>  3 10735 Sahel       Yagha    Tankou… Rural               36          5    16.7 
#>  4 36151 Centre-Nord Namente… Boulsa  Rural              526         73     3.75
#>  5 10510 Nord        Loroum   Solle   Rural              239         28     5.45
#>  6 23624 Nord        Yatenga  Sengue… Rural              555         59     8.84
#>  7 33774 Centre-Nord Sanmate… Pissila Rural              857        103     4.94
#>  8  1118 Centre-Nord Namente… Boala   Rural              830        114    10.2 
#>  9 29691 Centre-Oue… Sissili  Bieha   Rural              190         30     8.83
#> 10 10559 Boucle du … Kossi    Sono    Rural               69         10     8.89
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified execution with proportional allocation
sample <- sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 300) |>
  execute(bfa_eas, seed = 5789)
table(sample$region)
#> 
#> Boucle du Mouhoun          Cascades            Centre        Centre-Est 
#>                34                17                26                20 
#>       Centre-Nord      Centre-Ouest        Centre-Sud               Est 
#>                23                25                11                37 
#>     Hauts-Bassins              Nord   Plateau-Central             Sahel 
#>                32                20                11                28 
#>         Sud-Ouest 
#>                16 

# Two-stage cluster sample execution
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)

sample <- sampling_design() |>
  add_stage(label = "Districts") |>
    cluster_by(district) |>
    draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 10) |>
  execute(zwe_frame, seed = 3)
length(unique(sample$district))  # 20 districts selected
#> [1] 20

# Partial execution: stage 1 only
design <- sampling_design() |>
  add_stage(label = "EAs") |>
    stratify_by(region) |>
    cluster_by(ea_id) |>
    draw(n = 5, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 12)

# Execute only stage 1 to get selected EAs
selected_eas <- execute(design, bfa_eas, stages = 1, seed = 2)
nrow(selected_eas)  # Number of selected EAs
#> [1] 65

# Replicated sampling: 5 independent draws
sample <- sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 42, reps = 5)
table(sample$.replicate)  # 100 per replicate
#> 
#>   1   2   3   4   5 
#> 100 100 100 100 100 

# Rotating panel: 4 rotation groups
sample <- sampling_design() |>
  stratify_by(region) |>
  draw(n = 200) |>
  execute(bfa_eas, seed = 1, panels = 4)
table(sample$.panel)  # 650 per panel: n = 200 is drawn in each of the 13 regions
#> 
#>   1   2   3   4 
#> 650 650 650 650