add_stage() opens a new stage context in a multi-stage sampling design. It acts as a delimiter between stages, not a wrapper: each stage's specification follows its add_stage() call and uses the same verbs as a single-stage design.

Usage

add_stage(.data, label = NULL)

Arguments

.data

A sampling_design object.

label

Optional character string labeling the stage (e.g., "Schools", "Classrooms", "Students"). Used for documentation and printing.

Value

A modified sampling_design object with a new stage context.

Details

Multi-Stage Design Structure

In multi-stage designs, sampling proceeds hierarchically:

  1. Stage 1: Select primary sampling units (PSUs), e.g., schools

  2. Stage 2: Within selected PSUs, select secondary units, e.g., classrooms

  3. Stage 3+: Continue nesting as needed
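The nesting described above can be illustrated with a minimal base R sketch. It does not use this package and is not how the package implements selection. The toy frame and the column names w1, w2 and w are hypothetical, and both stages use simple random sampling for brevity.

```r
# Toy frame (hypothetical data): 6 schools, each with 4 classrooms.
set.seed(42)
frame <- data.frame(
  school    = rep(paste0("S", 1:6), each = 4),
  classroom = paste0("C", 1:24)
)

# Stage 1: simple random sample of 3 schools (the PSUs).
schools   <- unique(frame$school)
n1        <- 3
picked    <- sample(schools, n1)
stage1    <- frame[frame$school %in% picked, ]
stage1$w1 <- length(schools) / n1   # inverse of the stage-1 selection probability

# Stage 2: within each selected school, sample 2 of its classrooms.
n2 <- 2
stage2 <- do.call(rbind, lapply(split(stage1, stage1$school), function(d) {
  out    <- d[sample(nrow(d), n2), ]
  out$w2 <- nrow(d) / n2            # inverse of the within-school probability
  out
}))

# The overall design weight is the product of the stage weights.
stage2$w <- stage2$w1 * stage2$w2

nrow(stage2)   # 3 schools x 2 classrooms
sum(stage2$w)  # equals the number of classrooms in the frame
```

Because both stages here are equal-probability, the weights sum exactly to the frame size. With unequal-probability methods the sum only estimates it.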

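Each stage can have its own:

  • Stratification (stratify_by())

  • Clustering (cluster_by())

  • Selection method and sample size (draw())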

Design Patterns

Pattern 1: Single-stage (no explicit add_stage()):


sampling_design() |>
  stratify_by(...) |>
  draw(...)

Pattern 2: Multi-stage (explicit stages):


sampling_design() |>
  add_stage(label = "Stage 1") |>
    cluster_by(...) |>
    draw(...) |>
  add_stage(label = "Stage 2") |>
    cluster_by(...) |>
    draw(...) |>
  add_stage(label = "Stage 3") |>
    draw(...)

Validation Rules

  • Each stage must end with draw() before the next add_stage() or execute()

  • Empty stages (an add_stage() followed immediately by another add_stage()) are not allowed

  • The final stage does not need cluster_by(), because it samples individual units directly

Execution

Multi-stage designs can be executed:

  • All at once with a single frame (hierarchical data)

  • All at once with multiple frames (one per stage)

  • Stage by stage, using the stages argument of execute()

See execute() for details on execution patterns.

See also

sampling_design() for creating designs, draw() for completing stages, execute() for running multi-stage designs

Examples

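# Two-stage design: districts, then EAs within the selected districts
# district_hh = total households per district, used below as the PPS measure of size
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)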

sampling_design() |>
  add_stage(label = "Districts") |>
    cluster_by(district) |>
    draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 10) |>
  execute(zwe_frame, seed = 123)
#> # A tbl_sample: 200 × 16
#> # Weights:      109.88 [44.69, 158.16]
#>    ea_id    province district urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>            <int>      <int>    <dbl>
#>  1 EA_00091 Bulawayo Bulawayo Urban             1271        383     1.43
#>  2 EA_00441 Bulawayo Bulawayo Urban              986        289     0.15
#>  3 EA_00348 Bulawayo Bulawayo Urban             1158        324     0.31
#>  4 EA_00137 Bulawayo Bulawayo Urban              917        276     0.52
#>  5 EA_00355 Bulawayo Bulawayo Urban             1506        472     0.22
#>  6 EA_00328 Bulawayo Bulawayo Urban             1326        368     0.45
#>  7 EA_00026 Bulawayo Bulawayo Urban             1230        376     0.57
#>  8 EA_00007 Bulawayo Bulawayo Urban             1328        371     0.43
#>  9 EA_00426 Bulawayo Bulawayo Urban             1418        410     0.51
#> 10 EA_00450 Bulawayo Bulawayo Urban             1392        405     0.37
#> # ℹ 190 more rows
#> # ℹ 9 more variables: district_hh <int>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>

# Two-stage with stratification at stage 1 (n = 2 districts per province;
# Bulawayo has only one district, hence the capping warning below)
sampling_design() |>
  add_stage(label = "Districts") |>
    stratify_by(province) |>
    cluster_by(district) |>
    draw(n = 2, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 5) |>
  execute(zwe_frame, seed = 1234)
#> Warning: Sample size capped to population in 1 stratum/strata: "Bulawayo".
#>  Requested total: 20. Actual total: 19.
#> # A tbl_sample: 95 × 16
#> # Weights:      239.61 [90.6, 332.09]
#>    ea_id    province district    urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>       <fct>            <int>      <int>    <dbl>
#>  1 EA_00079 Bulawayo Bulawayo    Urban             1195        357     0.5 
#>  2 EA_00372 Bulawayo Bulawayo    Urban             1515        441     0.34
#>  3 EA_00270 Bulawayo Bulawayo    Rural              299         67     2.26
#>  4 EA_00382 Bulawayo Bulawayo    Urban             1475        426     1.04
#>  5 EA_00184 Bulawayo Bulawayo    Urban             1332        392     0.34
#>  6 EA_00515 Harare   Chitungwiza Urban             1425        417     0.21
#>  7 EA_00457 Harare   Chitungwiza Urban             1355        380     0.42
#>  8 EA_00585 Harare   Chitungwiza Urban              611        169     0.06
#>  9 EA_00602 Harare   Chitungwiza Urban             1247        338     0.17
#> 10 EA_00493 Harare   Chitungwiza Urban             1458        410     0.25
#> # ℹ 85 more rows
#> # ℹ 9 more variables: district_hh <int>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>

# DHS-style two-stage stratified cluster sample
sampling_design(title = "DHS-style Household Survey") |>
  add_stage(label = "Enumeration Areas") |>
    stratify_by(region, urban_rural) |>
    cluster_by(ea_id) |>
    draw(n = 3, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 20) |>
  execute(bfa_eas, seed = 2026)
#> # A tbl_sample: 69 × 20 | DHS-style Household Survey
#> # Weights:      224.47 [1.47, 1598.51]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_11100 Boucle … Bale     Poura   Urban             1086        147    17.9 
#>  2 EA_11103 Boucle … Bale     Poura   Urban             1660        225    19.0 
#>  3 EA_11105 Boucle … Bale     Poura   Urban             4313        585     7.95
#>  4 EA_14029 Boucle … Bale     Yaho    Rural              402         53     9.18
#>  5 EA_03137 Boucle … Mouhoun  Dedoug… Rural              987        175     0.36
#>  6 EA_11550 Boucle … Mouhoun  Safane  Rural             1362        193    28.9 
#>  7 EA_07722 Cascades Comoe    Mangod… Rural             1002        120     5.18
#>  8 EA_12100 Cascades Comoe    Sidera… Rural             1642        227    31.7 
#>  9 EA_07558 Cascades Leraba   Loumana Rural             1202        140    19.0 
#> 10 EA_06300 Centre   Kadiogo  Komki-… Rural             1708        258    12.9 
#> # ℹ 59 more rows
#> # ℹ 12 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>

# Partial execution: select only stage 1
design <- sampling_design() |>
  add_stage(label = "EAs") |>
    stratify_by(region) |>
    cluster_by(ea_id) |>
    draw(n = 10, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 12)

# Execute stage 1 only: returns the selected EAs (10 per region); no households are drawn yet
selected_eas <- execute(design, bfa_eas, stages = 1, seed = 1)
nrow(selected_eas)
#> [1] 130
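
Several examples above pass mos = to a PPS method, and the printed output includes a .certainty_1 column. As a rough illustration of what a measure of size does, the base R sketch below computes textbook probability-proportional-to-size inclusion probabilities and moves any unit whose probability would reach 1 into a certainty set. This is an assumption-laden sketch, not the package's code: pps_inclusion() and the household counts are hypothetical, and the package may handle certainty units differently.

```r
# Textbook PPS inclusion probabilities: prob_i = n * size_i / sum(size).
# Units whose probability would reach 1 are selected with certainty and the
# remaining sample size is reallocated among the other units.
pps_inclusion <- function(size, n) {
  prob    <- rep(NA_real_, length(size))
  certain <- rep(FALSE, length(size))
  repeat {
    rest   <- !certain
    n_left <- n - sum(certain)
    prob[rest] <- n_left * size[rest] / sum(size[rest])
    new_certain <- rest & prob >= 1
    if (!any(new_certain)) break
    certain <- certain | new_certain
  }
  prob[certain] <- 1
  data.frame(size = size, prob = prob, certainty = certain, weight = 1 / prob)
}

# Hypothetical district household counts; the first district dominates.
hh  <- c(9000, 3000, 2000, 1000, 1000)
res <- pps_inclusion(hh, n = 3)
res
sum(res$prob)  # inclusion probabilities sum to the requested n
```

The large first district is taken with certainty, and the other four share the remaining two selections in proportion to their size.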