add_stage() opens a new stage context in multi-stage sampling designs.
It acts as a delimiter between stages, not a wrapper – each stage's
specification follows add_stage() using the same verbs.
Details
Multi-Stage Design Structure
In multi-stage designs, sampling proceeds hierarchically:
Stage 1: Select primary sampling units (PSUs), e.g., schools
Stage 2: Within selected PSUs, select secondary units, e.g., classrooms
Stage 3+: Continue nesting as needed
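To make the hierarchy concrete, the following base R sketch draws a two-stage sample by hand. It is not how the package implements execute(). The toy frame, the object names (frame, picked_schools, w1, w2) and the use of simple random sampling at both stages are assumptions made for illustration. The overall weight is computed as the product of the stage weights, which is the standard result for nested selection.

```r
# Hypothetical toy frame: 6 schools (PSUs), each with 4 classrooms
set.seed(42)
frame <- data.frame(
  school    = rep(paste0("S", 1:6), each = 4),
  classroom = paste0("C", 1:24)
)

# Stage 1: simple random sample of 3 of the 6 schools
schools <- unique(frame$school)
n1 <- 3
picked_schools <- sample(schools, n1)
w1 <- length(schools) / n1  # stage-1 weight: N / n

# Stage 2: within each selected school, sample 2 of its classrooms
n2 <- 2
stage2 <- lapply(picked_schools, function(s) {
  rooms <- frame[frame$school == s, ]
  out <- rooms[sample(nrow(rooms), n2), ]
  out$w2 <- nrow(rooms) / n2  # stage-2 weight within this school
  out
})
sampled <- do.call(rbind, stage2)

# Overall weight is the product of the stage weights
sampled$w1 <- w1
sampled$weight <- sampled$w1 * sampled$w2

nrow(sampled)
sum(sampled$weight)
```

Because both stages use equal-probability sampling here, every sampled classroom carries a weight of 2 × 2 = 4, and the weights sum to the 24 classrooms in the frame.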
Each stage can have its own:
Stratification (stratify_by())
Clustering (cluster_by())
Selection method and sample size (draw())
Design Patterns
Pattern 1: Single-stage (no explicit add_stage()):
sampling_design() |>
  stratify_by(...) |>
  draw(...)
Pattern 2: Multi-stage (explicit stages):
sampling_design() |>
  add_stage(label = "Stage 1") |>
  cluster_by(...) |>
  draw(...) |>
  add_stage(label = "Stage 2") |>
  cluster_by(...) |>
  draw(...) |>
  add_stage(label = "Stage 3") |>
  draw(...)
Validation Rules
Each stage must end with draw() before the next add_stage() or execute()
Empty stages (add_stage() followed immediately by another add_stage()) are not allowed
The final stage doesn't need cluster_by() (it samples individual units)
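The first two rules can be expressed as a small check over the ordered verbs of a pipeline. The base R sketch below is purely illustrative: stages_are_valid() is a hypothetical helper, not a function of the package, and the package's own validation may work differently and report errors in another form.

```r
# Hypothetical checker: takes the verbs of a pipeline in order and returns
# TRUE when every stage is non-empty and ends with draw().
stages_are_valid <- function(verbs) {
  # Label each verb with the index of the stage it belongs to
  stage_id <- cumsum(verbs == "add_stage")
  # Verbs before the first add_stage() (stage_id 0) are ignored in this sketch
  in_stage <- stage_id > 0
  stages <- split(verbs[in_stage], stage_id[in_stage])
  all(vapply(stages, function(v) {
    v <- v[v != "add_stage"]                 # drop the delimiter itself
    length(v) > 0 && v[length(v)] == "draw"  # non-empty and closed by draw()
  }, logical(1)))
}

ok       <- stages_are_valid(c("add_stage", "cluster_by", "draw", "add_stage", "draw"))
empty    <- stages_are_valid(c("add_stage", "add_stage", "draw"))
no_draw  <- stages_are_valid(c("add_stage", "cluster_by", "add_stage", "draw"))
c(ok = ok, empty = empty, no_draw = no_draw)
```

Only the first pipeline passes: the second contains an empty stage, and the first stage of the third never reaches draw().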
Execution
Multi-stage designs can be executed:
All at once with a single frame (hierarchical data)
All at once with multiple frames (one per stage)
Stage by stage, using the stages = parameter in execute()
See execute() for details on execution patterns.
See also
sampling_design() for creating designs,
draw() for completing stages,
execute() for running multi-stage designs
Examples
# Two-stage design: districts then EAs
# Add each district's total household count, used below as the PPS measure of size
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)
sampling_design() |>
  add_stage(label = "Districts") |>
  cluster_by(district) |>
  draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
  draw(n = 10) |>
  execute(zwe_frame, seed = 123)
#> # A tbl_sample: 200 × 16
#> # Weights: 109.88 [44.69, 158.16]
#> ea_id province district urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <int> <int> <dbl>
#> 1 EA_00091 Bulawayo Bulawayo Urban 1271 383 1.43
#> 2 EA_00441 Bulawayo Bulawayo Urban 986 289 0.15
#> 3 EA_00348 Bulawayo Bulawayo Urban 1158 324 0.31
#> 4 EA_00137 Bulawayo Bulawayo Urban 917 276 0.52
#> 5 EA_00355 Bulawayo Bulawayo Urban 1506 472 0.22
#> 6 EA_00328 Bulawayo Bulawayo Urban 1326 368 0.45
#> 7 EA_00026 Bulawayo Bulawayo Urban 1230 376 0.57
#> 8 EA_00007 Bulawayo Bulawayo Urban 1328 371 0.43
#> 9 EA_00426 Bulawayo Bulawayo Urban 1418 410 0.51
#> 10 EA_00450 Bulawayo Bulawayo Urban 1392 405 0.37
#> # ℹ 190 more rows
#> # ℹ 9 more variables: district_hh <int>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>
# Two-stage with stratification at stage 1
sampling_design() |>
  add_stage(label = "Districts") |>
  stratify_by(province) |>
  cluster_by(district) |>
  draw(n = 2, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
  draw(n = 5) |>
  execute(zwe_frame, seed = 1234)
#> Warning: Sample size capped to population in 1 stratum/strata: "Bulawayo".
#> ℹ Requested total: 20. Actual total: 19.
#> # A tbl_sample: 95 × 16
#> # Weights: 239.61 [90.6, 332.09]
#> ea_id province district urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <int> <int> <dbl>
#> 1 EA_00079 Bulawayo Bulawayo Urban 1195 357 0.5
#> 2 EA_00372 Bulawayo Bulawayo Urban 1515 441 0.34
#> 3 EA_00270 Bulawayo Bulawayo Rural 299 67 2.26
#> 4 EA_00382 Bulawayo Bulawayo Urban 1475 426 1.04
#> 5 EA_00184 Bulawayo Bulawayo Urban 1332 392 0.34
#> 6 EA_00515 Harare Chitungwiza Urban 1425 417 0.21
#> 7 EA_00457 Harare Chitungwiza Urban 1355 380 0.42
#> 8 EA_00585 Harare Chitungwiza Urban 611 169 0.06
#> 9 EA_00602 Harare Chitungwiza Urban 1247 338 0.17
#> 10 EA_00493 Harare Chitungwiza Urban 1458 410 0.25
#> # ℹ 85 more rows
#> # ℹ 9 more variables: district_hh <int>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>
# DHS-style two-stage stratified cluster sample
sampling_design(title = "DHS-style Household Survey") |>
  add_stage(label = "Enumeration Areas") |>
  stratify_by(region, urban_rural) |>
  cluster_by(ea_id) |>
  draw(n = 3, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
  draw(n = 20) |>
  execute(bfa_eas, seed = 2026)
#> # A tbl_sample: 69 × 20 | DHS-style Household Survey
#> # Weights: 224.47 [1.47, 1598.51]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_11100 Boucle … Bale Poura Urban 1086 147 17.9
#> 2 EA_11103 Boucle … Bale Poura Urban 1660 225 19.0
#> 3 EA_11105 Boucle … Bale Poura Urban 4313 585 7.95
#> 4 EA_14029 Boucle … Bale Yaho Rural 402 53 9.18
#> 5 EA_03137 Boucle … Mouhoun Dedoug… Rural 987 175 0.36
#> 6 EA_11550 Boucle … Mouhoun Safane Rural 1362 193 28.9
#> 7 EA_07722 Cascades Comoe Mangod… Rural 1002 120 5.18
#> 8 EA_12100 Cascades Comoe Sidera… Rural 1642 227 31.7
#> 9 EA_07558 Cascades Leraba Loumana Rural 1202 140 19.0
#> 10 EA_06300 Centre Kadiogo Komki-… Rural 1708 258 12.9
#> # ℹ 59 more rows
#> # ℹ 12 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>
# Partial execution: select only stage 1
design <- sampling_design() |>
  add_stage(label = "EAs") |>
  stratify_by(region) |>
  cluster_by(ea_id) |>
  draw(n = 10, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
  draw(n = 12)
# Execute stage 1 only
selected_eas <- execute(design, bfa_eas, stages = 1, seed = 1)
nrow(selected_eas)
#> [1] 130