add_stage() opens a new stage context in multi-stage sampling designs.
It acts as a delimiter between stages, not a wrapper – each stage's
specification follows add_stage() using the same verbs.
Details
Multi-Stage Design Structure
In multi-stage designs, sampling proceeds hierarchically:
Stage 1: Select primary sampling units (PSUs), e.g., schools
Stage 2: Within selected PSUs, select secondary units, e.g., classrooms
Stage 3+: Continue nesting as needed
Each stage can have its own:
Stratification (stratify_by())
Clustering (cluster_by())
Selection method and sample size (draw())
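The hierarchy described above can be sketched in base R, without the package, to show what a two-stage selection does to the data. This is a minimal illustration under stated assumptions, not how add_stage() or execute() are implemented: the toy frame, the column names (school, classroom, w2, weight) and the sample sizes are all made up, and both stages use equal-probability selection rather than PPS.

```r
set.seed(42)

# Toy frame (hypothetical): 6 schools (PSUs), each with 5 classrooms
frame <- data.frame(
  school    = rep(paste0("S", 1:6), each = 5),
  classroom = rep(1:5, times = 6)
)

# Stage 1: simple random sample of 3 schools out of 6
n_psu <- 3
psus <- unique(frame$school)
selected_schools <- sample(psus, n_psu)
w1 <- length(psus) / n_psu            # stage 1 weight: 6 / 3 = 2

# Stage 2: within each selected school, sample 2 of its 5 classrooms
n_ssu <- 2
stage2 <- lapply(selected_schools, function(s) {
  rooms <- frame[frame$school == s, ]
  picked <- rooms[sample(nrow(rooms), n_ssu), ]
  picked$w2 <- nrow(rooms) / n_ssu    # stage 2 weight: 5 / 2 = 2.5
  picked
})
two_stage <- do.call(rbind, stage2)

# Overall design weight is the product of the stage weights
two_stage$weight <- w1 * two_stage$w2

nrow(two_stage)          # 3 schools x 2 classrooms
#> [1] 6
sum(two_stage$weight)    # weights sum to the frame size (30 classrooms)
#> [1] 30
```

Stage 2 only ever looks inside the units selected at stage 1, which is why each stage needs its own complete specification.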
Design Patterns
Pattern 1: Single-stage (no explicit add_stage()):
sampling_design() |>
  stratify_by(...) |>
  draw(...)
Pattern 2: Multi-stage (explicit stages):
sampling_design() |>
  add_stage(label = "Stage 1") |>
    cluster_by(...) |>
    draw(...) |>
  add_stage(label = "Stage 2") |>
    cluster_by(...) |>
    draw(...) |>
  add_stage(label = "Stage 3") |>
    draw(...)
Validation Rules
Each stage must end with draw() before the next add_stage() or execute()
Empty stages (stage followed immediately by stage) are not allowed
The final stage doesn't need cluster_by() (samples individuals)
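To make the first two rules concrete, here is a small base R sketch that checks a sequence of verb names against them. check_stage_order() is a hypothetical helper written for this illustration; it is not part of the package and says nothing about how the package validates designs internally. It takes the design verbs in pipeline order (without execute()) as a character vector.

```r
# Hypothetical helper, not part of the package: returns a character vector
# of rule violations, empty when the verb sequence is valid.
check_stage_order <- function(verbs) {
  # Each "add_stage" opens a new stage; verbs before any add_stage
  # (the single-stage pattern) fall into stage 0
  stage_id <- cumsum(verbs == "add_stage")
  problems <- character(0)
  for (stage in split(verbs, stage_id)) {
    body <- stage[stage != "add_stage"]
    if (length(body) == 0) {
      # add_stage followed immediately by another add_stage
      problems <- c(problems, "empty stage")
    } else if (body[length(body)] != "draw") {
      # the stage was not closed with draw()
      problems <- c(problems, "stage does not end with draw()")
    }
  }
  problems
}

# Valid two-stage design
check_stage_order(c("add_stage", "cluster_by", "draw", "add_stage", "draw"))
#> character(0)

# Empty first stage
check_stage_order(c("add_stage", "add_stage", "draw"))
#> [1] "empty stage"

# First stage never calls draw()
check_stage_order(c("add_stage", "cluster_by", "add_stage", "draw"))
#> [1] "stage does not end with draw()"
```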
Execution
Multi-stage designs can be executed:
All at once with a single frame (hierarchical data)
All at once with multiple frames (one per stage)
Stage by stage using the stages = parameter in execute()
See execute() for details on execution patterns.
See also
sampling_design() for creating designs,
draw() for completing stages,
execute() for running multi-stage designs
Examples
# Two-stage design: districts then EAs
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)

sampling_design() |>
  add_stage(label = "Districts") |>
    cluster_by(district) |>
    draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 10) |>
  execute(zwe_frame, seed = 123)
#> # A tbl_sample: 200 × 21
#> # Weights: 571.03 [135.84, 1003.03]
#> ea_id province district ward_pcode urban_rural population households
#> * <int> <fct> <fct> <chr> <fct> <int> <int>
#> 1 24938 Bulawayo Bulawayo ZW102119 Urban 597 158
#> 2 22852 Bulawayo Bulawayo ZW102109 Urban 409 110
#> 3 1389 Bulawayo Bulawayo ZW102105 Urban 141 42
#> 4 23753 Bulawayo Bulawayo ZW102118 Urban 444 124
#> 5 48254 Bulawayo Bulawayo ZW102106 Urban 93 26
#> 6 46344 Bulawayo Bulawayo ZW102128 Urban 281 74
#> 7 48495 Bulawayo Bulawayo ZW102126 Urban 363 94
#> 8 47192 Bulawayo Bulawayo ZW102127 Urban 290 76
#> 9 47415 Bulawayo Bulawayo ZW102121 Urban 706 181
#> 10 47532 Bulawayo Bulawayo ZW102124 Urban 679 180
#> # ℹ 190 more rows
#> # ℹ 14 more variables: buildings <int>, women_15_49 <int>, men_15_49 <int>,
#> # children_under5 <int>, area_km2 <dbl>, district_hh <int>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> # .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# Two-stage with stratification at stage 1
sampling_design() |>
  add_stage(label = "Districts") |>
    stratify_by(province) |>
    cluster_by(district) |>
    draw(n = 2, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 5) |>
  execute(zwe_frame, seed = 1234)
#> Warning: Sample size capped to population in 1 stratum/strata: "Bulawayo".
#> ℹ Requested total: 20. Actual total: 19.
#> # A tbl_sample: 95 × 21
#> # Weights: 1350.78 [372.1, 2351.62]
#> ea_id province district ward_pcode urban_rural population households
#> * <int> <fct> <fct> <chr> <fct> <int> <int>
#> 1 24926 Bulawayo Bulawayo ZW102119 Urban 253 67
#> 2 1311 Bulawayo Bulawayo ZW102105 Urban 109 32
#> 3 48293 Bulawayo Bulawayo ZW102115 Urban 380 97
#> 4 47427 Bulawayo Bulawayo ZW102121 Urban 696 179
#> 5 46195 Bulawayo Bulawayo ZW102128 Urban 691 181
#> 6 34539 Harare Chitungwiza ZW192223 Urban 437 112
#> 7 89666 Harare Chitungwiza ZW192210 Urban 1797 456
#> 8 89634 Harare Chitungwiza ZW192202 Urban 456 120
#> 9 89760 Harare Chitungwiza ZW192215 Urban 585 158
#> 10 89789 Harare Chitungwiza ZW192216 Urban 398 102
#> # ℹ 85 more rows
#> # ℹ 14 more variables: buildings <int>, women_15_49 <int>, men_15_49 <int>,
#> # children_under5 <int>, area_km2 <dbl>, district_hh <int>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> # .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# Two-stage stratified cluster sample
sampling_design(title = "Household Survey") |>
  add_stage(label = "Enumeration Areas") |>
    stratify_by(region, urban_rural) |>
    cluster_by(ea_id) |>
    draw(n = 3, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 20) |>
  execute(bfa_eas, seed = 2026)
#> # A tbl_sample: 69 × 20 | Household Survey
#> # Weights: 416.75 [1.54, 2041.81]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 9029 Boucle du … Bale Poura Urban 524 84 8.98
#> 2 9043 Boucle du … Bale Poura Urban 859 138 6.56
#> 3 9046 Boucle du … Bale Poura Urban 4132 664 3.25
#> 4 11727 Boucle du … Bale Yaho Rural 957 143 1.01
#> 5 25936 Boucle du … Mouhoun Dedoug… Rural 952 169 0.52
#> 6 43767 Boucle du … Mouhoun Safane Rural 1681 241 1.24
#> 7 38036 Cascades Comoe Mangod… Rural 645 89 10.2
#> 8 9983 Cascades Comoe Sidera… Rural 861 138 0.82
#> 9 7658 Cascades Leraba Nianko… Rural 888 175 5.74
#> 10 26817 Centre Kadiogo Komki-… Rural 713 127 9.06
#> # ℹ 59 more rows
#> # ℹ 12 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>
# Partial execution: select only stage 1
design <- sampling_design() |>
  add_stage(label = "EAs") |>
    stratify_by(region) |>
    cluster_by(ea_id) |>
    draw(n = 10, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 12)
# Execute stage 1 only
selected_eas <- execute(design, bfa_eas, stages = 1, seed = 1)
nrow(selected_eas)
#> [1] 130