stage() opens a new stage context in a multi-stage sampling design.
It acts as a delimiter between stages, not a wrapper: each stage's
specification follows its stage() call, using the same verbs as a single-stage design.
stage(.data, label = NULL)
Returns a modified sampling_design object with a new stage context.
In multi-stage designs, sampling proceeds hierarchically:
Stage 1: Select primary sampling units (PSUs), e.g., schools
Stage 2: Within selected PSUs, select secondary units, e.g., classrooms
Stage 3+: Continue nesting as needed
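The hierarchy above can be sketched in base R, independently of this package's verbs. This is a minimal illustration, assuming a hypothetical frame of 10 equally sized schools and simple random sampling at both stages. The column names weight_1, weight_2, and weight are illustrative and are not claimed to match the package's output columns.

```r
# Base-R sketch of two-stage selection; it does not use the package's verbs.
set.seed(42)

# Hypothetical frame: 10 schools with 30 students each
frame <- data.frame(
  school_id  = rep(sprintf("S%02d", 1:10), each = 30),
  student_id = seq_len(300)
)

# Stage 1: simple random sample of 4 of the 10 schools (the PSUs)
schools  <- unique(frame$school_id)
n_psu    <- 4
picked   <- sample(schools, n_psu)
weight_1 <- length(schools) / n_psu          # 10 / 4 = 2.5

# Stage 2: simple random sample of 5 students within each selected school
n_ssu <- 5
two_stage <- do.call(rbind, lapply(picked, function(s) {
  members <- frame[frame$school_id == s, ]
  chosen  <- members[sample(nrow(members), n_ssu), ]
  chosen$weight_1 <- weight_1
  chosen$weight_2 <- nrow(members) / n_ssu   # 30 / 5 = 6
  chosen
}))

# The overall weight is the product of the stage weights
two_stage$weight <- two_stage$weight_1 * two_stage$weight_2

nrow(two_stage)          # 20 sampled students (4 schools x 5 students)
sum(two_stage$weight)    # 300, the size of the frame
```

Because every school has the same size in this sketch, the weights sum exactly to the frame size. With unequal cluster sizes, the sum would only estimate it.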
Each stage can have its own:
Stratification (stratify_by())
Clustering (cluster_by())
Selection method and sample size (draw())
Pattern 1: Single-stage (no explicit stage()):
sampling_design() |>
  stratify_by(...) |>
  draw(...)
Pattern 2: Multi-stage (explicit stages):
sampling_design() |>
  stage(label = "Stage 1") |>
  cluster_by(...) |>
  draw(...) |>
  stage(label = "Stage 2") |>
  cluster_by(...) |>
  draw(...) |>
  stage(label = "Stage 3") |>
  draw(...)
Each stage must end with draw() before the next stage() or execute().
Empty stages (stage() followed immediately by another stage()) are not allowed.
The final stage doesn't need cluster_by(), because it samples individuals directly.
Multi-stage designs can be executed:
All at once with a single frame (hierarchical data)
All at once with multiple frames (one per stage)
Stage by stage, using the stages argument of execute()
See execute() for details on execution patterns.
See also: sampling_design() for creating designs,
draw() for completing stages, and
execute() for running multi-stage designs.
# Two-stage design: schools then students
sampling_design() |>
  stage(label = "Schools") |>
  cluster_by(school_id) |>
  draw(n = 50, method = "pps_brewer", mos = enrollment) |>
  stage(label = "Students") |>
  draw(n = 20) |>
  execute(tanzania_schools, seed = 123)
#> == tbl_sample ==
#> Weights: 12.16 - 191.35 (mean: 46.56 )
#>
#> # A tibble: 50 × 17
#> school_id region district school_level ownership enrollment n_teachers
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 TZ_01_0150 Dar es Sala… Ilala Secondary Governme… 493 14
#> 2 TZ_02_0006 Dar es Sala… Kinondo… Primary Governme… 728 17
#> 3 TZ_02_0010 Dar es Sala… Kinondo… Primary Governme… 433 11
#> 4 TZ_02_0048 Dar es Sala… Kinondo… Primary Governme… 708 17
#> 5 TZ_02_0093 Dar es Sala… Kinondo… Primary Governme… 362 7
#> 6 TZ_03_0025 Dar es Sala… Temeke Primary Governme… 645 14
#> 7 TZ_03_0034 Dar es Sala… Temeke Secondary Governme… 351 8
#> 8 TZ_03_0099 Dar es Sala… Temeke Primary Governme… 641 14
#> 9 TZ_03_0125 Dar es Sala… Temeke Primary Governme… 809 22
#> 10 TZ_04_0058 Arusha Arusha … Primary Private 437 10
#> # ℹ 40 more rows
#> # ℹ 10 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> # .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# Two-stage with stratification at stage 1
sampling_design() |>
  stage(label = "Schools") |>
  stratify_by(school_level) |>
  cluster_by(school_id) |>
  draw(n = 20, method = "pps_brewer", mos = enrollment) |>
  stage(label = "Students") |>
  draw(n = 15) |>
  execute(tanzania_schools, seed = 1234)
#> == tbl_sample ==
#> Weights: 12.73 - 206.66 (mean: 52.54 )
#>
#> # A tibble: 40 × 17
#> school_id region district school_level ownership enrollment n_teachers
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 TZ_01_0068 Dar es Sala… Ilala Primary Governme… 539 11
#> 2 TZ_01_0093 Dar es Sala… Ilala Secondary Private 545 11
#> 3 TZ_01_0124 Dar es Sala… Ilala Secondary Private 351 7
#> 4 TZ_01_0173 Dar es Sala… Ilala Primary Private 642 16
#> 5 TZ_02_0001 Dar es Sala… Kinondo… Secondary Governme… 328 9
#> 6 TZ_02_0003 Dar es Sala… Kinondo… Primary Governme… 323 8
#> 7 TZ_02_0037 Dar es Sala… Kinondo… Secondary Governme… 517 14
#> 8 TZ_02_0084 Dar es Sala… Kinondo… Primary Private 195 4
#> 9 TZ_02_0126 Dar es Sala… Kinondo… Primary Governme… 857 19
#> 10 TZ_03_0001 Dar es Sala… Temeke Primary Governme… 241 5
#> # ℹ 30 more rows
#> # ℹ 10 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> # .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# DHS-style two-stage stratified cluster sample
sampling_design(title = "DHS-style Household Survey") |>
  stage(label = "Enumeration Areas") |>
  stratify_by(region, strata) |>
  cluster_by(ea_id) |>
  draw(n = 3, method = "pps_brewer", mos = hh_count) |>
  stage(label = "Households") |>
  draw(n = 20) |>
  execute(niger_eas, seed = 2026)
#> == tbl_sample: DHS-style Household Survey ==
#> Weights: 1.79 - 177.18 (mean: 33.11 )
#>
#> # A tibble: 48 × 14
#> ea_id region department strata hh_count pop_estimate .weight .sample_id
#> * <chr> <fct> <fct> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Aga_01_0001 Agadez Agadez Rural 59 413 13.4 1
#> 2 Aga_01_0010 Agadez Agadez Urban 192 1344 7.18 2
#> 3 Aga_01_0011 Agadez Agadez Rural 54 378 14.7 3
#> 4 Aga_02_0012 Agadez Arlit Rural 137 959 5.79 4
#> 5 Aga_03_0010 Agadez Bilma Urban 259 1554 5.33 5
#> 6 Aga_04_0004 Agadez Tchirozér… Urban 215 1290 6.42 6
#> 7 Dif_06_0001 Diffa Mainé-Sor… Rural 102 714 15.4 7
#> 8 Dif_06_0011 Diffa Mainé-Sor… Urban 134 938 2.47 8
#> 9 Dif_06_0012 Diffa Mainé-Sor… Rural 72 504 21.8 9
#> 10 Dif_06_0013 Diffa Mainé-Sor… Urban 139 834 2.38 10
#> # ℹ 38 more rows
#> # ℹ 6 more variables: .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> # .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# Partial execution: select only stage 1
design <- sampling_design() |>
  stage(label = "EAs") |>
  stratify_by(region) |>
  cluster_by(ea_id) |>
  draw(n = 10, method = "pps_brewer", mos = hh_count) |>
  stage(label = "Households") |>
  draw(n = 12)
# Execute stage 1 only
selected_eas <- execute(design, niger_eas, stages = 1, seed = 1)
nrow(selected_eas)
#> [1] 80