
add_stage() opens a new stage context in a multi-stage sampling design. It acts as a delimiter between stages rather than a wrapper: each stage's specification follows its add_stage() call and uses the same verbs (stratify_by(), cluster_by(), draw()) as a single-stage design.

Usage

add_stage(.data, label = NULL)

Arguments

.data

A sampling_design object.

label

Optional character string labeling the stage (e.g., "Schools", "Classrooms", "Students"). Used for documentation and printing.

Value

A modified sampling_design object with a new stage context.

Details

Multi-Stage Design Structure

In multi-stage designs, sampling proceeds hierarchically:

  1. Stage 1: Select primary sampling units (PSUs), e.g., schools

  2. Stage 2: Within selected PSUs, select secondary units, e.g., classrooms

  3. Stage 3+: Continue nesting as needed
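The hierarchy above can be sketched by hand in base R. This is only an illustration under simple assumptions: equal-probability sampling at both stages and a made-up frame of 6 schools with 4 classrooms each. The names frame, picked_schools and two_stage are hypothetical, and the code does not show how the package selects units internally.

```r
set.seed(42)

# Toy frame (hypothetical): 6 schools (PSUs), each with 4 classrooms
frame <- data.frame(
  school = rep(paste0("S", 1:6), each = 4),
  classroom = rep(1:4, times = 6)
)

# Stage 1: simple random sample of 3 of the 6 schools
schools <- unique(frame$school)
picked_schools <- sample(schools, size = 3)
w1 <- length(schools) / 3  # stage-1 weight: 6 / 3

# Stage 2: within each selected school, simple random sample of 2 classrooms
stage2 <- lapply(picked_schools, function(s) {
  rooms <- frame[frame$school == s, ]
  rooms[sample(nrow(rooms), size = 2), ]
})
two_stage <- do.call(rbind, stage2)
w2 <- 4 / 2  # stage-2 weight: 4 classrooms / 2 selected

# Overall weight is the product of the stage weights
two_stage$weight <- w1 * w2
```

Each stage weight is the inverse of that stage's selection probability, so the overall weight is their product. In this toy case the six sampled classrooms each carry a weight of 4, and the weights sum to the 24 classrooms in the frame.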

Each stage can have its own stratification (stratify_by()), clustering (cluster_by()), and selection settings (draw()).

Design Patterns

Pattern 1: Single-stage (no explicit add_stage()):


sampling_design() |>
  stratify_by(...) |>
  draw(...)

Pattern 2: Multi-stage (explicit stages):


sampling_design() |>
  add_stage(label = "Stage 1") |>
    cluster_by(...) |>
    draw(...) |>
  add_stage(label = "Stage 2") |>
    cluster_by(...) |>
    draw(...) |>
  add_stage(label = "Stage 3") |>
    draw(...)

Validation Rules

  • Each stage must end with draw() before the next add_stage() or execute()

  • Empty stages (an add_stage() followed immediately by another add_stage()) are not allowed

  • The final stage does not need cluster_by(), because it samples the final units directly
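The ordering rules above can be made concrete with a small stand-alone checker. The function check_stage_order() below is hypothetical and is not part of the package. It takes the verb names of a pipeline as a character vector and mimics only the first two rules, as a sketch of the logic rather than the package's actual validation.

```r
# Hypothetical helper, not part of the package: checks a sequence of verb
# names against the stage-ordering rules.
check_stage_order <- function(verbs) {
  # Positions where a new stage opens
  starts <- which(verbs == "add_stage")
  if (length(starts) == 0L) {
    # Single-stage pattern: treat the whole sequence as one implicit stage
    starts <- 0L
  }
  # Each stage runs up to the verb before the next add_stage(), or to the end
  ends <- c(starts[-1L] - 1L, length(verbs))

  for (i in seq_along(starts)) {
    if (ends[i] <= starts[i]) {
      return(sprintf("stage %d is empty", i))
    }
    stage <- verbs[(starts[i] + 1L):ends[i]]
    if (stage[length(stage)] != "draw") {
      return(sprintf("stage %d does not end with draw()", i))
    }
  }
  "ok"
}

valid   <- check_stage_order(c("add_stage", "cluster_by", "draw", "add_stage", "draw"))
empty   <- check_stage_order(c("add_stage", "add_stage", "draw"))
no_draw <- check_stage_order(c("add_stage", "cluster_by", "add_stage", "draw"))
```

The checker accepts the first sequence, whose final stage has no cluster_by(). It reports the second as having an empty stage and the third as a stage that does not end with draw().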

Execution

Multi-stage designs can be executed:

  • All at once with a single frame (hierarchical data)

  • All at once with multiple frames (one per stage)

  • Stage by stage, using the stages argument of execute()

See execute() for details on execution patterns.

See also

sampling_design() for creating designs, draw() for completing stages, and execute() for running multi-stage designs.

Examples

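# Two-stage design: districts, then enumeration areas (EAs)
# District-level household totals serve as the measure of size (mos)
# for the PPS draw at stage 1
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)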

sampling_design() |>
  add_stage(label = "Districts") |>
    cluster_by(district) |>
    draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 10) |>
  execute(zwe_frame, seed = 123)
#> # A tbl_sample: 200 × 21
#> # Weights:      571.03 [135.84, 1003.03]
#>    ea_id province district ward_pcode urban_rural population households
#>  * <int> <fct>    <fct>    <chr>      <fct>            <int>      <int>
#>  1 24938 Bulawayo Bulawayo ZW102119   Urban              597        158
#>  2 22852 Bulawayo Bulawayo ZW102109   Urban              409        110
#>  3  1389 Bulawayo Bulawayo ZW102105   Urban              141         42
#>  4 23753 Bulawayo Bulawayo ZW102118   Urban              444        124
#>  5 48254 Bulawayo Bulawayo ZW102106   Urban               93         26
#>  6 46344 Bulawayo Bulawayo ZW102128   Urban              281         74
#>  7 48495 Bulawayo Bulawayo ZW102126   Urban              363         94
#>  8 47192 Bulawayo Bulawayo ZW102127   Urban              290         76
#>  9 47415 Bulawayo Bulawayo ZW102121   Urban              706        181
#> 10 47532 Bulawayo Bulawayo ZW102124   Urban              679        180
#> # ℹ 190 more rows
#> # ℹ 14 more variables: buildings <int>, women_15_49 <int>, men_15_49 <int>,
#> #   children_under5 <int>, area_km2 <dbl>, district_hh <int>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>

# Two-stage with stratification at stage 1
sampling_design() |>
  add_stage(label = "Districts") |>
    stratify_by(province) |>
    cluster_by(district) |>
    draw(n = 2, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 5) |>
  execute(zwe_frame, seed = 1234)
#> Warning: Sample size capped to population in 1 stratum/strata: "Bulawayo".
#>  Requested total: 20. Actual total: 19.
#> # A tbl_sample: 95 × 21
#> # Weights:      1350.78 [372.1, 2351.62]
#>    ea_id province district    ward_pcode urban_rural population households
#>  * <int> <fct>    <fct>       <chr>      <fct>            <int>      <int>
#>  1 24926 Bulawayo Bulawayo    ZW102119   Urban              253         67
#>  2  1311 Bulawayo Bulawayo    ZW102105   Urban              109         32
#>  3 48293 Bulawayo Bulawayo    ZW102115   Urban              380         97
#>  4 47427 Bulawayo Bulawayo    ZW102121   Urban              696        179
#>  5 46195 Bulawayo Bulawayo    ZW102128   Urban              691        181
#>  6 34539 Harare   Chitungwiza ZW192223   Urban              437        112
#>  7 89666 Harare   Chitungwiza ZW192210   Urban             1797        456
#>  8 89634 Harare   Chitungwiza ZW192202   Urban              456        120
#>  9 89760 Harare   Chitungwiza ZW192215   Urban              585        158
#> 10 89789 Harare   Chitungwiza ZW192216   Urban              398        102
#> # ℹ 85 more rows
#> # ℹ 14 more variables: buildings <int>, women_15_49 <int>, men_15_49 <int>,
#> #   children_under5 <int>, area_km2 <dbl>, district_hh <int>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>

# Two-stage stratified cluster sample
sampling_design(title = "Household Survey") |>
  add_stage(label = "Enumeration Areas") |>
    stratify_by(region, urban_rural) |>
    cluster_by(ea_id) |>
    draw(n = 3, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 20) |>
  execute(bfa_eas, seed = 2026)
#> # A tbl_sample: 69 × 20 | Household Survey
#> # Weights:      416.75 [1.54, 2041.81]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1  9029 Boucle du … Bale     Poura   Urban              524         84     8.98
#>  2  9043 Boucle du … Bale     Poura   Urban              859        138     6.56
#>  3  9046 Boucle du … Bale     Poura   Urban             4132        664     3.25
#>  4 11727 Boucle du … Bale     Yaho    Rural              957        143     1.01
#>  5 25936 Boucle du … Mouhoun  Dedoug… Rural              952        169     0.52
#>  6 43767 Boucle du … Mouhoun  Safane  Rural             1681        241     1.24
#>  7 38036 Cascades    Comoe    Mangod… Rural              645         89    10.2 
#>  8  9983 Cascades    Comoe    Sidera… Rural              861        138     0.82
#>  9  7658 Cascades    Leraba   Nianko… Rural              888        175     5.74
#> 10 26817 Centre      Kadiogo  Komki-… Rural              713        127     9.06
#> # ℹ 59 more rows
#> # ℹ 12 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>

# Partial execution: select only stage 1
design <- sampling_design() |>
  add_stage(label = "EAs") |>
    stratify_by(region) |>
    cluster_by(ea_id) |>
    draw(n = 10, method = "pps_brewer", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 12)

# Execute stage 1 only
selected_eas <- execute(design, bfa_eas, stages = 1, seed = 1)
nrow(selected_eas)
#> [1] 130