Define a New Stage in Multi-Stage Designs

stage() opens a new stage context in a multi-stage sampling design. It acts as a delimiter between stages, not a wrapper: the verbs that follow it (stratify_by(), cluster_by(), draw()) specify that stage until the next stage() or execute().

stage(.data, label = NULL)

Arguments

.data: A sampling_design object.
label: Optional character string labeling the stage (e.g., "Schools", "Classrooms", "Students"). Used for documentation and printing.

Value

A modified sampling_design object with a new stage context.

Details

Multi-Stage Design Structure

In multi-stage designs, sampling proceeds hierarchically:

Stage 1: Select primary sampling units (PSUs), e.g., schools
Stage 2: Within selected PSUs, select secondary units, e.g., classrooms
Stage 3+: Continue nesting as needed
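
The hierarchy above can be sketched in base R. This is only an illustration of the concept, not how stage() or execute() are implemented: the frame, the column names (school_id, student_id, weight_1, weight_2, weight), and the use of simple random sampling at both stages are assumptions made for this example.

```r
# Base R illustration only; hypothetical data, not the package's implementation.
set.seed(42)

# Hypothetical hierarchical frame: 6 schools with 10 students each.
frame <- data.frame(
  school_id  = rep(sprintf("S%02d", 1:6), each = 10),
  student_id = seq_len(60)
)

# Stage 1: select 3 schools (PSUs) by simple random sampling.
schools  <- unique(frame$school_id)
n_psu    <- 3
picked   <- sample(schools, n_psu)
weight_1 <- length(schools) / n_psu            # 6 / 3 = 2

# Stage 2: within each selected school, select 4 students.
n_ssu <- 4
sampled <- do.call(rbind, lapply(picked, function(s) {
  pool <- frame[frame$school_id == s, ]
  rows <- pool[sample(nrow(pool), n_ssu), ]
  rows$weight_2 <- nrow(pool) / n_ssu          # 10 / 4 = 2.5
  rows
}))

# The overall design weight is the product of the stage weights.
sampled$weight_1 <- weight_1
sampled$weight   <- sampled$weight_1 * sampled$weight_2

nrow(sampled)        # 12 sampled students (3 schools x 4 students)
sum(sampled$weight)  # 60, the number of students in the frame
```

Because both stages use equal-probability selection here, the weights sum exactly to the frame size; with unequal-probability methods the sum is only an estimate of it.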

Each stage can have its own:

Stratification (stratify_by())
Clustering (cluster_by())
Selection method and sample size (draw())

Design Patterns

Pattern 1: Single-stage (no explicit stage()):


sampling_design() |>
  stratify_by(...) |>
  draw(...)

Pattern 2: Multi-stage (explicit stages):


sampling_design() |>
  stage(label = "Stage 1") |>
    cluster_by(...) |>
    draw(...) |>
  stage(label = "Stage 2") |>
    cluster_by(...) |>
    draw(...) |>
  stage(label = "Stage 3") |>
    draw(...)

Validation Rules

Each stage must end with draw() before the next stage() or execute()
Empty stages (stage() followed immediately by another stage()) are not allowed
The final stage does not need cluster_by(), because it samples the final units (e.g., individuals) directly

Execution

Multi-stage designs can be executed in three ways:

All at once with a single frame (hierarchical data)
All at once with multiple frames (one per stage)
Stage by stage, using the stages argument of execute()

See execute() for details on execution patterns.

Examples

# Two-stage design: schools then students
sampling_design() |>
  stage(label = "Schools") |>
    cluster_by(school_id) |>
    draw(n = 50, method = "pps_brewer", mos = enrollment) |>
  stage(label = "Students") |>
    draw(n = 20) |>
  execute(tanzania_schools, seed = 123)
#> == tbl_sample ==
#> Weights: 12.16 - 191.35 (mean: 46.56 )
#> 
#> # A tibble: 50 × 17
#>    school_id  region       district school_level ownership enrollment n_teachers
#>  * <chr>      <fct>        <fct>    <fct>        <fct>          <dbl>      <dbl>
#>  1 TZ_01_0150 Dar es Sala… Ilala    Secondary    Governme…        493         14
#>  2 TZ_02_0006 Dar es Sala… Kinondo… Primary      Governme…        728         17
#>  3 TZ_02_0010 Dar es Sala… Kinondo… Primary      Governme…        433         11
#>  4 TZ_02_0048 Dar es Sala… Kinondo… Primary      Governme…        708         17
#>  5 TZ_02_0093 Dar es Sala… Kinondo… Primary      Governme…        362          7
#>  6 TZ_03_0025 Dar es Sala… Temeke   Primary      Governme…        645         14
#>  7 TZ_03_0034 Dar es Sala… Temeke   Secondary    Governme…        351          8
#>  8 TZ_03_0099 Dar es Sala… Temeke   Primary      Governme…        641         14
#>  9 TZ_03_0125 Dar es Sala… Temeke   Primary      Governme…        809         22
#> 10 TZ_04_0058 Arusha       Arusha … Primary      Private          437         10
#> # ℹ 40 more rows
#> # ℹ 10 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>

# Two-stage with stratification at stage 1
sampling_design() |>
  stage(label = "Schools") |>
    stratify_by(school_level) |>
    cluster_by(school_id) |>
    draw(n = 20, method = "pps_brewer", mos = enrollment) |>
  stage(label = "Students") |>
    draw(n = 15) |>
  execute(tanzania_schools, seed = 1234)
#> == tbl_sample ==
#> Weights: 12.73 - 206.66 (mean: 52.54 )
#> 
#> # A tibble: 40 × 17
#>    school_id  region       district school_level ownership enrollment n_teachers
#>  * <chr>      <fct>        <fct>    <fct>        <fct>          <dbl>      <dbl>
#>  1 TZ_01_0068 Dar es Sala… Ilala    Primary      Governme…        539         11
#>  2 TZ_01_0093 Dar es Sala… Ilala    Secondary    Private          545         11
#>  3 TZ_01_0124 Dar es Sala… Ilala    Secondary    Private          351          7
#>  4 TZ_01_0173 Dar es Sala… Ilala    Primary      Private          642         16
#>  5 TZ_02_0001 Dar es Sala… Kinondo… Secondary    Governme…        328          9
#>  6 TZ_02_0003 Dar es Sala… Kinondo… Primary      Governme…        323          8
#>  7 TZ_02_0037 Dar es Sala… Kinondo… Secondary    Governme…        517         14
#>  8 TZ_02_0084 Dar es Sala… Kinondo… Primary      Private          195          4
#>  9 TZ_02_0126 Dar es Sala… Kinondo… Primary      Governme…        857         19
#> 10 TZ_03_0001 Dar es Sala… Temeke   Primary      Governme…        241          5
#> # ℹ 30 more rows
#> # ℹ 10 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>

# DHS-style two-stage stratified cluster sample
sampling_design(title = "DHS-style Household Survey") |>
  stage(label = "Enumeration Areas") |>
    stratify_by(region, strata) |>
    cluster_by(ea_id) |>
    draw(n = 3, method = "pps_brewer", mos = hh_count) |>
  stage(label = "Households") |>
    draw(n = 20) |>
  execute(niger_eas, seed = 2026)
#> == tbl_sample: DHS-style Household Survey ==
#> Weights: 1.79 - 177.18 (mean: 33.11 )
#> 
#> # A tibble: 48 × 14
#>    ea_id       region department strata hh_count pop_estimate .weight .sample_id
#>  * <chr>       <fct>  <fct>      <fct>     <dbl>        <dbl>   <dbl>      <int>
#>  1 Aga_01_0001 Agadez Agadez     Rural        59          413   13.4           1
#>  2 Aga_01_0010 Agadez Agadez     Urban       192         1344    7.18          2
#>  3 Aga_01_0011 Agadez Agadez     Rural        54          378   14.7           3
#>  4 Aga_02_0012 Agadez Arlit      Rural       137          959    5.79          4
#>  5 Aga_03_0010 Agadez Bilma      Urban       259         1554    5.33          5
#>  6 Aga_04_0004 Agadez Tchirozér… Urban       215         1290    6.42          6
#>  7 Dif_06_0001 Diffa  Mainé-Sor… Rural       102          714   15.4           7
#>  8 Dif_06_0011 Diffa  Mainé-Sor… Urban       134          938    2.47          8
#>  9 Dif_06_0012 Diffa  Mainé-Sor… Rural        72          504   21.8           9
#> 10 Dif_06_0013 Diffa  Mainé-Sor… Urban       139          834    2.38         10
#> # ℹ 38 more rows
#> # ℹ 6 more variables: .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>

# Partial execution: select only stage 1
design <- sampling_design() |>
  stage(label = "EAs") |>
    stratify_by(region) |>
    cluster_by(ea_id) |>
    draw(n = 10, method = "pps_brewer", mos = hh_count) |>
  stage(label = "Households") |>
    draw(n = 12)

# Execute stage 1 only
selected_eas <- execute(design, niger_eas, stages = 1, seed = 1)
nrow(selected_eas)
#> [1] 80