Create a Sampling Design

sampling_design() is the entry point for creating survey sampling specifications. It returns an empty design object that you build up by piping it through verbs such as stratify_by(), cluster_by(), draw(), and add_stage(), and then run on a frame with execute().

Usage

sampling_design(title = NULL)

Arguments

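title: Optional character string giving the design a title. Used for documentation and printing; it also appears in the header of the sample returned by execute(), for example "A tbl_sample: 400 × 17 | Burkina Faso EA Survey".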

Value

A sampling_design object that can be piped to other design functions.

Details

The sampling design paradigm separates the specification of a sampling plan from its execution. This allows designs to be:

Reused across different data frames
Partially executed (e.g., stage by stage)
Inspected and validated before execution
Documented and shared

The design specification is frame-independent: it describes how to sample, not what to sample from.
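Frame independence can be illustrated with a toy model in base R. This is a minimal sketch that assumes nothing about the package's internals: the hypothetical spec srs_spec records only how many units to draw, and the hypothetical runner run_spec() receives the frame only when it runs. Applying one spec to two synthetic frames gives different weights (N / n) because only the frame changes. In the package itself, the equivalent is to store the piped design in a variable and pass it to execute() with each frame.

```r
# Toy model only: `srs_spec` and `run_spec()` are hypothetical names,
# not part of the package. The spec says *how* to sample (a sample size);
# the frame is supplied only when the spec is run.
srs_spec <- list(n = 10)

run_spec <- function(spec, frame, seed) {
  set.seed(seed)                       # reproducible draw (sets the global RNG)
  N <- nrow(frame)
  rows <- sample.int(N, spec$n)        # simple random sample without replacement
  out <- frame[rows, , drop = FALSE]
  out$.weight <- N / spec$n            # design weight = 1 / (n / N) = N / n
  out
}

# The same spec executed against two different synthetic frames
frame_a <- data.frame(id = seq_len(1490))
frame_b <- data.frame(id = seq_len(200))
s_a <- run_spec(srs_spec, frame_a, seed = 1)
s_b <- run_spec(srs_spec, frame_b, seed = 1)
```

Every unit in s_a gets weight 149 (1490 / 10) and every unit in s_b gets weight 20 (200 / 10), although the spec itself is identical.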

Design Flow

A typical design workflow follows this pattern:


sampling_design() |>
  stratify_by(...) |>
  cluster_by(...) |>
  draw(...) |>
  execute(frame)

Examples

# Simple random sample of 100 EAs
sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights:      149 [149, 149]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_10155 Boucle … Mouhoun  Ouarko… Rural             1347        187    33.0 
#>  2 EA_01016 Centre-… Zoundwe… Bere    Rural             1166        193    23.2 
#>  3 EA_04918 Centre-… Kourite… Goungu… Rural              949        141     6.89
#>  4 EA_01890 Hauts-B… Houet    Bobo-D… Urban             1195        191     0.25
#>  5 EA_14703 Plateau… Oubrite… Ziniare Rural             1340        177    26.1 
#>  6 EA_12688 Est      Tapoa    Tambaga Rural              994        139     6.46
#>  7 EA_11778 Plateau… Ganzour… Saolgo  Rural              998        124     1.41
#>  8 EA_05700 Hauts-B… Houet    Karang… Rural             1049        119    23.6 
#>  9 EA_06057 Est      Gnagna   Koala   Rural             1291        173    27.6 
#> 10 EA_04482 Centre-… Boulgou  Garango Rural             1553        262     1.44
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
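Under simple random sampling without replacement every unit has inclusion probability n / N and therefore the same design weight. The header line "Weights: 149 [149, 149]" appears to report the mean weight followed by the minimum and maximum, so the frame size can be read off the output (up to display rounding):

```latex
\pi_i = \frac{n}{N}, \qquad w_i = \frac{1}{\pi_i} = \frac{N}{n}
\quad\Longrightarrow\quad N = w_i \, n = 149 \times 100 = 14\,900
```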

# Stratified sample with proportional allocation
sampling_design(title = "Burkina Faso EA Survey") |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 400) |>
  execute(bfa_eas, seed = 2)
#> # A tbl_sample: 400 × 17 | Burkina Faso EA Survey
#> # Weights:      37.25 [36.77, 38]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_06443 Boucle … Mouhoun  Kona    Rural             1083        171    50.1 
#>  2 EA_08629 Boucle … Kossi    Nouna   Rural             1197        156    45.9 
#>  3 EA_08693 Boucle … Kossi    Nouna   Rural             1643        214    25.2 
#>  4 EA_12417 Boucle … Banwa    Solenzo Rural             1316        157    15.5 
#>  5 EA_12393 Boucle … Banwa    Solenzo Rural               79          9     5.68
#>  6 EA_06860 Boucle … Banwa    Kouka   Rural             1748        201    18.1 
#>  7 EA_07200 Boucle … Sourou   Lanfie… Rural             1716        250     0.5 
#>  8 EA_14006 Boucle … Nayala   Yaba    Rural             1115        150    11.8 
#>  9 EA_13692 Boucle … Sourou   Toeni   Rural             1172        179     5.9 
#> 10 EA_05775 Boucle … Sourou   Kassoum Rural             1255        170    52.2 
#> # ℹ 390 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
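With proportional allocation each stratum h receives roughly n N_h / N units. Because n_h must be an integer, rounding makes the within-stratum weights N_h / n_h differ slightly, which is the likely reason the weights above span [36.77, 38] around a mean of 37.25 (and 400 × 37.25 = 14,900, the same frame size implied by the first example, consistent with the weights summing to N). The base-R sketch below uses largest-remainder rounding; the helper alloc_proportional(), the rounding rule, and the stratum sizes are all assumptions, not the package's documented behavior.

```r
# Hypothetical helper: proportional allocation n_h ~ n * N_h / N, rounded
# with the largest-remainder method so the stratum sizes sum exactly to n.
# The package may use a different rounding rule.
alloc_proportional <- function(N_h, n) {
  exact <- n * N_h / sum(N_h)
  n_h <- floor(exact)
  short <- n - sum(n_h)                        # units still to hand out
  if (short > 0) {
    # give one extra unit to the strata with the largest fractional parts
    bump <- order(exact - n_h, decreasing = TRUE)[seq_len(short)]
    n_h[bump] <- n_h[bump] + 1
  }
  n_h
}

# Hypothetical stratum sizes (not the real bfa_eas counts)
N_h <- c(A = 1000, B = 2500, C = 3400)
n_h <- alloc_proportional(N_h, n = 100)
w_h <- N_h / n_h                               # design weight within each stratum
```

For these sizes the strata receive 15, 36 and 49 units, with weights of about 66.7, 69.4 and 69.4: close to, but not exactly, the overall N / n = 69.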

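# Two-stage cluster sample of districts and EAs
# district_hh: total households in each district, repeated on every EA row
# and used below as the first-stage measure of size (mos)
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)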

sampling_design(title = "Zimbabwe DHS") |>
  add_stage(label = "Districts") |>
    cluster_by(district) |>
    draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 10) |>
  execute(zwe_frame, seed = 3)
#> # A tbl_sample: 200 × 16 | Zimbabwe DHS
#> # Weights:      117.98 [51.23, 156.43]
#>    ea_id    province district urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>            <int>      <int>    <dbl>
#>  1 EA_00413 Bulawayo Bulawayo Urban             1368        378     0.21
#>  2 EA_00108 Bulawayo Bulawayo Urban             1083        295     1.44
#>  3 EA_00261 Bulawayo Bulawayo Urban             1185        342     0.3 
#>  4 EA_00165 Bulawayo Bulawayo Rural              614        136    18.3 
#>  5 EA_00137 Bulawayo Bulawayo Urban              917        276     0.52
#>  6 EA_00393 Bulawayo Bulawayo Urban             1307        360     0.37
#>  7 EA_00256 Bulawayo Bulawayo Urban             1299        398     0.24
#>  8 EA_00376 Bulawayo Bulawayo Urban             1187        341     0.36
#>  9 EA_00374 Bulawayo Bulawayo Urban             1352        389     0.38
#> 10 EA_00274 Bulawayo Bulawayo Urban             1264        374     0.24
#> # ℹ 190 more rows
#> # ℹ 9 more variables: district_hh <int>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>
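In the two-stage example an EA's overall weight is the inverse of the product of its stage inclusion probabilities, 1 / (π1 π2). Here π1 is the district's PPS probability based on district_hh, and π2 = 10 / N_d for an equal-probability draw of 10 of the district's N_d EAs, so the weights vary with the ratio of EAs to households across districts. The .certainty_1 column suggests that a large district can be taken with certainty when its size would push π1 to 1 or beyond. The base-R sketch below computes such probabilities with the common iterative capping rule. It does not reproduce Brewer's selection algorithm, and the helper pps_inclusion() and all sizes are hypothetical.

```r
# Hypothetical helper: first-stage PPS inclusion probabilities
# p_i = n * M_i / sum(M). Units whose p_i reaches 1 are taken with
# certainty and the remaining selections are re-spread over the rest,
# repeating until no new certainty units appear.
pps_inclusion <- function(M, n) {
  p <- numeric(length(M))
  certain <- rep(FALSE, length(M))
  repeat {
    n_left <- n - sum(certain)
    p[!certain] <- n_left * M[!certain] / sum(M[!certain])
    p[certain] <- 1
    new_certain <- !certain & p >= 1
    if (!any(new_certain)) break
    certain <- certain | new_certain
  }
  list(p = p, certain = certain)
}

# Hypothetical districts: household totals (measure of size) and EA counts
M   <- c(d1 = 5000, d2 = 1000, d3 = 1000, d4 = 1000, d5 = 2000)
N_d <- c(50, 20, 10, 25, 40)
m   <- 10                                  # EAs drawn per selected district (SRS)

stage1 <- pps_inclusion(M, n = 3)
p2 <- m / N_d                              # stage-2 inclusion probability
w  <- 1 / (stage1$p * p2)                  # overall EA weight if the district is selected
```

Here district d1 (half of the total size) becomes a certainty unit, the two remaining selections are spread over d2 to d5 with probabilities 0.4, 0.4, 0.4 and 0.8, and the resulting EA weights are 5, 5, 2.5, 6.25 and 5.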