sampling_design() is the entry point for creating survey sampling specifications. It creates an empty design object that can be built up using pipe-able verbs like stratify_by(), cluster_by(), draw(), and add_stage().

Usage

sampling_design(title = NULL)

Arguments

title

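Optional character string giving the design a title. The title is carried through to printed output (for example, it appears in the tbl_sample header after execution), which helps when documenting or comparing designs.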

Value

A sampling_design object that can be piped to other design functions.

Details

The sampling design paradigm separates the specification of a sampling plan from its execution. This allows designs to be:

  • Reused across different data frames

  • Partially executed (e.g., stage by stage)

  • Inspected and validated before execution

  • Documented and shared

The design specification is frame-independent: it describes how to sample, not what to sample from.
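As a minimal sketch of frame independence, the same stored design can be executed against two different frames. This assumes the package that provides sampling_design() is already attached and that the bfa_eas and zwe_eas frames used in the Examples are available; the object names srs_50, bfa_sample, and zwe_sample are illustrative only.

```r
# Assumption: the package providing sampling_design(), draw(), and execute()
# is attached, along with its example frames bfa_eas and zwe_eas.

# Specify once: a simple random sample of 50 units. No frame is involved yet.
srs_50 <- sampling_design(title = "SRS of 50 EAs") |>
  draw(n = 50)

# Execute the same specification against two different frames.
bfa_sample <- execute(srs_50, bfa_eas, seed = 1)
zwe_sample <- execute(srs_50, zwe_eas, seed = 1)

# Each result should contain the requested number of rows.
nrow(bfa_sample)
nrow(zwe_sample)
```

Because the design object holds only the sampling rules, nothing about it has to change when the frame does.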

Design Flow

A typical design workflow follows this pattern:


sampling_design() |>
  stratify_by(...) |>
  cluster_by(...) |>
  draw(...) |>
  execute(frame)
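The pipeline does not have to be written in one expression. The sketch below stores the specification first, prints it for inspection, and only then executes it. It reuses the calls shown in the Examples and assumes the package and the bfa_eas frame are available; the names design and smp are illustrative. Stratification and clustering are optional steps, as the simple random sample in the Examples shows.

```r
# Assumption: the package providing these verbs is attached and bfa_eas exists.

# Step 1: build the specification. Nothing is sampled at this point.
design <- sampling_design(title = "Burkina Faso EA Survey") |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 400)

# Step 2: inspect the specification before committing to a frame.
print(design)

# Step 3: execute against a frame; the seed makes the draw reproducible.
smp <- execute(design, bfa_eas, seed = 2)
nrow(smp)
```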

See also

stratify_by() for defining strata, cluster_by() for defining clusters, draw() for specifying selection parameters, add_stage() for multi-stage designs, execute() for running designs

Examples

# Simple random sample of 100 EAs
sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights:      149.34 [149.34, 149.34]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_10182 Boucle … Mouhoun  Ouarko… Rural             1347        185    33.0 
#>  2 EA_14571 Centre-… Nahouri  Zecco   Urban             2829        452     8.91
#>  3 EA_03356 Centre-… Kourite… Dialga… Rural             1010        150    19.0 
#>  4 EA_01856 Hauts-B… Houet    Bobo-D… Urban             1938        311     0.32
#>  5 EA_14703 Plateau… Oubrite… Ziniare Rural             1320        188     3   
#>  6 EA_10602 Est      Tapoa    Partia… Rural             1393        191    17.2 
#>  7 EA_08087 Plateau… Ganzour… Mogtedo Rural             2284        285     1.99
#>  8 EA_05693 Hauts-B… Houet    Karang… Rural             1295        181    19.0 
#>  9 EA_01975 Est      Gnagna   Bogande Rural             2018        276    22.2 
#> 10 EA_04482 Centre-… Boulgou  Garango Rural             1211        180    18.3 
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified sample with proportional allocation
sampling_design(title = "Burkina Faso EA Survey") |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 400) |>
  execute(bfa_eas, seed = 2)
#> # A tbl_sample: 400 × 17 | Burkina Faso EA Survey
#> # Weights:      37.34 [36.18, 38]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_06470 Boucle … Mouhoun  Kona    Rural             1083        150    50.1 
#>  2 EA_08656 Boucle … Kossi    Nouna   Rural             1197        177    45.9 
#>  3 EA_08720 Boucle … Kossi    Nouna   Rural             1643        242    25.2 
#>  4 EA_12444 Boucle … Banwa    Solenzo Rural             1316        177    15.5 
#>  5 EA_12420 Boucle … Banwa    Solenzo Rural               79         11     5.68
#>  6 EA_06887 Boucle … Banwa    Kouka   Rural             1748        241    18.1 
#>  7 EA_06014 Boucle … Sourou   Kiemba… Rural              518         67    18.3 
#>  8 EA_14033 Boucle … Nayala   Yaba    Rural             1115        147    11.8 
#>  9 EA_07232 Boucle … Sourou   Lankoue Rural             1031        136    18.5 
#> 10 EA_04730 Boucle … Sourou   Gomboro Rural             1597        225    43.8 
#> # ℹ 390 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
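
The narrow weight range in the header above is what proportional allocation tends to produce: each stratum gets a share of the sample equal to its share of the frame, so the design weights N_h / n_h all sit close to N / n and differ only through rounding. The base R sketch below illustrates the arithmetic with hypothetical stratum sizes; the numbers are not taken from bfa_eas, and the rounding rule is an assumption rather than the package's documented behaviour.

```r
# Hypothetical stratum sizes (placeholders, NOT taken from bfa_eas).
N_h <- c(A = 5000, B = 3000, C = 7000)
n   <- 400

# Proportional allocation: n_h = n * N_h / N, rounded to whole units.
# Simple rounding is an assumption here; in general it need not sum to n.
n_h <- round(n * N_h / sum(N_h))

# Design weight per stratum under equal-probability selection within strata.
w_h <- N_h / n_h

n_h
round(w_h, 2)

# The weights of the sampled units add back up to the frame size.
all.equal(sum(n_h * w_h), sum(N_h))
```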

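# Two-stage cluster sample: districts drawn with PPS, then EAs within each selected district
# Measure of size for the PPS stage: total households per district
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)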

sampling_design(title = "Zimbabwe DHS") |>
  add_stage(label = "Districts") |>
    cluster_by(district) |>
    draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 10) |>
  execute(zwe_frame, seed = 3)
#> # A tbl_sample: 200 × 16 | Zimbabwe DHS
#> # Weights:      117.98 [51.23, 156.43]
#>    ea_id    province district urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>            <int>      <int>    <dbl>
#>  1 EA_00413 Bulawayo Bulawayo Urban             1368        378     0.21
#>  2 EA_00108 Bulawayo Bulawayo Urban             1083        295     1.44
#>  3 EA_00261 Bulawayo Bulawayo Urban             1185        342     0.3 
#>  4 EA_00165 Bulawayo Bulawayo Rural              614        136    18.3 
#>  5 EA_00137 Bulawayo Bulawayo Urban              917        276     0.52
#>  6 EA_00393 Bulawayo Bulawayo Urban             1307        360     0.37
#>  7 EA_00256 Bulawayo Bulawayo Urban             1299        398     0.24
#>  8 EA_00376 Bulawayo Bulawayo Urban             1187        341     0.36
#>  9 EA_00374 Bulawayo Bulawayo Urban             1352        389     0.38
#> 10 EA_00274 Bulawayo Bulawayo Urban             1264        374     0.24
#> # ℹ 190 more rows
#> # ℹ 9 more variables: district_hh <int>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>