sampling_design() is the entry point for creating survey sampling specifications. It creates an empty design object that can be built up using pipe-able verbs like stratify_by(), cluster_by(), draw(), and add_stage().

Usage

sampling_design(title = NULL)

Arguments

title

Optional character string providing a title for the design. Useful for documentation and printing purposes.

Value

A sampling_design object that can be piped to other design functions.

Details

The sampling design paradigm separates the specification of a sampling plan from its execution. This allows designs to be:

  • Reused across different data frames

  • Partially executed (e.g., stage by stage)

  • Inspected and validated before execution

  • Documented and shared

The design specification is frame-independent: it describes how to sample, not what to sample from.
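
As a minimal sketch of frame independence, the same specification can be stored once and then executed against two different frames. This assumes the package that provides sampling_design() is attached and that the bfa_eas and zwe_eas example frames used in the Examples section are available. The object names srs_design, bfa_sample, and zwe_sample are illustrative only.

```r
# Specify the design once; no frame is involved at this point.
srs_design <- sampling_design(title = "SRS of 100 EAs") |>
  draw(n = 100)

# Execute the same specification against two different frames.
# The seed makes each draw reproducible.
bfa_sample <- execute(srs_design, bfa_eas, seed = 1)
zwe_sample <- execute(srs_design, zwe_eas, seed = 1)
```

Because the design object holds only the sampling rules, it can be printed, checked, or shared before any frame is supplied.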

Design Flow

A typical design workflow follows this pattern:


sampling_design() |>
  stratify_by(...) |>
  cluster_by(...) |>
  draw(...) |>
  execute(frame)

See also

stratify_by() for defining strata, cluster_by() for defining clusters, draw() for specifying selection parameters, add_stage() for multi-stage designs, and execute() for running designs.

Examples

# Simple random sample of 100 enumeration areas (EAs)
sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights:      445.7 [445.7, 445.7]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 43475 Est         Gnagna   Piela   Rural              971        114     8.95
#>  2  6592 Sud-Ouest   Noumbiel Kpuere  Rural               88         11     7.89
#>  3 11611 Boucle du … Nayala   Yaba    Rural              111         15     8.97
#>  4 45236 Centre-Est  Boulgou  Beguedo Urban              939        167     0.32
#>  5  3549 Est         Gourma   Fada-N… Rural              263         32     3.4 
#>  6 39095 Hauts-Bass… Kenedou… Sindo   Rural               37          4     6.09
#>  7 21818 Centre-Est  Kourite… Goungu… Rural               21          4     1.07
#>  8 15528 Centre      Kadiogo  Ouagad… Urban             1207        182     0.14
#>  9 14284 Est         Gourma   Matiak… Rural              178         21     8.86
#> 10  3433 Est         Gourma   Fada-N… Rural              285         34     8.55
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified sample with proportional allocation
sampling_design(title = "Burkina Faso EA Survey") |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 400) |>
  execute(bfa_eas, seed = 2)
#> # A tbl_sample: 400 × 17 | Burkina Faso EA Survey
#> # Weights:      111.42 [107.47, 113.12]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1  6219 Boucle du … Nayala   Kougny  Rural             1544        178     2.52
#>  2 34772 Boucle du … Sourou   Tougan  Rural              872        120     0.99
#>  3 25911 Boucle du … Mouhoun  Dedoug… Rural              222         39     8.39
#>  4 31953 Boucle du … Sourou   Kiemba… Rural              602         78     6.91
#>  5 31928 Boucle du … Sourou   Kiemba… Rural              906        118     0.85
#>  6 36703 Boucle du … Bale     Fara    Rural              975        143     1.31
#>  7  8235 Boucle du … Mouhoun  Ouarko… Rural              271         37     8.86
#>  8 43720 Boucle du … Mouhoun  Safane  Rural              101         14     8.14
#>  9 11746 Boucle du … Bale     Yaho    Rural               98         15     8.72
#> 10 44450 Boucle du … Mouhoun  Tcheri… Rural              628        100     7.25
#> # ℹ 390 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

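# Two-stage cluster sample: districts at stage 1, EAs within districts at stage 2
# Total households per district serve as the measure of size (mos) for PPS selection
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)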

sampling_design(title = "Zimbabwe DHS") |>
  add_stage(label = "Districts") |>
    cluster_by(district) |>
    draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 10) |>
  execute(zwe_frame, seed = 3)
#> # A tbl_sample: 200 × 21 | Zimbabwe DHS
#> # Weights:      486.81 [152.91, 1126.67]
#>    ea_id province district ward_pcode urban_rural population households
#>  * <int> <fct>    <fct>    <chr>      <fct>            <int>      <int>
#>  1 48190 Bulawayo Bulawayo ZW102106   Rural               86         24
#>  2 23741 Bulawayo Bulawayo ZW102118   Urban              383        107
#>  3  1302 Bulawayo Bulawayo ZW102105   Urban              261         78
#>  4 48265 Bulawayo Bulawayo ZW102106   Urban              129         36
#>  5  1415 Bulawayo Bulawayo ZW102105   Urban               97         29
#>  6  1315 Bulawayo Bulawayo ZW102105   Urban              158         47
#>  7 23772 Bulawayo Bulawayo ZW102118   Urban              346         97
#>  8  2268 Bulawayo Bulawayo ZW102104   Urban              116         35
#>  9 47300 Bulawayo Bulawayo ZW102107   Urban              253         66
#> 10 22855 Bulawayo Bulawayo ZW102109   Urban              946        254
#> # ℹ 190 more rows
#> # ℹ 14 more variables: buildings <int>, women_15_49 <int>, men_15_49 <int>,
#> #   children_under5 <int>, area_km2 <dbl>, district_hh <int>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>