A synthetic enumeration area (EA) frame for household surveys, inspired by Demographic and Health Survey (DHS) sampling designs. Uses real Niger administrative divisions but contains entirely fictional data.
niger_eas
A tibble with approximately 1,500 rows (1,536 in the example output shown here) and 6 columns:
ea_id: Character. Unique enumeration area identifier
region: Factor. Region name (8 regions: Agadez, Diffa, Dosso, Maradi, Niamey, Tahoua, Tillabéri, Zinder)
department: Factor. Department name within region
strata: Factor. Urban/Rural stratification
hh_count: Integer. Number of households in the EA (measure of size for PPS)
pop_estimate: Integer. Estimated population
This dataset is designed for demonstrating:
Stratified multi-stage cluster sampling
PPS (probability proportional to size) sampling using household counts
Urban/rural stratification
Two-stage designs (EAs then households)
The data structure mirrors typical DHS sampling frames where enumeration areas are the primary sampling units, selected with probability proportional to the number of households.
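The PPS logic described above can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: the toy frame reuses the six household counts shown by `head(niger_eas)`, and `n_eas`, `m_hh`, `pi_1`, `pi_2`, and `weight` are hypothetical names introduced here. It assumes each selected EA has exactly `hh_count` listed households, from which a fixed number are drawn with equal probability.

```r
# Toy frame: household counts copied from the first six rows of head(niger_eas)
frame <- data.frame(
  ea_id    = c("Aga_01_0001", "Aga_01_0002", "Aga_01_0003",
               "Aga_01_0004", "Aga_01_0005", "Aga_01_0006"),
  hh_count = c(59, 157, 124, 146, 112, 182)
)

n_eas <- 2   # EAs drawn at stage 1 (hypothetical)
m_hh  <- 20  # households drawn per selected EA at stage 2 (hypothetical)

# Stage 1: PPS inclusion probability, proportional to the measure of size
frame$pi_1 <- n_eas * frame$hh_count / sum(frame$hh_count)

# Stage 2: equal-probability draw of m_hh households within the EA
frame$pi_2 <- m_hh / frame$hh_count

# Overall design weight: inverse of the product of the stage probabilities
frame$weight <- 1 / (frame$pi_1 * frame$pi_2)

# Both sets of probabilities must be valid (no EA large enough to exceed 1)
stopifnot(all(frame$pi_1 <= 1), all(frame$pi_2 <= 1))

print(frame)
```

Under these assumptions the household count cancels out of the product, so every household carries the same weight, `sum(hh_count) / (n_eas * m_hh)`. This is why PPS selection of EAs is commonly paired with a fixed household take per EA. Weights from a real frame vary once strata, certainty selections, or outdated size measures enter the design.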
This is a synthetic dataset created for demonstration purposes. While it uses real Niger administrative divisions, all data values are fictional.
See also: niger_eas_variance for Neyman allocation, niger_eas_cost for optimal allocation.
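For readers unfamiliar with Neyman allocation, the sketch below shows the textbook rule, in which the stratum sample size is proportional to the stratum size times the stratum standard deviation. The stratum sizes are the Urban/Rural EA counts printed by `table(niger_eas$strata)`. The standard deviations and the total sample size are hypothetical values chosen for illustration, and the code does not use or describe the contents of niger_eas_variance.

```r
# Stratum sizes: EA counts per stratum, as printed by table(niger_eas$strata)
N_h <- c(Urban = 331, Rural = 1205)

# HYPOTHETICAL stratum standard deviations of the survey variable;
# in practice these would come from a prior survey or a pilot
S_h <- c(Urban = 40, Rural = 25)

n_total <- 48  # hypothetical total number of EAs to select

# Neyman allocation: n_h proportional to N_h * S_h
n_h <- n_total * (N_h * S_h) / sum(N_h * S_h)

# Proportional allocation for comparison: n_h proportional to N_h
n_prop <- n_total * N_h / sum(N_h)

print(round(rbind(neyman = n_h, proportional = n_prop), 2))
```

With these made-up inputs the more variable Urban stratum receives a larger share than it would under proportional allocation. The fractional sizes still need rounding to integers before a sample can be drawn.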
# Explore the data
head(niger_eas)
#> # A tibble: 6 × 6
#> ea_id region department strata hh_count pop_estimate
#> <chr> <fct> <fct> <fct> <dbl> <dbl>
#> 1 Aga_01_0001 Agadez Agadez Rural 59 413
#> 2 Aga_01_0002 Agadez Agadez Urban 157 942
#> 3 Aga_01_0003 Agadez Agadez Urban 124 868
#> 4 Aga_01_0004 Agadez Agadez Rural 146 1022
#> 5 Aga_01_0005 Agadez Agadez Urban 112 896
#> 6 Aga_01_0006 Agadez Agadez Rural 182 1092
table(niger_eas$region)
#>
#> Agadez Diffa Dosso Maradi Niamey Tahoua Tillabéri Zinder
#> 51 65 178 287 156 275 219 305
table(niger_eas$strata)
#>
#> Urban Rural
#> 331 1205
# DHS-style two-stage stratified cluster sample
sampling_design() |>
  # Stage 1: select EAs with PPS within each region x urban/rural stratum
  stage(label = "EAs") |>
    stratify_by(region, strata) |>
    cluster_by(ea_id) |>
    draw(n = 3, method = "pps_brewer", mos = hh_count) |>
  # Stage 2: select households within each sampled EA
  stage(label = "Households") |>
    draw(n = 20) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 1.35 - 149.24 (mean: 30.14 )
#>
#> # A tibble: 48 × 14
#> ea_id region department strata hh_count pop_estimate .weight .sample_id
#> * <chr> <fct> <fct> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Aga_02_0009 Agadez Arlit Urban 137 959 10.1 1
#> 2 Aga_02_0011 Agadez Arlit Rural 70 350 11.3 2
#> 3 Aga_03_0006 Agadez Bilma Rural 98 490 8.09 3
#> 4 Aga_04_0007 Agadez Tchirozér… Urban 121 847 11.4 4
#> 5 Aga_04_0008 Agadez Tchirozér… Rural 76 456 10.4 5
#> 6 Aga_04_0012 Agadez Tchirozér… Urban 431 2586 3.20 6
#> 7 Dif_05_0012 Diffa Diffa Rural 216 1512 7.27 7
#> 8 Dif_06_0010 Diffa Mainé-Sor… Urban 119 714 2.78 8
#> 9 Dif_06_0014 Diffa Mainé-Sor… Rural 67 335 23.4 9
#> 10 Dif_06_0015 Diffa Mainé-Sor… Rural 93 558 16.9 10
#> # ℹ 38 more rows
#> # ℹ 6 more variables: .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> # .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>