A synthetic enumeration area (EA) frame for household surveys, inspired by Demographic and Health Survey (DHS) sampling designs. It uses real administrative divisions of Niger, but all data values are fictional.
niger_eas
A tibble with approximately 1,500 rows and 6 columns:
ea_id: Character. Unique enumeration area identifier
region: Factor. Region name (8 regions: Agadez, Diffa, Dosso, Maradi, Niamey, Tahoua, Tillabéri, Zinder)
department: Factor. Department name within region
strata: Factor. Urban/Rural stratification
hh_count: Integer. Number of households in the EA (measure of size for PPS)
pop_estimate: Integer. Estimated population
This dataset is designed for demonstrating:
Stratified multi-stage cluster sampling
PPS (probability proportional to size) sampling using household counts
Urban/rural stratification
Two-stage designs (EAs then households)
The data structure mirrors typical DHS sampling frames where enumeration areas are the primary sampling units, selected with probability proportional to the number of households.
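As a rough illustration of the PPS idea described above, the following base R sketch computes first-stage and overall inclusion probabilities for a two-stage design. The five-EA frame, `n_eas`, and `m_hh` are hypothetical values chosen for illustration; they are not taken from `niger_eas`, and the sketch does not use the package's own functions.

```r
# Hypothetical stratum with five EAs (values are made up, not from niger_eas)
frame <- data.frame(
  ea_id    = c("EA_1", "EA_2", "EA_3", "EA_4", "EA_5"),
  hh_count = c(60, 150, 120, 90, 180)
)

n_eas <- 2   # EAs drawn at stage 1 (illustrative)
m_hh  <- 25  # households drawn per selected EA at stage 2 (illustrative)

# Stage 1: PPS inclusion probability, n * M_i / sum(M)
frame$pi_ea <- n_eas * frame$hh_count / sum(frame$hh_count)

# Stage 2: equal-probability draw of m households within a selected EA
frame$pi_hh_given_ea <- m_hh / frame$hh_count

# Overall household inclusion probability and the matching design weight
frame$pi_hh  <- frame$pi_ea * frame$pi_hh_given_ea
frame$weight <- 1 / frame$pi_hh

print(frame)
```

Because the stage 1 probability is proportional to `hh_count` and the stage 2 probability is inversely proportional to it, their product is the same for every EA in the stratum, so all sampled households carry the same weight.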
This is a synthetic dataset created for demonstration purposes. While it uses real Niger administrative divisions, all data values are fictional.
niger_eas_variance for Neyman allocation, niger_eas_cost for optimal allocation
data(niger_eas)
head(niger_eas)
#> # A tibble: 6 × 6
#>   ea_id        region department strata hh_count pop_estimate
#>   <chr>        <fct>  <fct>      <fct>     <dbl>        <dbl>
#> 1 Aga_Aga_0001 Agadez Agadez     Rural        59          413
#> 2 Aga_Aga_0002 Agadez Agadez     Urban       157          942
#> 3 Aga_Aga_0003 Agadez Agadez     Urban       124          868
#> 4 Aga_Aga_0004 Agadez Agadez     Rural       146         1022
#> 5 Aga_Aga_0005 Agadez Agadez     Urban       112          896
#> 6 Aga_Aga_0006 Agadez Agadez     Rural       182         1092
# DHS-style two-stage stratified cluster sample
if (FALSE) { # \dontrun{
sampling_design() |>
  stage(label = "EAs") |>
    stratify_by(region, strata) |>
    cluster_by(ea_id) |>
    draw(n = 5, method = "pps_brewer", mos = hh_count) |>
  stage(label = "Households") |>
    draw(n = 25) |>
  execute(niger_eas, seed = 42)
} # }
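For readers who want to see the first-stage mechanics without the package, here is a minimal base R sketch that re-enters the six rows printed above and draws two EAs per urban/rural stratum. It uses systematic PPS rather than the Brewer method named in the example, and the helper `pps_systematic()` is a hypothetical function written for this illustration, not part of any package.

```r
# The six rows shown by head(niger_eas), re-entered by hand
eas <- data.frame(
  ea_id    = sprintf("Aga_Aga_%04d", 1:6),
  strata   = c("Rural", "Urban", "Urban", "Rural", "Urban", "Rural"),
  hh_count = c(59, 157, 124, 146, 112, 182)
)

# Hypothetical helper: systematic PPS selection of n units.
# Returns the row indices of the selected units.
pps_systematic <- function(size, n) {
  # No unit may be larger than the sampling interval,
  # otherwise it could be hit more than once
  stopifnot(n * max(size) <= sum(size))
  step   <- sum(size) / n
  start  <- runif(1, 0, step)              # random start in (0, step)
  points <- start + step * (seq_len(n) - 1)
  upper  <- cumsum(size)                   # cumulative measure of size
  # Unit k is selected when upper[k - 1] < point <= upper[k]
  findInterval(points, upper, left.open = TRUE) + 1L
}

set.seed(42)

# Apply the draw independently within each stratum
picked <- do.call(rbind, lapply(split(eas, eas$strata), function(s) {
  s[pps_systematic(s$hh_count, n = 2), ]
}))

print(picked)
```

The sketch stratifies only by `strata` because the six rows share one region; the package example stratifies by `region` and `strata` together.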