A synthetic enumeration area (EA) frame for household surveys, inspired by Demographic and Health Survey (DHS) sampling designs. It uses real administrative divisions of Niger, but all data values are fictional.

niger_eas

Format

A tibble with 1,536 rows and 6 columns:

ea_id

Character. Unique enumeration area identifier

region

Factor. Region name (8 regions: Agadez, Diffa, Dosso, Maradi, Niamey, Tahoua, Tillabéri, Zinder)

department

Factor. Department name within region

strata

Factor. Urban/rural stratum (levels: Urban, Rural)

hh_count

Numeric (whole-number values stored as double). Number of households in the EA; serves as the measure of size for PPS

pop_estimate

Numeric (whole-number values stored as double). Estimated population of the EA

Details

This dataset is designed for demonstrating:

  • Stratified multi-stage cluster sampling

  • PPS (probability proportional to size) sampling using household counts

  • Urban/rural stratification

  • Two-stage designs (EAs then households)

The data structure mirrors typical DHS sampling frames, in which enumeration areas are the primary sampling units and are selected with probability proportional to their number of households.
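The arithmetic behind such a two-stage PPS design can be sketched in base R. This is a minimal illustration, not the package's implementation: it treats the six EAs printed by head(niger_eas) in the Examples as if they formed a single stratum, and the sample sizes n_eas and m_hh are hypothetical values chosen for the sketch.

```r
# Toy frame: the six EAs shown by head(niger_eas), treated as one stratum
frame <- data.frame(
  ea_id    = c("Aga_01_0001", "Aga_01_0002", "Aga_01_0003",
               "Aga_01_0004", "Aga_01_0005", "Aga_01_0006"),
  hh_count = c(59, 157, 124, 146, 112, 182)
)

n_eas <- 2   # hypothetical number of EAs drawn at stage 1
m_hh  <- 20  # hypothetical number of households drawn per selected EA

# Stage 1: PPS inclusion probability = n * size / total size
frame$pi_1 <- n_eas * frame$hh_count / sum(frame$hh_count)
stopifnot(all(frame$pi_1 <= 1))  # no EA is large enough to be a certainty unit

# Stage 2: equal-probability selection of m_hh households within the EA
frame$pi_2 <- m_hh / frame$hh_count
stopifnot(all(frame$pi_2 <= 1))  # every EA has at least m_hh households

# Overall design weight = inverse of the product of the stage probabilities
frame$weight <- 1 / (frame$pi_1 * frame$pi_2)

print(frame)
```

Because hh_count cancels between the two stages, every household receives the same weight, sum(hh_count) / (n_eas * m_hh). This is why PPS selection of EAs followed by a fixed household take produces a self-weighting sample within a stratum, provided the household counts in the frame are accurate.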

Note

This is a synthetic dataset created for demonstration purposes. While it uses real Niger administrative divisions, all data values are fictional.

See also

niger_eas_variance for Neyman allocation, niger_eas_cost for optimal allocation

Examples

# Explore the data
head(niger_eas)
#> # A tibble: 6 × 6
#>   ea_id       region department strata hh_count pop_estimate
#>   <chr>       <fct>  <fct>      <fct>     <dbl>        <dbl>
#> 1 Aga_01_0001 Agadez Agadez     Rural        59          413
#> 2 Aga_01_0002 Agadez Agadez     Urban       157          942
#> 3 Aga_01_0003 Agadez Agadez     Urban       124          868
#> 4 Aga_01_0004 Agadez Agadez     Rural       146         1022
#> 5 Aga_01_0005 Agadez Agadez     Urban       112          896
#> 6 Aga_01_0006 Agadez Agadez     Rural       182         1092
table(niger_eas$region)
#> 
#>    Agadez     Diffa     Dosso    Maradi    Niamey    Tahoua Tillabéri    Zinder 
#>        51        65       178       287       156       275       219       305 
table(niger_eas$strata)
#> 
#> Urban Rural 
#>   331  1205 

# DHS-style two-stage stratified cluster sample
sampling_design() |>
  stage(label = "EAs") |>
    stratify_by(region, strata) |>
    cluster_by(ea_id) |>
    draw(n = 3, method = "pps_brewer", mos = hh_count) |>
  stage(label = "Households") |>
    draw(n = 20) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 1.35 - 149.24 (mean: 30.14 )
#> 
#> # A tibble: 48 × 14
#>    ea_id       region department strata hh_count pop_estimate .weight .sample_id
#>  * <chr>       <fct>  <fct>      <fct>     <dbl>        <dbl>   <dbl>      <int>
#>  1 Aga_02_0009 Agadez Arlit      Urban       137          959   10.1           1
#>  2 Aga_02_0011 Agadez Arlit      Rural        70          350   11.3           2
#>  3 Aga_03_0006 Agadez Bilma      Rural        98          490    8.09          3
#>  4 Aga_04_0007 Agadez Tchirozér… Urban       121          847   11.4           4
#>  5 Aga_04_0008 Agadez Tchirozér… Rural        76          456   10.4           5
#>  6 Aga_04_0012 Agadez Tchirozér… Urban       431         2586    3.20          6
#>  7 Dif_05_0012 Diffa  Diffa      Rural       216         1512    7.27          7
#>  8 Dif_06_0010 Diffa  Mainé-Sor… Urban       119          714    2.78          8
#>  9 Dif_06_0014 Diffa  Mainé-Sor… Rural        67          335   23.4           9
#> 10 Dif_06_0015 Diffa  Mainé-Sor… Rural        93          558   16.9          10
#> # ℹ 38 more rows
#> # ℹ 6 more variables: .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>