An enumeration area (EA) sampling frame for household surveys in Burkina Faso, built from WorldPop/GRID3 preEA boundaries (CC-BY 4.0), EHCVM 2021 household survey parameters, the Cadre Harmonisé food security analysis, and HDX COD-AB administrative boundaries. Each row corresponds to exactly one preEA polygon in the WorldPop shapefile, so the frame can be joined 1:1 to those polygons for spatial analysis. The frame covers 13 regions, 45 provinces, and 348 communes.
Format
A tibble with 44,570 rows and 12 columns:
- ea_id
Integer. Unique enumeration area identifier (matches preEA_EAID in the WorldPop shapefile)
- region
Factor. Region name (13 regions)
- province
Factor. Province name within region (45 provinces)
- commune
Factor. Commune name within province (348 communes)
- urban_rural
Factor. Urban/rural classification (levels Rural and Urban), based on commune density
- population
Numeric. EA population from RGPH 2019 (Burkina Faso's 2019 general population and housing census)
- households
Integer. Number of households, derived from EHCVM 2021 household size parameters
- area_km2
Numeric. EA area in square kilometres
- accessible
Logical. TRUE if the EA lies in an accessible zone; EAs in conflict-affected regions are less often accessible
- dist_road_km
Numeric. Distance to the nearest paved road in km (synthetic, calibrated separately for urban and rural settings)
- food_insecurity_pct
Numeric. Cadre Harmonisé Phase 3+ (Crisis or worse) prevalence, calibrated from the January to May 2024 province-level analysis
- cost
Numeric. Survey cost per EA in thousands of FCFA, driven by accessibility and distance to the nearest paved road
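Because the accessible flag marks EAs that fieldwork can reach, it can be used to gauge how much of the frame a survey restricted to accessible EAs would still cover. The minimal base R sketch below does this on a small hypothetical frame rather than on bfa_eas itself: the column names mirror the ones above, but every value is invented for illustration. It returns, for each region, the share of households living in accessible EAs.

```r
# Hypothetical toy frame: column names mirror bfa_eas, values are invented
frame <- data.frame(
  region     = c("Sahel", "Sahel", "Sahel", "Centre", "Centre"),
  households = c(120, 80, 200, 300, 150),
  accessible = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Households in accessible EAs (logical coerced to 0/1), summed by region
covered <- tapply(frame$households * frame$accessible, frame$region, sum)

# All households, summed by region
total <- tapply(frame$households, frame$region, sum)

# Share of each region's households that an accessible-only frame covers
coverage <- covered / total
coverage
```

A low coverage share signals that estimates for that region describe only its accessible part, which is worth reporting alongside the results.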
Details
This dataset is designed for demonstrating:
- Stratified multi-stage cluster sampling
- PPS (probability proportional to size) sampling using household counts
- Urban/rural stratification
- Neyman and optimal allocation using auxiliary variables
- Sampling in conflict-affected contexts with accessibility constraints
The structure follows that of typical household survey sampling frames, in which enumeration areas serve as primary sampling units (PSUs) selected with probability proportional to their number of households.
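Under PPS sampling without replacement, an exact πps method such as Brewer's selects each EA with inclusion probability n * M_i / sum(M), where M_i is its household count. EAs whose probability would reach 1 are taken with certainty and the rest of the sample is reallocated over the remaining EAs. The base R sketch below computes these probabilities and the corresponding design weights (1 / probability) under those assumptions. It is an illustration, not the package's internal code: the helper name pps_inclusion_probs is hypothetical, and the household counts are simply the six EAs printed by head(bfa_eas), treated as a toy stratum.

```r
# Hypothetical helper (not part of any package): PPS inclusion probabilities
# pi_i = n * size_i / sum(size). Units whose pi_i would reach 1 are taken
# with certainty, and the remaining sample is reallocated among the others.
pps_inclusion_probs <- function(size, n) {
  certain <- rep(FALSE, length(size))
  repeat {
    n_left <- n - sum(certain)                 # sample still to allocate
    p <- n_left * size / sum(size[!certain])   # PPS among non-certainty units
    p[certain] <- 1                            # certainty units keep probability 1
    newly_certain <- !certain & p >= 1
    if (!any(newly_certain)) break
    certain <- certain | newly_certain
  }
  p
}

# Toy stratum: household counts of the six EAs shown by head(bfa_eas)
households <- c(7, 25, 8, 31, 6, 17)

pi2 <- pps_inclusion_probs(households, n = 2)  # no certainty units
pi4 <- pps_inclusion_probs(households, n = 4)  # EAs with 31 and 25 households become certain
weights4 <- 1 / pi4                            # design weights; certainty EAs get weight 1
```

The probabilities always sum to n, which is a quick check that the reallocation step kept the intended sample size.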
See also
bfa_eas_variance for Neyman allocation, bfa_eas_cost for optimal allocation
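Neyman allocation sets each stratum's sample size proportional to N_h * S_h (stratum size times the standard deviation of the study variable); cost-optimal allocation divides that quantity by the square root of the per-unit cost c_h. The minimal base R sketch below applies both rules to a fixed total sample so they can be compared. It does not read bfa_eas_variance or bfa_eas_cost, whose layout is not described here. The stratum sizes are the regional EA counts from table(bfa_eas$region), while the standard deviations and costs are hypothetical, and the helper name allocate is illustrative.

```r
# Hypothetical helper: allocate a total of n sample units across strata.
# N: stratum sizes; S: stratum standard deviations of the study variable;
# cost: per-unit survey cost per stratum (NULL gives Neyman allocation).
allocate <- function(n, N, S, cost = NULL) {
  if (is.null(cost)) cost <- rep(1, length(N))
  share <- N * S / sqrt(cost)
  n * share / sum(share)
}

N    <- c(Centre = 3888, Est = 5505, Sahel = 4144)  # EA counts from table(bfa_eas$region)
S    <- c(Centre = 8, Est = 15, Sahel = 20)         # hypothetical standard deviations
cost <- c(Centre = 150, Est = 400, Sahel = 900)     # hypothetical costs, thousands of FCFA

neyman  <- allocate(60, N, S)
optimal <- allocate(60, N, S, cost)
round(rbind(neyman, optimal), 1)
```

With these numbers the expensive Sahel stratum receives fewer EAs under optimal allocation than under Neyman allocation. In practice the fractional sizes would still need rounding to integers, with each n_h capped at N_h.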
Examples
# Explore the data
head(bfa_eas)
#> # A tibble: 6 × 12
#> ea_id region province commune urban_rural population households area_km2
#> <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 11759 Boucle du M… Bale Bagassi Rural 56 7 9.21
#> 2 11760 Boucle du M… Bale Bagassi Rural 204 25 8.75
#> 3 11761 Boucle du M… Bale Bagassi Rural 63 8 8.54
#> 4 11762 Boucle du M… Bale Bagassi Rural 257 31 8.92
#> 5 11763 Boucle du M… Bale Bagassi Rural 48 6 4.89
#> 6 11764 Boucle du M… Bale Bagassi Rural 139 17 8.51
#> # ℹ 4 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>
table(bfa_eas$region)
#>
#> Boucle du Mouhoun Cascades Centre Centre-Est
#> 5009 2508 3888 2941
#> Centre-Nord Centre-Ouest Centre-Sud Est
#> 3402 3723 1612 5505
#> Hauts-Bassins Nord Plateau-Central Sahel
#> 4839 2930 1662 4144
#> Sud-Ouest
#> 2407
table(bfa_eas$urban_rural)
#>
#> Rural Urban
#> 37687 6883
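# Stratified PPS sample: 3 EAs per region/urban_rural stratum, drawn with Brewer's method using households as the measure of size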
sampling_design() |>
stratify_by(region, urban_rural) |>
draw(n = 3, method = "pps_brewer", mos = households) |>
execute(bfa_eas, seed = 3)
#> # A tbl_sample: 69 × 18
#> # Weights: 454.54 [1.54, 1814.79]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 6279 Boucle du … Banwa Kouka Rural 672 93 0.32
#> 2 11606 Boucle du … Nayala Yaba Rural 982 130 1.68
#> 3 12784 Boucle du … Kossi Djibas… Rural 518 66 8.67
#> 4 9044 Boucle du … Bale Poura Urban 751 121 0.77
#> 5 9046 Boucle du … Bale Poura Urban 4132 664 3.25
#> 6 9045 Boucle du … Bale Poura Urban 536 86 0.17
#> 7 20686 Cascades Comoe Banfora Rural 600 73 0.32
#> 8 38158 Cascades Comoe Mousso… Rural 877 145 0.89
#> 9 10248 Cascades Comoe Sidera… Rural 2194 350 1.76
#> 10 13950 Centre Kadiogo Koubri Rural 1632 321 2.54
#> # ℹ 59 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
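The sample carries a .weight column. Assuming it holds the inverse inclusion probability of each EA (as its name and the per-stage .weight_1 column suggest), design-based estimates follow directly from it. The base R sketch below uses three hypothetical sampled EAs rather than the printed sample. It computes a Horvitz-Thompson estimate of total households and a population-weighted (Hájek ratio) estimate of the Phase 3+ prevalence, treating food_insecurity_pct as a percentage of each EA's population.

```r
# Hypothetical values standing in for three sampled EAs
w   <- c(450, 120, 1800)   # design weights (.weight)
hh  <- c(93, 664, 25)      # households
pop <- c(672, 4132, 204)   # population
fi  <- c(20, 10, 35)       # food_insecurity_pct

# Horvitz-Thompson estimate of the total number of households
total_hh <- sum(w * hh)

# Hajek ratio estimate: people in Phase 3+ divided by estimated total population
prev <- sum(w * pop * fi) / sum(w * pop)
```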