An enumeration area (EA) sampling frame for two-stage cluster surveys in Zimbabwe, built from WorldPop/GRID3 preEA boundaries (CC-BY 4.0), the GHS-DUC urban classification, WorldPop 2022 constrained 100 m age-sex grids, and ward-level tallies from the Zimbabwe 2022 Population Census. Each row corresponds to exactly one preEA polygon in the WorldPop shapefile, so the frame joins 1:1 to the boundaries for spatial work. The frame covers 10 provinces and 91 districts.
Format
A tibble with 107,250 rows and 12 columns:
- ea_id
Integer. Unique enumeration area identifier (matches preEA_EAID in WorldPop shapefile)
- province
Factor. Province name (10 provinces)
- district
Factor. District name within province (91 districts)
- ward_pcode
Character. Ward P-code from OCHA COD-AB (e.g. "ZW150104")
- urban_rural
Factor. Urban/Rural classification based on building density, calibrated to GHS-DUC provincial shares
- population
Integer. EA population, calibrated to 2022 Census ward totals
- households
Integer. Number of households, calibrated to 2022 Census ward totals
- buildings
Integer. Building count from GRID3 building footprints
- women_15_49
Integer. Estimated women aged 15-49, from WorldPop age-sex grids scaled to census population
- men_15_49
Integer. Estimated men aged 15-49, from WorldPop age-sex grids scaled to census population
- children_under5
Integer. Estimated children under 5, from WorldPop age-sex grids scaled to census population
- area_km2
Numeric. EA area in square kilometres
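Because population and households are described as calibrated to ward totals, summing EA rows by ward_pcode should give back the ward-level figures. The base R sketch below shows that aggregation on a four-row toy frame. The values are copied from rows printed in the Examples on this page, so the sums cover only those rows and are not actual census ward totals; the object names `toy` and `ward_totals` are illustrative.

```r
# Toy frame: a few EA rows (values taken from the printed examples, not full wards)
toy <- data.frame(
  ea_id      = c(213L, 214L, 215L, 1435L),
  ward_pcode = c("ZW102103", "ZW102103", "ZW102103", "ZW102105"),
  population = c(123L, 105L, 256L, 116L),
  households = c(34L, 29L, 70L, 34L)
)

# Sum EA-level counts within each ward
ward_totals <- aggregate(
  cbind(population, households) ~ ward_pcode,
  data = toy,
  FUN  = sum
)
ward_totals
```

Run on the full frame, the same call would produce one row per ward, which can then be compared against published ward tallies.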
Details
This dataset is designed for demonstrating:
- Two-stage cluster sampling (EAs, then households)
- PPS sampling using household or population counts
- Stratification by province and urban/rural
- Partial execution (operational multi-stage sampling)
- Creating household listings from selected EAs for second-stage sampling
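To make the PPS item above concrete, here is a minimal base R sketch of textbook systematic PPS selection with households as the measure of size. It is a generic illustration, not the package's implementation of "pps_systematic"; the function name `pps_systematic_sketch`, the toy frame, and its household counts are all made up for the example.

```r
# Toy frame: five EAs with household counts as the measure of size (made-up values)
frame <- data.frame(ea_id = 1:5, households = c(40, 10, 25, 15, 10))

# Textbook systematic PPS: walk along the cumulated sizes at a fixed interval
pps_systematic_sketch <- function(mos, n) {
  interval <- sum(mos) / n
  # Guard: a unit larger than the interval could be selected more than once
  stopifnot(all(mos <= interval))
  start  <- runif(1, min = 0, max = interval)
  points <- start + interval * (seq_len(n) - 1)
  # Index of the first unit whose cumulated size reaches each selection point
  findInterval(points, cumsum(mos), left.open = TRUE) + 1L
}

set.seed(1)
picked <- pps_systematic_sketch(frame$households, n = 2)

# Inclusion probability is n * size / total size; the design weight is its inverse
incl_prob <- 2 * frame$households[picked] / sum(frame$households)
weights   <- 1 / incl_prob
data.frame(ea_id = frame$ea_id[picked], incl_prob, weights)
```

Larger EAs are more likely to be picked and receive smaller weights, which is what makes a fixed household take per EA roughly self-weighting within a stratum.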
The data structure follows typical two-stage cluster survey frames, in which EAs are nested within districts and districts within provinces. To create a household listing for second-stage sampling, expand each selected EA into one row per household, as in the listing step of the Examples.
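A minimal base R sketch of that expansion, using a made-up three-EA selection (the `selected_toy` object and its values are hypothetical and stand in for the output of a first-stage draw):

```r
# Hypothetical first-stage result: three selected EAs with their household counts
selected_toy <- data.frame(
  ea_id      = c(101L, 102L, 103L),
  households = c(2L, 3L, 1L)
)

# Repeat each EA row once per household, then number the household rows
rows    <- rep(seq_len(nrow(selected_toy)), times = selected_toy$households)
listing <- selected_toy[rows, ]
rownames(listing) <- NULL
listing$hh_id <- seq_len(nrow(listing))
listing
```

The listing has as many rows as the selected EAs have households in total (six here), and each row keeps its EA's attributes so the second stage can still cluster by ea_id.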
Examples
# Explore the data
head(zwe_eas)
#> # A tibble: 6 × 12
#>   ea_id province district ward_pcode urban_rural population households buildings
#>   <int> <fct>    <fct>    <chr>      <fct>            <int>      <int>     <int>
#> 1   213 Bulawayo Bulawayo ZW102103   Urban              123         34        72
#> 2   214 Bulawayo Bulawayo ZW102103   Urban              105         29        62
#> 3   215 Bulawayo Bulawayo ZW102103   Urban              256         70       151
#> 4   216 Bulawayo Bulawayo ZW102103   Rural              193         53       113
#> 5   217 Bulawayo Bulawayo ZW102103   Urban              186         51       109
#> 6   218 Bulawayo Bulawayo ZW102103   Urban              212         58       125
#> # ℹ 4 more variables: women_15_49 <int>, men_15_49 <int>,
#> #   children_under5 <int>, area_km2 <dbl>
table(zwe_eas$province)
#>
#>            Bulawayo              Harare          Manicaland Mashonaland Central
#>                2164                6066               15287                9140
#>    Mashonaland East    Mashonaland West            Masvingo  Matabeleland North
#>               13632               14109               15312                8712
#>  Matabeleland South            Midlands
#>                8003               14825
table(zwe_eas$urban_rural)
#>
#> Rural Urban
#> 83091 24159
# Two-stage cluster sample: EAs at stage 1, households at stage 2
design <- sampling_design() |>
  add_stage(label = "EAs") |>
    stratify_by(province, urban_rural) |>
    cluster_by(ea_id) |>
    draw(n = 3, method = "pps_systematic", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 20)
# Partial execution: run stage 1 only to select the EAs
selected <- execute(design, zwe_eas, stages = 1, seed = 123)
# Household listing after fieldwork: one row per household in each selected EA
library(dplyr)
listing <- selected |>
  slice(rep(seq_len(n()), households)) |>
  mutate(hh_id = row_number())
# Final sample: draw households from the listing
smpl <- execute(design, listing, seed = 1234)
smpl
#> # A tbl_sample: 1164 × 21
#> # Weights: 3277.26 [69.43, 10807.85]
#>    ea_id province district ward_pcode urban_rural population households
#>  * <int> <fct>    <fct>    <chr>      <fct>            <int>      <int>
#>  1  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  2  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  3  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  4  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  5  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  6  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  7  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  8  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  9  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#> 10  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#> # ℹ 1,154 more rows
#> # ℹ 14 more variables: buildings <int>, women_15_49 <int>, men_15_49 <int>,
#> #   children_under5 <int>, area_km2 <dbl>, hh_id <int>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
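The printed sample carries one weight column per stage (.weight_1, .weight_2) alongside an overall .weight. As a sketch of the standard two-stage arithmetic, and assuming (this page does not confirm it) that the overall weight is the product of the stage weights, the base R example below works through one hypothetical stratum. All numbers and object names are made up for illustration.

```r
# Hypothetical stratum: 3 EAs drawn by PPS from a stratum holding 5,000 households
n_eas    <- 3
total_hh <- 5000
ea_hh    <- c(34, 120, 250)  # households in each selected EA (made-up values)
m        <- 20               # households drawn per EA by simple random sampling

# Stage 1: inverse of the PPS inclusion probability n_eas * ea_hh / total_hh
w1 <- total_hh / (n_eas * ea_hh)
# Stage 2: inverse of the within-EA sampling fraction m / ea_hh
w2 <- ea_hh / m
# Overall weight under the product assumption
w <- w1 * w2
data.frame(ea_hh, w1, w2, w)
```

The EA sizes cancel in the product, so every household in the stratum ends up with the same weight, total_hh / (n_eas * m). Weights still differ between strata because each stratum has its own household total, which is consistent with the wide weight range reported in the printed sample.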