An enumeration area (EA) frame for two-stage cluster surveys, built from WorldPop/GRID3 preEA boundaries, GHS-DUC urban classification, and Zimbabwe 2022 Census population figures. The frame covers 10 provinces and 91 districts.
Format
A tibble with 22,600 rows and 7 columns:
- ea_id
Character. Unique enumeration area identifier
- province
Factor. Province name (10 provinces)
- district
Factor. District name within province (91 districts)
- urban_rural
Factor. Urban/Rural classification
- population
Integer. EA population
- households
Integer. Number of households in the EA (measure of size for PPS)
- area_km2
Numeric. EA area in square kilometres
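A quick structural check can be useful when preparing a frame with these columns. The following is a minimal base R sketch on a small made-up data frame (`toy_frame` and all of its values are hypothetical, not part of the package). It checks that EA identifiers are unique, that each district sits inside exactly one province, and that the measure of size is positive.

```r
# Made-up frame using the column names documented above (values are invented)
toy_frame <- data.frame(
  ea_id       = sprintf("EA_%05d", 1:6),
  province    = factor(c("P1", "P1", "P1", "P2", "P2", "P2")),
  district    = factor(c("D1", "D1", "D2", "D3", "D3", "D4")),
  urban_rural = factor(c("Urban", "Urban", "Rural", "Rural", "Rural", "Urban")),
  population  = c(1000L, 1200L, 800L, 900L, 950L, 1100L),
  households  = c(280L, 330L, 190L, 210L, 230L, 300L),
  area_km2    = c(0.3, 0.4, 5.2, 6.1, 4.8, 0.5)
)

# EA identifiers must be unique
ids_unique <- anyDuplicated(toy_frame$ea_id) == 0

# Each district must belong to exactly one province
provinces_per_district <- tapply(
  as.character(toy_frame$province),
  toy_frame$district,
  function(p) length(unique(p))
)
districts_nested <- all(provinces_per_district == 1)

# The PPS measure of size must be strictly positive
mos_positive <- all(toy_frame$households > 0)

c(ids_unique = ids_unique,
  districts_nested = districts_nested,
  mos_positive = mos_positive)
```

The same three checks should carry over to `zwe_eas` by replacing `toy_frame` with the dataset.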
Details
This dataset is designed for demonstrating:
- Two-stage cluster sampling (districts then EAs, or EAs then households)
- PPS sampling using household counts
- Stratification by province and urban/rural
- Partial execution (operational multi-stage sampling)
- Two-phase sampling (with zwe_households)
The data structure mirrors typical DHS/MICS sampling frames where EAs are nested within districts and provinces.
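To make the role of the household count as a PPS measure of size concrete, here is a minimal base R sketch of first-order inclusion probabilities within strata. It uses a made-up frame (`toy_eas`, hypothetical values) and the textbook formula pi_i = n * x_i / sum(x). It is not the package's `pps_brewer` implementation, and it assumes no unit is large enough to reach probability 1 (certainty units would need separate handling).

```r
# Made-up frame: 4 EAs in each of two provinces (values are invented)
toy_eas <- data.frame(
  ea_id      = sprintf("EA_%05d", 1:8),
  province   = factor(rep(c("P1", "P2"), each = 4)),
  households = c(100L, 200L, 300L, 400L, 150L, 150L, 300L, 400L)
)

n_per_stratum <- 2  # hypothetical number of EAs to draw per province

# Total households per province, repeated on every row of that province
stratum_total <- ave(toy_eas$households, toy_eas$province, FUN = sum)

# PPS inclusion probability: proportional to the EA's share of households
toy_eas$incl_prob <- n_per_stratum * toy_eas$households / stratum_total

# The design weight is the inverse of the inclusion probability
toy_eas$weight <- 1 / toy_eas$incl_prob

toy_eas
```

Within each province the inclusion probabilities sum to `n_per_stratum`, which is a convenient sanity check before drawing a sample.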
See also
zwe_households for household-level data within a subset of EAs
Examples
# Explore the data
head(zwe_eas)
#> # A tibble: 6 × 7
#> ea_id province district urban_rural population households area_km2
#> <chr> <fct> <fct> <fct> <int> <int> <dbl>
#> 1 EA_00001 Bulawayo Bulawayo Urban 1029 282 0.34
#> 2 EA_00002 Bulawayo Bulawayo Urban 1384 388 0.59
#> 3 EA_00003 Bulawayo Bulawayo Urban 1058 304 0.34
#> 4 EA_00004 Bulawayo Bulawayo Urban 1328 383 0.36
#> 5 EA_00005 Bulawayo Bulawayo Urban 1229 335 0.3
#> 6 EA_00006 Bulawayo Bulawayo Urban 1141 309 0.36
table(zwe_eas$province)
#>
#> Bulawayo Harare Manicaland Mashonaland Central
#> 453 1479 3266 2170
#> Mashonaland East Mashonaland West Masvingo Matabeleland North
#> 2821 2851 3033 1796
#> Matabeleland South Midlands
#> 1606 3125
table(zwe_eas$urban_rural)
#>
#> Rural Urban
#> 16380 6220
# Two-stage cluster sample: districts then EAs
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)
sampling_design() |>
  add_stage(label = "Districts") |>
    stratify_by(province) |>
    cluster_by(district) |>
    draw(n = 2, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
    draw(n = 5) |>
  execute(zwe_frame, seed = 123)
#> Warning: Sample size capped to population in 1 stratum/strata: "Bulawayo".
#> ℹ Requested total: 20. Actual total: 19.
#> # A tbl_sample: 95 × 16
#> # Weights: 237.02 [90.6, 326.74]
#> ea_id province district urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <int> <int> <dbl>
#> 1 EA_00197 Bulawayo Bulawayo Urban 1273 345 0.41
#> 2 EA_00091 Bulawayo Bulawayo Urban 1271 383 1.43
#> 3 EA_00441 Bulawayo Bulawayo Urban 986 289 0.15
#> 4 EA_00348 Bulawayo Bulawayo Urban 1158 324 0.31
#> 5 EA_00137 Bulawayo Bulawayo Urban 917 276 0.52
#> 6 EA_00552 Harare Chitungwiza Urban 379 106 0.05
#> 7 EA_00525 Harare Chitungwiza Urban 1264 367 0.18
#> 8 EA_00479 Harare Chitungwiza Urban 1076 315 0.12
#> 9 EA_00460 Harare Chitungwiza Urban 1567 425 0.17
#> 10 EA_00590 Harare Chitungwiza Urban 1342 360 0.14
#> # ℹ 85 more rows
#> # ℹ 9 more variables: district_hh <int>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>
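The warning and the row count in the output above can be reconciled with simple arithmetic. The sketch below assumes, as the warning and the `head()` output suggest, that Bulawayo province contains a single district, so only one district can be drawn there. The variable names are illustrative only.

```r
n_provinces        <- 10  # strata at the district stage
districts_per_prov <- 2   # requested by draw(n = 2, ...)
eas_per_district   <- 5   # requested by draw(n = 5)

# Districts requested across all provinces
requested_districts <- n_provinces * districts_per_prov

# Assumption: Bulawayo has only one district, so one requested draw is lost
actual_districts <- requested_districts - 1

# EAs in the final sample
sampled_eas <- actual_districts * eas_per_district

c(requested = requested_districts,
  actual    = actual_districts,
  eas       = sampled_eas)
```

This gives 20 requested districts, 19 actual districts, and 95 EAs, matching the warning and the `95 × 16` dimensions reported for the `tbl_sample`.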