An enumeration area (EA) frame for two-stage cluster surveys, built from WorldPop/GRID3 preEA boundaries, GHS-DUC urban classification, and Zimbabwe 2022 Census population figures. The frame covers 10 provinces and 91 districts.
Format
A tibble with 22,600 rows and 7 columns:
- ea_id
Character. Unique enumeration area identifier
- province
Factor. Province name (10 provinces)
- district
Factor. District name within province (91 districts)
- urban_rural
Factor. Urban/rural classification derived from GHS-DUC (levels "Rural" and "Urban")
- population
Integer. EA population (derived from Zimbabwe 2022 Census figures)
- households
Integer. Number of households in the EA (measure of size for PPS)
- area_km2
Numeric. EA area in square kilometres
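To experiment with the column layout without loading the package, the six rows printed by `head(zwe_eas)` in the Examples can be rebuilt as a plain data frame. This is a minimal base-R sketch: `toy_eas` is a hypothetical name, and its factor levels only cover the values that appear in those six rows.

```r
# Hypothetical stand-in for zwe_eas, rebuilt from the six rows printed by
# head(zwe_eas); column names and types follow the Format list above.
toy_eas <- data.frame(
  ea_id       = sprintf("EA_%05d", 1:6),          # "EA_00001", "EA_00002", ...
  province    = factor(rep("Bulawayo", 6)),
  district    = factor(rep("Bulawayo", 6)),
  urban_rural = factor(rep("Urban", 6), levels = c("Rural", "Urban")),
  population  = c(1029L, 1384L, 1058L, 1328L, 1229L, 1141L),
  households  = c(282L, 388L, 304L, 383L, 335L, 309L),  # measure of size
  area_km2    = c(0.34, 0.59, 0.34, 0.36, 0.30, 0.36),
  stringsAsFactors = FALSE                        # keep ea_id as character
)

# Compare the column classes with the Format list
vapply(toy_eas, function(col) class(col)[1], character(1))
```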
Details
This dataset is designed to demonstrate:
- Two-stage cluster sampling (districts then EAs, or EAs then households)
- PPS sampling using household counts
- Stratification by province and urban/rural
- Partial execution (operational multi-stage sampling)
- Two-phase sampling (with zwe_households)
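PPS sampling with household counts gives each unit an inclusion probability pi_i = n * M_i / sum(M), where M_i is its number of households. The sketch below illustrates this with systematic PPS, a simple pi-ps method swapped in here for illustration; it is not the package's `pps_brewer` method, and `pps_systematic()` is a hypothetical helper. The toy input is the household counts of the six EAs printed by `head(zwe_eas)`.

```r
# Systematic PPS: a simple pi-ps design, used here in place of Brewer's
# method (method = "pps_brewer" in the Examples). Illustrative only.
pps_systematic <- function(mos, n) {
  total <- sum(mos)
  pik <- n * mos / total                 # inclusion probabilities n * M_i / sum(M)
  if (any(pik > 1)) stop("some units would be certainty selections")
  step  <- total / n                     # sampling interval
  start <- runif(1, min = 0, max = step) # random start inside the first interval
  hits  <- start + step * (0:(n - 1))    # n equally spaced points in [0, total)
  # Unit i covers [cumsum(mos)[i] - mos[i], cumsum(mos)[i]); findInterval()
  # counts the cumulative totals <= each point, so adding 1 gives the unit hit
  selected <- findInterval(hits, cumsum(mos)) + 1
  list(selected = selected, pik = pik)
}

# Household counts of EA_00001 to EA_00006 (from head(zwe_eas))
hh <- c(282, 388, 304, 383, 335, 309)

set.seed(42)
draw <- pps_systematic(hh, n = 2)
draw$selected                    # indices of the two sampled EAs
1 / draw$pik[draw$selected]      # their first-stage design weights
```

Brewer's method targets the same inclusion probabilities, so the first-stage weights 1 / pik would be identical; the two designs differ in their joint inclusion probabilities, which matter for variance estimation.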
The data structure mirrors typical DHS/MICS sampling frames, in which EAs are nested within districts and districts within provinces.
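Stratifying by province means deciding how many EAs each province receives. As a minimal sketch, assuming proportional allocation of a hypothetical total of 400 EAs (neither the total nor the helper `allocate_proportional()` comes from the package), the per-province EA counts reported by `table(zwe_eas$province)` in the Examples can be allocated as below. Naive rounding of n * N_h / N loses one EA here, so the sketch uses largest-remainder rounding to hit the target exactly.

```r
# EA counts per province, as reported by table(zwe_eas$province)
ea_counts <- c(
  "Bulawayo" = 453, "Harare" = 1479, "Manicaland" = 3266,
  "Mashonaland Central" = 2170, "Mashonaland East" = 2821,
  "Mashonaland West" = 2851, "Masvingo" = 3033,
  "Matabeleland North" = 1796, "Matabeleland South" = 1606,
  "Midlands" = 3125
)

# Hypothetical helper: proportional allocation with largest-remainder rounding
allocate_proportional <- function(sizes, n) {
  exact <- n * sizes / sum(sizes)        # unrounded n_h = n * N_h / N
  alloc <- floor(exact)
  short <- n - sum(alloc)                # units still to hand out
  extra <- order(exact - alloc, decreasing = TRUE)[seq_len(short)]
  alloc[extra] <- alloc[extra] + 1       # give them to the largest remainders
  alloc
}

sum(round(400 * ea_counts / sum(ea_counts)))   # naive rounding: 399
alloc <- allocate_proportional(ea_counts, 400)
alloc
sum(alloc)                                     # 400
```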
See also
zwe_households for household-level data within a subset of EAs
Examples
# Explore the data
head(zwe_eas)
#> # A tibble: 6 × 7
#>   ea_id    province district urban_rural population households area_km2
#>   <chr>    <fct>    <fct>    <fct>            <int>      <int>    <dbl>
#> 1 EA_00001 Bulawayo Bulawayo Urban             1029        282     0.34
#> 2 EA_00002 Bulawayo Bulawayo Urban             1384        388     0.59
#> 3 EA_00003 Bulawayo Bulawayo Urban             1058        304     0.34
#> 4 EA_00004 Bulawayo Bulawayo Urban             1328        383     0.36
#> 5 EA_00005 Bulawayo Bulawayo Urban             1229        335     0.3
#> 6 EA_00006 Bulawayo Bulawayo Urban             1141        309     0.36
table(zwe_eas$province)
#>
#>            Bulawayo              Harare          Manicaland Mashonaland Central
#>                 453                1479                3266                2170
#>    Mashonaland East    Mashonaland West            Masvingo  Matabeleland North
#>                2821                2851                3033                1796
#>  Matabeleland South            Midlands
#>                1606                3125
table(zwe_eas$urban_rural)
#>
#> Rural Urban
#> 16380  6220
# Two-stage cluster sample: districts then EAs
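# Stage 1 measure of size: total households in each district
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)

sampling_design() |>
  # Stage 1: 2 districts per province, PPS (Brewer) on district households
  add_stage(label = "Districts") |>
  stratify_by(province) |>
  cluster_by(district) |>
  draw(n = 2, method = "pps_brewer", mos = district_hh) |>
  # Stage 2: 5 EAs within each selected district
  add_stage(label = "EAs") |>
  draw(n = 5) |>
  execute(zwe_frame, seed = 42)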
#> Warning: Sample size capped to population in 1 stratum/strata: "Bulawayo".
#> ℹ Requested total: 20. Actual total: 19.
#> # A tbl_sample: 95 × 16
#> # Weights: 243.13 [90.6, 411.87]
#>    ea_id    province district urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>            <int>      <int>    <dbl>
#>  1 EA_00020 Bulawayo Bulawayo Urban             1157        362     0.81
#>  2 EA_00410 Bulawayo Bulawayo Urban             1222        339     0.4
#>  3 EA_00370 Bulawayo Bulawayo Urban             1232        363     0.19
#>  4 EA_00367 Bulawayo Bulawayo Urban             1421        403     0.27
#>  5 EA_00387 Bulawayo Bulawayo Urban             1293        397     0.8
#>  6 EA_01036 Harare   Harare   Urban             1341        393     0.92
#>  7 EA_01340 Harare   Harare   Urban             1318        357     0.83
#>  8 EA_01022 Harare   Harare   Rural              551        124     2.42
#>  9 EA_01671 Harare   Harare   Urban             1478        413     0.27
#> 10 EA_01360 Harare   Harare   Urban             2411        684     0.68
#> # ℹ 85 more rows
#> # ℹ 9 more variables: district_hh <int>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>
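The sample size and the smallest weight in the output above can be checked by hand. Bulawayo contains a single district, so its request for two districts is capped at one (hence the warning), and 19 districts with 5 EAs each give the 95 rows. That district is then a certainty selection; assuming the second stage draws EAs by simple random sampling within each district (an assumption about the default `draw(n = 5)` method), each sampled Bulawayo EA carries weight 453 / 5 = 90.6, matching the lower end of the range in the Weights line. A minimal base-R sketch of this arithmetic, where `two_stage_weight()` is a hypothetical helper:

```r
n_districts_requested <- 2   # draw(n = 2) at stage 1, per province
n_eas_per_district    <- 5   # draw(n = 5) at stage 2
n_provinces           <- 10

# Bulawayo has only one district, so its stage-1 draw is capped at 1;
# the other nine provinces each contribute the full 2 districts
districts_drawn <- n_districts_requested * (n_provinces - 1) + 1
districts_drawn                        # 19, matching "Actual total: 19"
districts_drawn * n_eas_per_district   # 95 rows in the tbl_sample

# Two-stage weight under SRS at stage 2: (1 / pi_1) * (N_d / n_2), where pi_1
# is the stage-1 inclusion probability of district d and N_d its EA count
two_stage_weight <- function(pi_1, N_d, n_2) (1 / pi_1) * (N_d / n_2)

# Bulawayo: certainty district (pi_1 = 1) containing all 453 Bulawayo EAs
two_stage_weight(pi_1 = 1, N_d = 453, n_2 = n_eas_per_district)   # 90.6
```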