Zimbabwe Enumeration Areas for Demographic, Health, and Child-Indicator Surveys

A derived enumeration area (EA) frame for two-stage demographic, health, and child-indicator household surveys. Each row corresponds to one modeled WorldPop/GRID3 pre-enumeration area (preEA) polygon, and ea_id preserves the source identifier for spatial joins. Population and household counts are disaggregated from Zimbabwe's 2022 census ward totals. The frame covers 10 provinces and 91 districts.

Usage

zwe_eas

Format

A tibble with 107,250 rows and 12 columns:

ea_id: Integer. Unique preEA identifier from the WorldPop source
province: Factor. Province name (10 provinces)
district: Factor. District name within province (91 districts)
ward_pcode: Character. Ward P-code carried by the source preEA product (e.g. "ZW150104")
urban_rural: Factor. Modeled urban/rural classification based on building density and calibrated to GHS-WUP-DUC provincial shares
population: Integer. EA population, calibrated to 2022 Census ward totals
households: Integer. Number of households, calibrated to 2022 Census ward totals
buildings: Integer. Modeled building count from the WorldPop preEA product
women_15_49: Integer. Estimated women aged 15-49, from WorldPop age-sex grids scaled to census population
men_15_49: Integer. Estimated men aged 15-49, from WorldPop age-sex grids scaled to census population
children_under5: Integer. Estimated children under 5, from WorldPop age-sex grids scaled to census population
area_km2: Numeric. EA area in square kilometers

Source

Qader, Kuepie, and Tatem (2024), Automatic national census pre-Enumeration Areas for Zimbabwe in 2021, version 1.0, WorldPop, University of Southampton, doi:10.5258/SOTON/WP00797. Data licensed under CC BY 4.0.
European Commission Joint Research Centre, GHS-WUP-DUC R2025A, https://human-settlement.emergency.copernicus.eu/GHSWUPDownload.php?ds=WUPDUC.
Zimbabwe National Statistics Agency, 2022 Population Distribution by District and Ward, https://zimgeoportal.org.zw/datasets/2022-population-distribution-by-district-and-ward/.
WorldPop 2022 constrained 100 m age-sex grids, https://www.worldpop.org/.

Details

This dataset is designed for demonstrating:

Two-stage cluster sampling (EAs then households)
PPS sampling using household or population counts
Stratification by province and urban/rural
Partial execution (operational multi-stage sampling)
Creating household listings from selected EAs for second-stage sampling
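As a rough illustration of the PPS item above, the following base R sketch selects EAs systematically with probability proportional to household counts. The toy frame, the helper `pps_systematic_idx()`, and its `start` argument are hypothetical and are not part of this dataset or of any package; the `method = "pps_systematic"` option used in the Examples may be implemented differently.

```r
# Toy frame: five hypothetical EAs, with households as the measure of size.
frame <- data.frame(
  ea_id = 1:5,
  households = c(40, 10, 25, 15, 10)
)

# Hypothetical helper: returns the row indices chosen by systematic PPS.
# Assumes no single measure of size exceeds the sampling interval.
pps_systematic_idx <- function(mos, n, start = NULL) {
  step <- sum(mos) / n                            # sampling interval
  if (is.null(start)) start <- runif(1, 0, step)  # random start in [0, step)
  points <- start + step * (seq_len(n) - 1)       # n equally spaced selection points
  upper <- cumsum(mos)                            # cumulative upper bound of each unit
  # Unit i is selected when a point falls in (upper[i - 1], upper[i]]
  findInterval(points, upper, left.open = TRUE) + 1L
}

n <- 2
idx <- pps_systematic_idx(frame$households, n = n, start = 10)  # fixed start for reproducibility

# First-stage inclusion probabilities and the corresponding design weights
pi_i <- n * frame$households / sum(frame$households)
sel <- frame[idx, ]
sel$weight <- 1 / pi_i[idx]
sel
```

With n = 2 and 100 households in total, the sampling interval is 50, so an EA with 40 households has a first-stage inclusion probability of 2 × 40 / 100 = 0.8 and a design weight of 1.25. EAs larger than the interval would need to be treated as certainty selections, which this sketch does not handle.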

The data structure follows that of typical two-stage cluster survey frames, with EAs nested within districts and provinces. The preEAs are modeled, building-delimited areas rather than official census EAs, so the resulting counts should not be used as official small-area statistics. To create a household listing for second-stage sampling after selecting EAs, expand each selected EA into individual household rows:


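# After stage 1 selection: repeat each selected EA row once per household
listing <- selected[rep(seq_len(nrow(selected)), selected$households), ]
# Assign a running household identifier across the whole listing
listing$hh_id <- seq_len(nrow(listing))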
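The expansion can be tried without running a sampling design. The sketch below uses a small hypothetical stand-in for the stage 1 result (the EA identifiers and household counts are illustrative, not taken from zwe_eas) and adds a within-EA sequence number, `hh_seq`, which is an assumed convenience column rather than part of the dataset.

```r
# Hypothetical stage 1 result: two selected EAs (illustrative values only).
selected <- data.frame(
  ea_id = c(101L, 205L),
  households = c(3L, 2L)
)

# Repeat each EA row once per household; an EA with zero households
# would contribute no rows to the listing.
listing <- selected[rep(seq_len(nrow(selected)), selected$households), ]
rownames(listing) <- NULL

# Running household id across the listing, plus a sequence number within each EA.
listing$hh_id <- seq_len(nrow(listing))
listing$hh_seq <- ave(listing$hh_id, listing$ea_id, FUN = seq_along)

# Sanity check: the listing has exactly one row per household.
stopifnot(nrow(listing) == sum(selected$households))
listing
```

Checking that the number of listing rows equals the sum of the household counts is a cheap guard against accidental row loss before the second-stage draw.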

Examples

# Explore the data
head(zwe_eas)
#> # A tibble: 6 × 12
#>   ea_id province district ward_pcode urban_rural population households buildings
#>   <int> <fct>    <fct>    <chr>      <fct>            <int>      <int>     <int>
#> 1   213 Bulawayo Bulawayo ZW102103   Urban              123         34        72
#> 2   214 Bulawayo Bulawayo ZW102103   Urban              105         29        62
#> 3   215 Bulawayo Bulawayo ZW102103   Urban              256         70       151
#> 4   216 Bulawayo Bulawayo ZW102103   Rural              193         53       113
#> 5   217 Bulawayo Bulawayo ZW102103   Urban              186         51       109
#> 6   218 Bulawayo Bulawayo ZW102103   Urban              212         58       125
#> # ℹ 4 more variables: women_15_49 <int>, men_15_49 <int>,
#> #   children_under5 <int>, area_km2 <dbl>
table(zwe_eas$province)
#> 
#>            Bulawayo              Harare          Manicaland Mashonaland Central 
#>                2164                6066               15287                9140 
#>    Mashonaland East    Mashonaland West            Masvingo  Matabeleland North 
#>               13632               14109               15312                8712 
#>  Matabeleland South            Midlands 
#>                8003               14825 
table(zwe_eas$urban_rural)
#> 
#> Rural Urban 
#> 83091 24159 

# Two-stage cluster sample: EAs then households
design <- sampling_design() |>
  add_stage(label = "EAs") |>
    stratify_by(province, urban_rural) |>
    cluster_by(ea_id) |>
    draw(n = 3, method = "pps_systematic", mos = households) |>
  add_stage(label = "Households") |>
    draw(n = 20)

selected <- execute(design, zwe_eas, stages = 1, seed = 123)

# Household listing after fieldwork: one row per household in each selected EA
library(dplyr)
listing <- selected |>
  slice(rep(seq_len(n()), households)) |>
  mutate(hh_id = row_number())

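# Final sample. Because `listing` is a modified copy of the stage 1 result,
# this call emits the warning shown below.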
smpl <- execute(design, listing, seed = 1234)
#> Warning: The frame sample was modified after execution (rows changed).
#> ℹ Its weights and design metadata are used as-is for the new selection.
#> ℹ Passing a partial result as a frame starts a new sampling phase and restarts
#>   the design at stage 1; it does not continue with only the remaining stages.
#> ℹ For operational multistage sampling, continue from the unmodified partial
#>   sample and pass the listing as its frame: `partial_sample |>
#>   execute(listing_frame)`.
#> ℹ If rows were removed to define a subpopulation, prefer restricting the frame
#>   before executing.
smpl
#> # A tbl_sample: 1164 × 21
#> # Sampling:     2 stages | 1,164/8,498 units
#> # Weights:      3277.26 [69.43, 10807.85]
#>    ea_id province district ward_pcode urban_rural population households
#>  * <int> <fct>    <fct>    <chr>      <fct>            <int>      <int>
#>  1  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  2  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  3  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  4  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  5  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  6  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  7  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  8  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  9  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#> 10  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#> # ℹ 1,154 more rows
#> # ℹ 14 more variables: buildings <int>, women_15_49 <int>, men_15_49 <int>,
#> #   children_under5 <int>, area_km2 <dbl>, hh_id <int>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <dbl>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>