
An enumeration area (EA) frame for two-stage cluster surveys, built from WorldPop/GRID3 preEA boundaries (CC-BY 4.0), the GHS-DUC urban classification, WorldPop 2022 constrained 100 m age-sex grids, and Zimbabwe 2022 Population Census ward-level tallies. Each row corresponds to exactly one preEA polygon in the WorldPop shapefile, so the frame joins 1:1 to the polygons on ea_id. The frame covers 10 provinces and 91 districts.

Usage

zwe_eas

Format

A tibble with 107,250 rows and 12 columns:

ea_id

Integer. Unique enumeration area identifier (matches preEA_EAID in the WorldPop shapefile)

province

Factor. Province name (10 provinces)

district

Factor. District name within province (91 districts)

ward_pcode

Character. Ward P-code from OCHA COD-AB (e.g. "ZW150104")

urban_rural

Factor. Urban or Rural classification derived from building density and calibrated to GHS-DUC provincial urban shares

population

Integer. EA population, calibrated to 2022 Census ward totals

households

Integer. Number of households, calibrated to 2022 Census ward totals

buildings

Integer. Building count from GRID3 building footprints

women_15_49

Integer. Estimated women aged 15-49, from WorldPop age-sex grids scaled to census population

men_15_49

Integer. Estimated men aged 15-49, from WorldPop age-sex grids scaled to census population

children_under5

Integer. Estimated children under 5, from WorldPop age-sex grids scaled to census population

area_km2

Numeric. EA area in square kilometres

Details

This dataset is designed to demonstrate:

  • Two-stage cluster sampling (EAs then households)

  • PPS sampling using household or population counts

  • Stratification by province and urban/rural

  • Partial execution (operational multi-stage sampling: run stage 1, list households in the field, then run stage 2)

  • Creating household listings from selected EAs for second-stage sampling
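Systematic PPS selection, the method requested with method = "pps_systematic" in the Examples section, can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: pps_systematic_sketch() is a hypothetical helper, the toy frame reuses the IDs and household counts of the six EAs printed by head(zwe_eas), and EAs large enough to be certainty selections are simply rejected rather than handled. Stratification by province and urban/rural amounts to running the same selection separately within each stratum.

```r
# Hypothetical helper: systematic PPS selection of n units within ONE stratum.
# A sketch only; it is not the package's "pps_systematic" implementation.
pps_systematic_sketch <- function(mos, n) {
  total <- sum(mos)
  k <- total / n                            # sampling interval
  if (any(mos >= k)) {
    stop("certainty units present; this sketch does not handle them")
  }
  start <- runif(1, min = 0, max = k)       # random start in (0, k)
  points <- start + (seq_len(n) - 1) * k    # n equally spaced selection points
  # Unit i is hit when cum[i] < point <= cum[i + 1], with cum = c(0, cumsum(mos))
  findInterval(points, c(0, cumsum(mos)), left.open = TRUE)
}

# Toy stratum: first six EAs from head(zwe_eas), measure of size = households
toy_frame <- data.frame(
  ea_id = 213:218,
  households = c(34L, 29L, 70L, 53L, 51L, 58L)
)

set.seed(123)
idx <- pps_systematic_sketch(toy_frame$households, n = 2)
toy_selected <- toy_frame[idx, ]

# Stage-1 inclusion probability under PPS: n * M_i / sum(M)
toy_selected$pi_1 <- 2 * toy_selected$households / sum(toy_frame$households)
toy_selected
```

Because every EA's household count is smaller than the sampling interval, an EA can contain at most one selection point, so the n points always land in n distinct EAs and each EA's selection probability is proportional to its household count.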

The structure mirrors a typical two-stage cluster survey frame: EAs are nested within districts, which are nested within provinces. The frame stores household counts rather than household records, so second-stage sampling needs a household listing. After selecting EAs, expand each selected EA into one row per household:


# After stage 1 selection, repeat each selected EA row once per household
listing <- selected[rep(seq_len(nrow(selected)), selected$households), ]
# Give every listed household a unique ID
listing$hh_id <- seq_len(nrow(listing))
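A self-contained base R version of the same expansion, using the IDs and household counts of the first three EAs printed by head(zwe_eas) as toy input. The hh_in_ea column is a hypothetical addition, not part of the package: it numbers households within each EA with sequence(), which lines up with the rep() expansion because both walk the EAs in the same order.

```r
# Toy stage-1 result: three EAs from head(zwe_eas) (IDs and household counts only)
toy_selected <- data.frame(
  ea_id = c(213L, 214L, 215L),
  households = c(34L, 29L, 70L)
)

# One row per household: repeat each EA row `households` times
toy_listing <- toy_selected[rep(seq_len(nrow(toy_selected)), toy_selected$households), ]
rownames(toy_listing) <- NULL

# Frame-wide household ID, as in the snippet above
toy_listing$hh_id <- seq_len(nrow(toy_listing))
# Hypothetical within-EA household number: 1..34, then 1..29, then 1..70
toy_listing$hh_in_ea <- sequence(toy_selected$households)

nrow(toy_listing)  # 133 = 34 + 29 + 70
```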

Examples

# Explore the data
head(zwe_eas)
#> # A tibble: 6 × 12
#>   ea_id province district ward_pcode urban_rural population households buildings
#>   <int> <fct>    <fct>    <chr>      <fct>            <int>      <int>     <int>
#> 1   213 Bulawayo Bulawayo ZW102103   Urban              123         34        72
#> 2   214 Bulawayo Bulawayo ZW102103   Urban              105         29        62
#> 3   215 Bulawayo Bulawayo ZW102103   Urban              256         70       151
#> 4   216 Bulawayo Bulawayo ZW102103   Rural              193         53       113
#> 5   217 Bulawayo Bulawayo ZW102103   Urban              186         51       109
#> 6   218 Bulawayo Bulawayo ZW102103   Urban              212         58       125
#> # ℹ 4 more variables: women_15_49 <int>, men_15_49 <int>,
#> #   children_under5 <int>, area_km2 <dbl>
table(zwe_eas$province)
#> 
#>            Bulawayo              Harare          Manicaland Mashonaland Central 
#>                2164                6066               15287                9140 
#>    Mashonaland East    Mashonaland West            Masvingo  Matabeleland North 
#>               13632               14109               15312                8712 
#>  Matabeleland South            Midlands 
#>                8003               14825 
table(zwe_eas$urban_rural)
#> 
#> Rural Urban 
#> 83091 24159 

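# Two-stage cluster sample: EAs, then households
design <- sampling_design() |>
  # Stage 1: 3 EAs per province x urban/rural stratum, PPS on household counts
  add_stage(label = "EAs") |>
    stratify_by(province, urban_rural) |>
    cluster_by(ea_id) |>
    draw(n = 3, method = "pps_systematic", mos = households) |>
  # Stage 2: 20 households per selected EA
  add_stage(label = "Households") |>
    draw(n = 20)

# Run stage 1 only; stage 2 needs a household listing first
selected <- execute(design, zwe_eas, stages = 1, seed = 123)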

# Household listing: simulated here, produced by fieldwork in practice
library(dplyr)
listing <- selected |>
  slice(rep(seq_len(n()), households)) |>
  mutate(hh_id = row_number())

# Run the remaining stage (households) on the listing to get the final sample
smpl <- execute(design, listing, seed = 1234)
smpl
#> # A tbl_sample: 1164 × 21
#> # Weights:      3277.26 [69.43, 10807.85]
#>    ea_id province district ward_pcode urban_rural population households
#>  * <int> <fct>    <fct>    <chr>      <fct>            <int>      <int>
#>  1  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  2  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  3  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  4  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  5  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  6  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  7  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  8  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#>  9  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#> 10  1435 Bulawayo Bulawayo ZW102105   Urban              116         34
#> # ℹ 1,154 more rows
#> # ℹ 14 more variables: buildings <int>, women_15_49 <int>, men_15_49 <int>,
#> #   children_under5 <int>, area_km2 <dbl>, hh_id <int>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
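The printed sample carries stage-wise weight columns (.weight_1, .weight_2) alongside .weight. Assuming the standard two-stage result, in which a household's design weight is the inverse of the product of its stage 1 and stage 2 inclusion probabilities (an assumption here, not something this page confirms about the package's internals), PPS on households with a fixed take of m households per EA gives w = M_h / (n × m) for every household in a stratum, where M_h is the stratum's household total. Under that assumption, weights are constant within a stratum and differ mainly across strata. An EA with fewer than m households is taken in full, which raises its weight to M_h / (n × M_i). The stratum below is hypothetical.

```r
# Hypothetical stratum: household counts M_i for seven EAs (the last one is small)
M   <- c(34L, 29L, 70L, 53L, 51L, 58L, 15L)
M_h <- sum(M)   # stratum total measure of size
n   <- 3        # EAs drawn in this stratum (stage 1)
m   <- 20       # households drawn per selected EA (stage 2)

pi_1 <- n * M / M_h       # stage-1 PPS inclusion probabilities
pi_2 <- pmin(m / M, 1)    # stage-2 SRS probabilities; EAs with < m households taken in full
w    <- 1 / (pi_1 * pi_2) # overall household design weight, if each EA were selected

round(w, 3)
```

Every EA with at least 20 households gets the same weight, M_h / 60, while the 15-household EA gets M_h / 45.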