A synthetic household-level dataset nested within niger_eas. Each row represents one household, which enables demonstrations of true two-stage cluster sampling: enumeration areas (EAs) are selected first, then households within the selected EAs.
niger_households
A tibble with approximately 150,000 rows and 9 columns:
hh_id: Character. Unique household identifier
ea_id: Character. Parent enumeration area identifier (links to niger_eas)
region: Factor. Region name
department: Factor. Department name
strata: Factor. Urban/Rural stratification
hh_size: Integer. Number of persons in household
head_age: Integer. Age of household head (18-85)
head_sex: Factor. Sex of household head (Male/Female)
n_children_u5: Integer. Number of children under 5 years
This dataset is designed for demonstrating:
True two-stage cluster sampling (EAs then households)
Joining cluster-level and unit-level data
Within-cluster subsampling
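The two-stage logic listed above can be sketched by hand in base R. This is a minimal illustration on toy data, assuming equal-probability selection without replacement at both stages; it is not the package's implementation. The frames toy_eas and toy_households and the column names weight_1, weight_2 and weight are hypothetical stand-ins, and only ea_id, hh_id and hh_count come from the documented datasets.

```r
# Toy stand-ins for the EA frame and the household listing (hypothetical values).
set.seed(1)
toy_eas <- data.frame(
  ea_id    = sprintf("EA_%02d", 1:6),
  hh_count = c(12L, 8L, 15L, 10L, 9L, 11L)
)
toy_households <- data.frame(
  hh_id = sprintf("HH_%03d", seq_len(sum(toy_eas$hh_count))),
  ea_id = rep(toy_eas$ea_id, times = toy_eas$hh_count)
)

# Stage 1: simple random sample of n_ea EAs out of all EAs in the frame.
n_ea <- 3
selected_eas <- toy_eas[sample(nrow(toy_eas), n_ea), ]
selected_eas$weight_1 <- nrow(toy_eas) / n_ea

# Stage 2: simple random sample of m households within each selected EA.
m <- 4
stage2 <- lapply(seq_len(nrow(selected_eas)), function(i) {
  ea  <- selected_eas[i, ]
  hhs <- toy_households[toy_households$ea_id == ea$ea_id, ]
  out <- hhs[sample(nrow(hhs), m), ]
  out$weight_1 <- ea$weight_1          # inverse of the EA selection probability
  out$weight_2 <- ea$hh_count / m      # inverse of the within-EA selection probability
  out
})
two_stage <- do.call(rbind, stage2)

# Overall design weight is the product of the stage weights.
two_stage$weight <- two_stage$weight_1 * two_stage$weight_2
```

Under these assumptions, the weights within each selected EA sum to that EA's hh_count multiplied by the stage-1 weight, so summing all weights gives an estimate of the total number of households in the frame.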
The number of households per EA matches the hh_count variable in
niger_eas, ensuring consistency between the EA frame and household
listing.
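One way to verify this consistency, and to attach EA-level variables to household rows, is sketched below in base R. The toy frames and the n_listed column are hypothetical; with the package data loaded, the same steps should apply with niger_eas and niger_households in place of the toy frames, assuming both carry ea_id and the EA frame carries hh_count as described.

```r
# Toy stand-ins for the EA frame and household listing (hypothetical values).
toy_eas <- data.frame(
  ea_id    = c("EA_01", "EA_02", "EA_03"),
  hh_count = c(3L, 2L, 4L)
)
toy_households <- data.frame(
  hh_id = sprintf("HH_%02d", 1:9),
  ea_id = rep(toy_eas$ea_id, times = toy_eas$hh_count)
)

# Count listed households per EA.
listed <- as.data.frame(
  table(ea_id = toy_households$ea_id),
  responseName = "n_listed",
  stringsAsFactors = FALSE
)

# Compare the listing counts with hh_count from the EA frame.
# all = TRUE keeps EAs that appear in only one of the two tables.
check <- merge(toy_eas, listed, by = "ea_id", all = TRUE)
consistent <- !anyNA(check$hh_count) &&
  !anyNA(check$n_listed) &&
  all(check$hh_count == check$n_listed)

# Join cluster-level variables onto the unit-level rows.
hh_with_ea <- merge(toy_households, toy_eas, by = "ea_id")
```

The full outer merge is a deliberate choice here: an EA with no listed households, or a household pointing at an unknown EA, shows up as a missing value instead of silently dropping out of the comparison.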
This is a synthetic dataset. All values are fictional.
See also niger_eas for the EA-level frame.
# Explore the data
head(niger_households)
#> # A tibble: 6 × 9
#> hh_id ea_id region department strata hh_size head_age head_sex n_children_u5
#> <chr> <chr> <fct> <fct> <fct> <dbl> <dbl> <fct> <int>
#> 1 Aga_01… Aga_… Agadez Agadez Rural 1 72 Female 1
#> 2 Aga_01… Aga_… Agadez Agadez Rural 3 53 Male 2
#> 3 Aga_01… Aga_… Agadez Agadez Rural 9 39 Male 1
#> 4 Aga_01… Aga_… Agadez Agadez Rural 8 43 Male 0
#> 5 Aga_01… Aga_… Agadez Agadez Rural 11 28 Male 0
#> 6 Aga_01… Aga_… Agadez Agadez Rural 8 43 Male 0
length(unique(niger_households$ea_id))
#> [1] 1536
# True two-stage sample: select EAs, then households within selected EAs
sampling_design() |>
  stage(label = "EAs") |>
    stratify_by(strata) |>
    cluster_by(ea_id) |>
    draw(n = 5) |>
  stage(label = "Households") |>
    draw(n = 10) |>
  execute(niger_households, seed = 42)
#> == tbl_sample ==
#> Weights: 522.98 - 2578.7 (mean: 1277.4 )
#>
#> # A tibble: 100 × 16
#> hh_id ea_id region department strata hh_size head_age head_sex n_children_u5
#> * <chr> <chr> <fct> <fct> <fct> <dbl> <dbl> <fct> <int>
#> 1 Aga_0… Aga_… Agadez Tchirozér… Rural 6 48 Male 1
#> 2 Aga_0… Aga_… Agadez Tchirozér… Rural 4 31 Female 1
#> 3 Aga_0… Aga_… Agadez Tchirozér… Rural 4 22 Female 0
#> 4 Aga_0… Aga_… Agadez Tchirozér… Rural 3 43 Male 0
#> 5 Aga_0… Aga_… Agadez Tchirozér… Rural 8 38 Male 0
#> 6 Aga_0… Aga_… Agadez Tchirozér… Rural 7 18 Female 1
#> 7 Aga_0… Aga_… Agadez Tchirozér… Rural 4 51 Male 0
#> 8 Aga_0… Aga_… Agadez Tchirozér… Rural 3 21 Male 0
#> 9 Aga_0… Aga_… Agadez Tchirozér… Rural 6 44 Male 0
#> 10 Aga_0… Aga_… Agadez Tchirozér… Rural 6 59 Male 0
#> # ℹ 90 more rows
#> # ℹ 7 more variables: .weight <dbl>, .sample_id <int>, .stage <int>,
#> # .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>
# For PPS selection of EAs, use the EA-level frame, which has a
# cluster-level variable (hh_count) suitable as a measure of size:
# selected_eas <- sampling_design() |>
#   stage(label = "EAs") |>
#     stratify_by(strata) |>
#     cluster_by(ea_id) |>
#     draw(n = 5, method = "pps_brewer", mos = hh_count) |>
#   stage(label = "Households") |>
#     draw(n = 10) |>
#   execute(niger_eas, stages = 1, seed = 42)
# Then run the remaining stage against the household listing:
# selected_eas |> execute(niger_households, seed = 43)