A synthetic household-level dataset nested within niger_eas. Each row represents one household, which allows demonstrations of true two-stage cluster sampling: enumeration areas (EAs) are selected first, then households within the selected EAs.

niger_households

Format

A tibble with approximately 150,000 rows and 9 columns:

hh_id

Character. Unique household identifier

ea_id

Character. Parent enumeration area identifier (links to niger_eas)

region

Factor. Region name

department

Factor. Department name

strata

Factor. Urban/Rural stratification

hh_size

Numeric (whole numbers, stored as double). Number of persons in household

head_age

Numeric (whole numbers, stored as double). Age of household head in years (18 to 85)

head_sex

Factor. Sex of household head (Male/Female)

n_children_u5

Integer. Number of children under 5 years

Details

This dataset is designed for demonstrating:

  • True two-stage cluster sampling (EAs then households)

  • Joining cluster-level and unit-level data

  • Within-cluster subsampling

The number of households per EA matches the hh_count variable in niger_eas, ensuring consistency between the EA frame and household listing.

Note

This is a synthetic dataset. All values are fictional.

See also

niger_eas for the EA-level frame

Examples

# Explore the data
head(niger_households)
#> # A tibble: 6 × 9
#>   hh_id   ea_id region department strata hh_size head_age head_sex n_children_u5
#>   <chr>   <chr> <fct>  <fct>      <fct>    <dbl>    <dbl> <fct>            <int>
#> 1 Aga_01… Aga_… Agadez Agadez     Rural        1       72 Female               1
#> 2 Aga_01… Aga_… Agadez Agadez     Rural        3       53 Male                 2
#> 3 Aga_01… Aga_… Agadez Agadez     Rural        9       39 Male                 1
#> 4 Aga_01… Aga_… Agadez Agadez     Rural        8       43 Male                 0
#> 5 Aga_01… Aga_… Agadez Agadez     Rural       11       28 Male                 0
#> 6 Aga_01… Aga_… Agadez Agadez     Rural        8       43 Male                 0
length(unique(niger_households$ea_id))
#> [1] 1536
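
# Check the consistency described in Details: the number of household rows
# per EA should equal hh_count in niger_eas. This is a minimal base R sketch;
# it assumes niger_eas has one row per ea_id and an hh_count column, as the
# rest of this page implies. hh_per_ea and idx are illustrative names.
hh_per_ea <- table(niger_households$ea_id)
idx <- match(niger_eas$ea_id, names(hh_per_ea))
# TRUE only if every EA in the frame has a matching household count
isTRUE(all(as.integer(hh_per_ea[idx]) == niger_eas$hh_count))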

# True two-stage sample: select EAs, then households within selected EAs
sampling_design() |>
  stage(label = "EAs") |>
    stratify_by(strata) |>
    cluster_by(ea_id) |>
    draw(n = 5) |>
  stage(label = "Households") |>
    draw(n = 10) |>
  execute(niger_households, seed = 42)
#> == tbl_sample ==
#> Weights: 522.98 - 2578.7 (mean: 1277.4 )
#> 
#> # A tibble: 100 × 16
#>    hh_id  ea_id region department strata hh_size head_age head_sex n_children_u5
#>  * <chr>  <chr> <fct>  <fct>      <fct>    <dbl>    <dbl> <fct>            <int>
#>  1 Aga_0… Aga_… Agadez Tchirozér… Rural        6       48 Male                 1
#>  2 Aga_0… Aga_… Agadez Tchirozér… Rural        4       31 Female               1
#>  3 Aga_0… Aga_… Agadez Tchirozér… Rural        4       22 Female               0
#>  4 Aga_0… Aga_… Agadez Tchirozér… Rural        3       43 Male                 0
#>  5 Aga_0… Aga_… Agadez Tchirozér… Rural        8       38 Male                 0
#>  6 Aga_0… Aga_… Agadez Tchirozér… Rural        7       18 Female               1
#>  7 Aga_0… Aga_… Agadez Tchirozér… Rural        4       51 Male                 0
#>  8 Aga_0… Aga_… Agadez Tchirozér… Rural        3       21 Male                 0
#>  9 Aga_0… Aga_… Agadez Tchirozér… Rural        6       44 Male                 0
#> 10 Aga_0… Aga_… Agadez Tchirozér… Rural        6       59 Male                 0
#> # ℹ 90 more rows
#> # ℹ 7 more variables: .weight <dbl>, .sample_id <int>, .stage <int>,
#> #   .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>
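
# Joining cluster-level and unit-level data: attach an EA-level variable to
# each household row. A base R sketch, assuming niger_eas has one row per
# ea_id and an hh_count column; hh_with_ea is an illustrative name.
hh_with_ea <- merge(
  niger_households,
  niger_eas[, c("ea_id", "hh_count")],
  by = "ea_id"
)
# Each household row now carries the size of its parent EA
head(hh_with_ea[, c("hh_id", "ea_id", "hh_count")])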

# For PPS selection of EAs, use the EA-level frame, which has a
# cluster-level variable (hh_count) suitable as a measure of size.
# Stage 1 runs on niger_eas; the result is then executed on niger_households:
# selected_eas <- sampling_design() |>
#   stage(label = "EAs") |>
#     stratify_by(strata) |>
#     cluster_by(ea_id) |>
#     draw(n = 5, method = "pps_brewer", mos = hh_count) |>
#   stage(label = "Households") |>
#     draw(n = 10) |>
#   execute(niger_eas, stages = 1, seed = 42)
# selected_eas |> execute(niger_households, seed = 43)