A synthetic school survey frame inspired by education census and survey data. It uses real Tanzanian region and district names, but all data values are fictional.

tanzania_schools

Format

A tibble with approximately 2,500 rows and 9 columns:

school_id

Character. Unique school identifier

region

Factor. Region name (7 regions)

district

Factor. District name

school_level

Factor. Primary or Secondary

ownership

Factor. Government or Private

enrollment

Numeric (whole-number counts, stored as double). Total student enrollment (measure of size)

n_teachers

Numeric (whole-number counts, stored as double). Number of teachers

has_electricity

Logical. Whether school has electricity

has_water

Logical. Whether school has water supply

Details

This dataset is designed for demonstrating:

  • Education surveys

  • Two-stage sampling (schools then students)

  • PPS sampling using enrollment

  • Stratification by school level and ownership
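As a rough illustration of the PPS item above, the sketch below computes first-order inclusion probabilities proportional to enrollment in base R. It uses only the six enrollment values printed by head() in the Examples section, and the sample size n = 2 is a hypothetical choice made for illustration; it is not the package's own sampling routine.

```r
# Enrollment values copied from the first six rows shown in the Examples output.
enrollment <- c(619, 1908, 681, 525, 151, 361)

# Hypothetical first-stage sample size, chosen only for this sketch.
n <- 2

# Inclusion probabilities proportional to size: pi_i = n * x_i / sum(x).
# This form is only valid when no pi_i exceeds 1.
pik <- n * enrollment / sum(enrollment)
stopifnot(all(pik <= 1))

# The inclusion probabilities add up to the sample size.
sum(pik)

# Design weights are the inverse inclusion probabilities.
round(1 / pik, 2)
```

Larger schools get higher inclusion probabilities and therefore smaller design weights, which is the reason enrollment is described as the measure of size.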

The dataset reflects typical East African education system characteristics: primary schools outnumber secondary schools by roughly three to one, and infrastructure (electricity and water supply) varies by urban/rural location.

Note

This is a synthetic dataset. Regions and districts are real but all data values are fictional.

Examples

data(tanzania_schools)
head(tanzania_schools)
#> # A tibble: 6 × 9
#>   school_id       region district   school_level ownership enrollment n_teachers
#>   <chr>           <fct>  <fct>      <fct>        <fct>          <dbl>      <dbl>
#> 1 TZ_Aru_Aru_0001 Arusha Arusha Ci… Primary      Governme…        619         13
#> 2 TZ_Aru_Aru_0002 Arusha Arusha Ci… Primary      Governme…       1908         40
#> 3 TZ_Aru_Aru_0003 Arusha Arusha Ci… Secondary    Governme…        681         14
#> 4 TZ_Aru_Aru_0004 Arusha Arusha Ci… Primary      Governme…        525         14
#> 5 TZ_Aru_Aru_0005 Arusha Arusha Ci… Secondary    Private          151          3
#> 6 TZ_Aru_Aru_0006 Arusha Arusha Ci… Primary      Governme…        361          9
#> # ℹ 2 more variables: has_electricity <lgl>, has_water <lgl>
table(tanzania_schools$school_level, tanzania_schools$ownership)
#>            
#>             Government Private
#>   Primary         1500     377
#>   Secondary        509     127

# Two-stage cluster sample: schools drawn by PPS on enrollment, then students
if (FALSE) { # \dontrun{
sampling_design() |>
  stage(label = "Schools") |>
    stratify_by(school_level) |>
    cluster_by(school_id) |>
    draw(n = 30, method = "pps_brewer", mos = enrollment) |>
  stage(label = "Students") |>
    draw(n = 25) |>
  execute(tanzania_schools, seed = 42)
} # }
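
For stratification by school level and ownership, the cross-tabulation printed earlier in these examples gives the stratum sizes directly. The sketch below, in base R, turns those counts into a proportional allocation; the overall sample size of 100 schools is a hypothetical value, and the stratum labels are made up for readability.

```r
# Stratum sizes copied from the school_level x ownership table above.
N_h <- c(
  "Primary.Government"   = 1500,
  "Primary.Private"      = 377,
  "Secondary.Government" = 509,
  "Secondary.Private"    = 127
)

# Hypothetical overall sample size, chosen only for this sketch.
n_total <- 100

# Proportional allocation: n_h = n * N_h / N, rounded to whole schools.
n_h <- round(n_total * N_h / sum(N_h))
n_h

# Rounding can make the allocated total drift from n_total, so check it.
sum(n_h)
```

Proportional allocation leaves the smallest stratum (private secondary schools) with only a handful of schools, so a design that needs separate estimates for that group may prefer a fixed minimum per stratum instead.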