A synthetic school survey frame inspired by education census and survey data. It uses real Tanzanian regions and districts, but all data values are fictional.
tanzania_schools
A tibble with approximately 2,500 rows and 9 columns:
school_id: Character. Unique school identifier
region: Factor. Region name (7 regions)
district: Factor. District name
school_level: Factor. Primary or Secondary
ownership: Factor. Government or Private
enrollment: Numeric (integer-valued). Total student enrollment (measure of size)
n_teachers: Numeric (integer-valued). Number of teachers
has_electricity: Logical. Whether school has electricity
has_water: Logical. Whether school has water supply
This dataset is designed for demonstrating:
Education surveys
Two-stage sampling (schools then students)
PPS sampling using enrollment
Stratification by school level and ownership
The dataset reflects typical characteristics of East African education systems: primary schools outnumber secondary schools, and infrastructure (electricity and water supply) varies by urban/rural location.
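As a rough illustration of the PPS and stratification uses listed above, the sketch below computes first-order inclusion probabilities proportional to enrollment within each school level. It uses base R only and a small made-up frame that borrows the dataset's column names. The helper `pps_inclusion()` and the per-stratum sample size are hypothetical and not part of any package API, and the sketch does not reproduce the `pps_brewer` method named in the examples.

```r
# Toy frame with the same column names as tanzania_schools (values are made up).
frame <- data.frame(
  school_id    = sprintf("S%02d", 1:8),
  school_level = c("Primary", "Primary", "Primary", "Primary", "Primary",
                   "Secondary", "Secondary", "Secondary"),
  enrollment   = c(600, 1900, 500, 350, 650, 700, 150, 350)
)

# Hypothetical helper: first-order PPS inclusion probabilities for n units,
# incl_i = n * x_i / sum(x). Units whose probability would exceed 1 are taken
# with certainty and the rest are recomputed over the remaining units.
pps_inclusion <- function(x, n) {
  stopifnot(n <= length(x), all(x > 0))
  incl    <- rep(NA_real_, length(x))
  certain <- rep(FALSE, length(x))
  repeat {
    n_left <- n - sum(certain)
    incl[!certain] <- n_left * x[!certain] / sum(x[!certain])
    over <- !certain & incl > 1
    if (!any(over)) break
    certain[over] <- TRUE
  }
  incl[certain] <- 1
  incl
}

# Assumed design: 2 schools per school_level stratum.
n_per_stratum <- 2
frame$incl_prob <- ave(
  frame$enrollment, frame$school_level,
  FUN = function(x) pps_inclusion(x, n_per_stratum)
)

frame
```

Within each stratum the probabilities sum to the stratum sample size. In the toy Secondary stratum, the largest school would exceed probability 1 and is therefore treated as a certainty selection.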
This is a synthetic dataset. Regions and districts are real but all data values are fictional.
data(tanzania_schools)
head(tanzania_schools)
#> # A tibble: 6 × 9
#>   school_id       region district   school_level ownership enrollment n_teachers
#>   <chr>           <fct>  <fct>      <fct>        <fct>          <dbl>      <dbl>
#> 1 TZ_Aru_Aru_0001 Arusha Arusha Ci… Primary      Governme…        619         13
#> 2 TZ_Aru_Aru_0002 Arusha Arusha Ci… Primary      Governme…       1908         40
#> 3 TZ_Aru_Aru_0003 Arusha Arusha Ci… Secondary    Governme…        681         14
#> 4 TZ_Aru_Aru_0004 Arusha Arusha Ci… Primary      Governme…        525         14
#> 5 TZ_Aru_Aru_0005 Arusha Arusha Ci… Secondary    Private          151          3
#> 6 TZ_Aru_Aru_0006 Arusha Arusha Ci… Primary      Governme…        361          9
#> # ℹ 2 more variables: has_electricity <lgl>, has_water <lgl>
table(tanzania_schools$school_level, tanzania_schools$ownership)
#>
#>             Government Private
#>   Primary         1500     377
#>   Secondary        509     127
# Two-stage cluster sample
if (FALSE) { # \dontrun{
sampling_design() |>
  # Stage 1: schools, stratified by level, PPS with enrollment as size measure
  stage(label = "Schools") |>
    stratify_by(school_level) |>
    cluster_by(school_id) |>
    draw(n = 30, method = "pps_brewer", mos = enrollment) |>
  # Stage 2: students, n = 25
  stage(label = "Students") |>
    draw(n = 25) |>
  execute(tanzania_schools, seed = 42)
} # }
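For readers who want to see the two-stage logic without the pipeline functions, here is a minimal base R sketch on a made-up frame. It is an illustration under stated assumptions, not the package's implementation. Systematic PPS stands in for Brewer's method at stage 1, the helper `systematic_pps()` is hypothetical, and students are represented only by roster numbers from 1 to each school's enrollment.

```r
set.seed(42)

# Hypothetical toy frame reusing the dataset's column names (values are made up).
schools <- data.frame(
  school_id  = sprintf("S%02d", 1:10),
  enrollment = c(619, 1908, 681, 525, 151, 361, 840, 275, 1120, 430)
)

# Hypothetical helper: systematic PPS selection of n units.
# Assumes no unit's size exceeds sum(size) / n, so no unit is hit twice.
systematic_pps <- function(size, n) {
  step   <- sum(size) / n
  start  <- runif(1, 0, step)
  points <- start + step * (0:(n - 1))
  # Each selection point falls into one unit's cumulative-size interval.
  findInterval(points, c(0, cumsum(size)), left.open = TRUE)
}

n_schools  <- 3   # stage 1 sample size (assumed)
m_students <- 25  # stage 2 sample size per selected school (assumed)

# Stage 1: select schools with probability proportional to enrollment.
picked <- schools[systematic_pps(schools$enrollment, n_schools), ]

# Stage 2: simple random sample of student roster numbers within each school.
students <- do.call(rbind, lapply(seq_len(nrow(picked)), function(i) {
  data.frame(
    school_id  = picked$school_id[i],
    student_no = sort(sample.int(picked$enrollment[i], m_students))
  )
}))

# Design weights: inverse of the product of stage 1 and stage 2 probabilities.
picked$p1     <- n_schools * picked$enrollment / sum(schools$enrollment)
picked$p2     <- m_students / picked$enrollment
picked$weight <- 1 / (picked$p1 * picked$p2)

picked
head(students)
```

Because enrollment cancels between the two stages, every sampled student carries the same weight, `sum(enrollment) / (n_schools * m_students)`. This is the usual reason for pairing PPS selection of clusters with a fixed take per cluster.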