Synthetic Kenya Establishment Frame

A synthetic establishment frame covering 47 counties, 6 regions, 7 sectors, and 3 size classes. Region-by-sector-by-size counts reproduce the Republic of Kenya 2025 World Bank Enterprise Survey universe table (17,004 eligible KRA-registered establishments). County assignments use reproducible synthetic weights within each survey region. Every row, identifier, county allocation, and establishment attribute is synthetic.

Usage

ken_enterprises

Format

A tibble with 17,004 rows and 9 columns:

enterprise_id: Character. Synthetic unique establishment identifier
county: Factor. Synthetically allocated county name (47 counties)
region: Factor. Region (6 regions: Central, Coast, East and Northeastern, Nairobi, Nyanza and Western, Rift Valley)
sector: Factor. Business sector (7 sectors: Food, Chemicals & Chemical Products, Other Manufacturing, Construction, Retail, Hotels and Restaurants, Other Services)
size_class: Factor. Size class by number of employees (Small: 5-19, Medium: 20-99, Large: 100+)
employees: Integer. Simulated number of employees; also serves as a measure of size (for example, in PPS sampling)
revenue_millions: Numeric. Simulated annual revenue in millions of Kenyan shillings (KES)
year_established: Integer. Simulated year of establishment
exporter: Logical. Simulated export status

Source

World Bank Group, Republic of Kenya World Bank Enterprise Survey 2025, reference KEN_2025_WBES_v01_M, https://microdata.worldbank.org/catalog/8150.

Details

This dataset is designed for demonstrating:

Establishment surveys
Stratification by sector and size class
PPS sampling using employment or revenue
Disproportionate sampling (oversampling large enterprises)
PRN-based sample coordination across survey waves
Panel partitioning with execute(..., panels = k)
Bernoulli and Poisson sampling

These records do not represent actual Kenyan establishments and must not be used to produce substantive business statistics. The dataset is intended only for sampling, planning, and coordination examples. In particular, county totals are illustrative and do not reproduce official establishment counts.
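
As an illustration of the PRN-based coordination and Poisson sampling listed above, the following base R sketch assigns each unit a permanent random number (PRN) and selects the unit when its PRN falls below a size-proportional inclusion probability. It is a minimal sketch on a hypothetical 20-unit toy frame, not the package's implementation: the helper pps_prob(), the toy values, and the two expected sample sizes are assumptions introduced here for illustration.

```r
# Hypothetical toy frame; values are illustrative, not taken from ken_enterprises
toy <- data.frame(
  id = sprintf("U%02d", 1:20),
  employees = c(5, 8, 12, 19, 25, 40, 60, 99, 150, 400,
                6, 9, 14, 18, 30, 45, 70, 90, 120, 800)
)

# Permanent random numbers: drawn once, then reused in every wave
set.seed(2025)
toy$prn <- runif(nrow(toy))

# Hypothetical helper: inclusion probabilities proportional to size for an
# expected sample size n. Units whose probability reaches 1 are taken with
# certainty and the remaining units are rescaled.
pps_prob <- function(size, n) {
  certain <- rep(FALSE, length(size))
  repeat {
    p <- (n - sum(certain)) * size / sum(size[!certain])
    newly <- !certain & p >= 1
    if (!any(newly)) break
    certain <- certain | newly
  }
  ifelse(certain, 1, p)
}

pi_wave1 <- pps_prob(toy$employees, n = 6)
pi_wave2 <- pps_prob(toy$employees, n = 8)

# Poisson sampling: a unit is selected when its PRN is below its probability
wave1 <- toy$id[toy$prn < pi_wave1]
wave2 <- toy$id[toy$prn < pi_wave2]

# Positive coordination: reusing the PRNs keeps every wave 1 unit in wave 2,
# because no inclusion probability decreased between the two waves
all(wave1 %in% wave2)
```

The final expression returns TRUE in this sketch because the same PRNs are compared against probabilities that only grow from wave 1 to wave 2. Under Poisson sampling the realized sample size is random; only its expectation equals the requested n.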

Examples

# Explore the data
head(ken_enterprises)
#> # A tibble: 6 × 9
#>   enterprise_id county    region  sector   size_class employees revenue_millions
#>   <chr>         <fct>     <fct>   <fct>    <fct>          <int>            <dbl>
#> 1 KEN_00001     Kiambu    Central Chemica… Medium            43            108  
#> 2 KEN_00002     Kiambu    Central Chemica… Small             15             45.7
#> 3 KEN_00003     Kiambu    Central Chemica… Small              6              4.4
#> 4 KEN_00004     Kirinyaga Central Chemica… Small             17             40  
#> 5 KEN_00005     Murang'a  Central Chemica… Medium            44             67.3
#> 6 KEN_00006     Murang'a  Central Chemica… Small             13             25.6
#> # ℹ 2 more variables: year_established <int>, exporter <lgl>
table(ken_enterprises$size_class)
#> 
#>  Small Medium  Large 
#>  11100   4610   1294 
table(ken_enterprises$sector)
#> 
#>                          Food Chemicals & Chemical Products 
#>                           856                           282 
#>           Other Manufacturing                  Construction 
#>                          2113                          2343 
#>                        Retail        Hotels and Restaurants 
#>                          2158                          3065 
#>                Other Services 
#>                          6187 

# Stratified sample by sector and size class
sampling_design() |>
  stratify_by(sector, size_class) |>
  draw(n = 3) |>
  execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 63 × 14
#> # Sampling:     1 stage | 63/17,004 units
#> # Weights:      269.9 [14.33, 1473.33]
#>    enterprise_id county  region  sector    size_class employees revenue_millions
#>  * <chr>         <fct>   <fct>   <fct>     <fct>          <int>            <dbl>
#>  1 KEN_04252     Nairobi Nairobi Chemical… Medium            53             48.3
#>  2 KEN_04304     Nairobi Nairobi Chemical… Medium            56            222. 
#>  3 KEN_04268     Nairobi Nairobi Chemical… Medium            23            131. 
#>  4 KEN_04363     Nairobi Nairobi Chemical… Small              6             13.3
#>  5 KEN_04411     Nairobi Nairobi Chemical… Small             14             24.7
#>  6 KEN_04338     Nairobi Nairobi Chemical… Small              6              5.5
#>  7 KEN_04203     Nairobi Nairobi Chemical… Large            671           1728. 
#>  8 KEN_04186     Nairobi Nairobi Chemical… Large           1828           2338. 
#>  9 KEN_04215     Nairobi Nairobi Chemical… Large            285            632. 
#> 10 KEN_04439     Nairobi Nairobi Construc… Large            120             95.5
#> # ℹ 53 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <dbl>

# Disproportionate sampling: oversample large enterprises
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Small = 0.02, Medium = 0.10, Large = 0.50)) |>
  execute(ken_enterprises, seed = 1960)
#> # A tbl_sample: 1330 × 14
#> # Sampling:     1 stage | 1,330/17,004 units
#> # Weights:      12.78 [2, 50]
#>    enterprise_id county      region sector size_class employees revenue_millions
#>  * <chr>         <fct>       <fct>  <fct>  <fct>          <int>            <dbl>
#>  1 KEN_13275     Nairobi     Nairo… Retail Medium            63            234. 
#>  2 KEN_13322     Nairobi     Nairo… Retail Medium            49            106. 
#>  3 KEN_00210     Kirinyaga   Centr… Food   Medium            80            151. 
#>  4 KEN_14521     Siaya       Nyanz… Const… Medium            61            228. 
#>  5 KEN_00675     Murang'a    Centr… Other… Medium            44             20.9
#>  6 KEN_16526     Narok       Rift … Other… Medium            51             38.1
#>  7 KEN_14842     Vihiga      Nyanz… Other… Medium            31             40.2
#>  8 KEN_15934     Trans Nzoia Rift … Hotel… Medium            40             39.3
#>  9 KEN_14595     Bungoma     Nyanz… Hotel… Medium            38             17.3
#> 10 KEN_00140     Nyeri       Centr… Const… Medium            61            248. 
#> # ℹ 1,320 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <dbl>
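
The sample sizes and weight summaries printed above can be checked by hand from the stratum counts. The base R sketch below redoes that arithmetic using the size-class counts shown by table(ken_enterprises$size_class). It assumes that each sampling fraction is applied to its stratum count and that the design weight is the stratum count divided by the stratum sample size; this is consistent with the printed output but is not taken from the package's source.

```r
# Size-class counts as printed by table(ken_enterprises$size_class)
N_h  <- c(Small = 11100, Medium = 4610, Large = 1294)
frac <- c(Small = 0.02, Medium = 0.10, Large = 0.50)

# Disproportionate design: stratum sample sizes and design weights
n_h <- round(N_h * frac)   # 222, 461, 647
w_h <- N_h / n_h           # 50, 10, 2
sum(n_h)                   # 1330 sampled units
sum(N_h) / sum(n_h)        # mean weight, about 12.78

# Stratified design: 7 sectors x 3 size classes, 3 units per stratum
n_strat <- 7 * 3 * 3       # 63 sampled units
sum(N_h) / n_strat         # mean weight, about 269.9
```

In both designs the mean weight equals the frame size divided by the sample size, because the weights of the sampled units sum to the 17,004 establishments in the frame.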
