
A synthetic business establishment frame covering 47 counties, 6 regions, 7 sectors, and 3 size classes. Its population structure is calibrated to the universe table of the Republic of Kenya 2025 World Bank Enterprise Survey (17,004 establishments registered with the Kenya Revenue Authority, KRA), with county-level disaggregation from the Kenya National Bureau of Statistics (KNBS) 2017 Census of Establishments.

Usage

ken_enterprises

Format

A tibble with 17,004 rows and 9 columns:

enterprise_id

Character. Unique establishment identifier

county

Factor. County name (47 counties)

region

Factor. Region (6 regions: Central, Coast, East and Northeastern, Nairobi, Nyanza and Western, Rift Valley)

sector

Factor. Business sector (7 sectors: Food, Chemicals & Chemical Products, Other Manufacturing, Construction, Retail, Hotels and Restaurants, Other Services)

size_class

Factor. Size class by number of employees (Small: 5-19, Medium: 20-99, Large: 100+)

employees

Integer. Number of employees (can serve as the size measure for PPS sampling)

revenue_millions

Numeric. Annual revenue in millions of Kenyan shillings (KES)

year_established

Integer. Year the enterprise was established

exporter

Logical. Whether the enterprise exports
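The size-class boundaries listed for size_class can be expressed with base R's `cut()`. This is a minimal sketch that assumes the documented ranges (5-19, 20-99, 100+ employees) are the exact classification rule; `size_class_from_employees()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: map employee counts to the documented size classes.
# Assumes the intervals [5, 20), [20, 100) and [100, Inf); counts below 5
# fall outside the documented range and return NA.
size_class_from_employees <- function(employees) {
  cut(
    employees,
    breaks = c(5, 20, 100, Inf),
    labels = c("Small", "Medium", "Large"),
    right = FALSE # left-closed intervals, so 20 is Medium and 100 is Large
  )
}

# Employee counts copied from rows printed in the Examples section
size_class_from_employees(c(13, 43, 179))
# Small, Medium, Large
```

These three counts match the size_class values printed for KEN_00003, KEN_00002 and KEN_00001 in the examples.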

Details

This dataset is designed to demonstrate:

  • Enterprise/business surveys

  • Stratification by sector and size class

  • PPS (probability proportional to size) sampling using employees or revenue_millions as the size measure

  • Disproportionate sampling (oversampling large enterprises)

  • Sample coordination across survey waves using permanent random numbers (PRNs)

  • Panel partitioning with execute(..., panels = k)

  • Bernoulli and Poisson sampling
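For the PPS item, the following minimal base-R sketch computes inclusion probabilities proportional to a size measure such as employees, moves units whose probability would reach 1 into a take-all group, and draws a systematic PPS sample. This is a generic textbook procedure assumed here for illustration; `pps_inclusion_probs()` and `systematic_pps()` are hypothetical helpers, and the package's own PPS methods may differ.

```r
# Hypothetical helper: inclusion probabilities pi_i = n * x_i / sum(x),
# with units whose pi_i would reach 1 taken with certainty and the remaining
# sample size spread over the other units in proportion to size.
pps_inclusion_probs <- function(size, n) {
  stopifnot(all(size > 0), n >= 1, n <= length(size))
  pik <- numeric(length(size))
  certain <- rep(FALSE, length(size))
  repeat {
    n_rest <- n - sum(certain)
    pik[!certain] <- n_rest * size[!certain] / sum(size[!certain])
    new_certain <- !certain & pik >= 1
    if (!any(new_certain)) break
    certain <- certain | new_certain
    pik[certain] <- 1
  }
  pik
}

# Hypothetical helper: systematic PPS with a random start u in (0, 1).
# Unit k is selected when one of the points u, u + 1, u + 2, ... falls in
# its slice (cum[k - 1], cum[k]] of the cumulated inclusion probabilities.
systematic_pps <- function(pik) {
  u <- runif(1)
  cum <- c(0, cumsum(pik))
  which(floor(cum[-1] - u) > floor(cum[-length(cum)] - u))
}

set.seed(42)
# Employee counts copied from rows printed in the Examples section
employees <- c(179, 43, 13, 9, 15, 8, 2000, 410, 37, 57)
pik <- pps_inclusion_probs(employees, n = 4)
sel <- systematic_pps(pik)
weights <- 1 / pik[sel] # design weights of the selected units
```

With these counts the two largest units (2,000 and 410 employees) are taken with certainty, and the other two selections are spread over the remaining eight units, so every draw contains exactly four establishments.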
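Bernoulli and Poisson sampling both decide each unit's inclusion with an independent uniform draw: Bernoulli uses one common probability, Poisson a unit-specific one, and in both designs the realised sample size is random. A minimal base-R sketch of the textbook definitions follows; `bernoulli_sample()` and `poisson_sample()` are hypothetical helpers, not the package's API.

```r
# Hypothetical helper: Bernoulli sampling, each of N units included with probability p
bernoulli_sample <- function(N, p) {
  which(runif(N) < p)
}

# Hypothetical helper: Poisson sampling, unit i included with probability pik[i]
poisson_sample <- function(pik) {
  which(runif(length(pik)) < pik)
}

set.seed(2025)
# Bernoulli over a frame the size of ken_enterprises; expected size 17004 * 0.05 = 850.2
s_bern <- bernoulli_sample(17004, p = 0.05)
w_bern <- rep(1 / 0.05, length(s_bern)) # every selected unit has weight 20

# Poisson with probabilities proportional to employees, capped at 1,
# using employee counts copied from rows printed in the Examples section
employees <- c(179, 43, 13, 9, 15, 8, 2000, 410, 37, 57)
pik <- pmin(1, 3 * employees / sum(employees))
s_pois <- poisson_sample(pik)
w_pois <- 1 / pik[s_pois]
```

Capping with `pmin()` keeps the 2,000-employee unit at probability 1 but lowers the expected sample size `sum(pik)` to about 1.83 rather than 3; a full PPS design would redistribute that shortfall over the remaining units.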
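Sample coordination with permanent random numbers (PRNs) generally works by assigning each establishment a uniform random number once and reusing it in every wave: keeping the same selection interval maximises overlap between waves, while shifting the interval's start rotates units out. The sketch below applies this idea to Bernoulli sampling in base R. It is a generic illustration on a simulated frame of 1,000 units; `prn_select()` is a hypothetical helper, not the package's PRN mechanism.

```r
set.seed(2025)
N <- 1000
prn <- runif(N) # permanent random number, assigned once per establishment

# Hypothetical helper: select units whose PRN lies in [start, start + p),
# wrapping around 1 so that any start in [0, 1) gives inclusion probability p
prn_select <- function(prn, p, start = 0) {
  which(((prn - start) %% 1) < p)
}

wave1 <- prn_select(prn, p = 0.10)                       # PRNs in [0, 0.10)
wave2_overlap <- prn_select(prn, p = 0.12)               # same start: positive coordination
wave2_rotated <- prn_select(prn, p = 0.10, start = 0.10) # shifted start: PRNs in [0.10, 0.20)

all(wave1 %in% wave2_overlap)           # TRUE: wave 1 is nested in the larger wave 2
length(intersect(wave1, wave2_rotated)) # 0: the shifted interval selects new units
```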

Examples

# Explore the data
head(ken_enterprises)
#> # A tibble: 6 × 9
#>   enterprise_id county    region  sector   size_class employees revenue_millions
#>   <chr>         <fct>     <fct>   <fct>    <fct>          <int>            <dbl>
#> 1 KEN_00001     Kiambu    Central Chemica… Large            179            266. 
#> 2 KEN_00002     Kiambu    Central Chemica… Medium            43            108  
#> 3 KEN_00003     Kiambu    Central Chemica… Small             13             25.6
#> 4 KEN_00004     Kiambu    Central Chemica… Small              9             32.9
#> 5 KEN_00005     Kiambu    Central Chemica… Small             15             45.7
#> 6 KEN_00006     Kirinyaga Central Chemica… Small              8             15  
#> # ℹ 2 more variables: year_established <int>, exporter <lgl>
table(ken_enterprises$size_class)
#> 
#>  Small Medium  Large 
#>  11100   4610   1294 
table(ken_enterprises$sector)
#> 
#>                          Food Chemicals & Chemical Products 
#>                           856                           282 
#>           Other Manufacturing                  Construction 
#>                          2113                          2343 
#>                        Retail        Hotels and Restaurants 
#>                          2158                          3065 
#>                Other Services 
#>                          6187 

# Stratified sample by sector and size class
sampling_design() |>
  stratify_by(sector, size_class) |>
  draw(n = 3) |>
  execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 63 × 14
#> # Weights:      269.9 [14.33, 1473.33]
#>    enterprise_id county  region     sector size_class employees revenue_millions
#>  * <chr>         <fct>   <fct>      <fct>  <fct>          <int>            <dbl>
#>  1 KEN_04216     Nairobi Nairobi    Chemi… Large           2000           1334. 
#>  2 KEN_00001     Kiambu  Central    Chemi… Large            179            266. 
#>  3 KEN_04204     Nairobi Nairobi    Chemi… Large            410            611  
#>  4 KEN_04277     Nairobi Nairobi    Chemi… Medium            37             94  
#>  5 KEN_04303     Nairobi Nairobi    Chemi… Medium            37             82.6
#>  6 KEN_04221     Nairobi Nairobi    Chemi… Medium            57             83.5
#>  7 KEN_04411     Nairobi Nairobi    Chemi… Small             14             24.7
#>  8 KEN_04338     Nairobi Nairobi    Chemi… Small              6              5.5
#>  9 KEN_15177     Baringo Rift Vall… Chemi… Small             11             61.4
#> 10 KEN_04449     Nairobi Nairobi    Const… Large           1281           2142. 
#> # ℹ 53 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Disproportionate sampling: oversample large enterprises
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Small = 0.02, Medium = 0.10, Large = 0.50)) |>
  execute(ken_enterprises, seed = 1960)
#> # A tbl_sample: 1330 × 14
#> # Weights:      12.78 [2, 50]
#>    enterprise_id county    region  sector  size_class employees revenue_millions
#>  * <chr>         <fct>     <fct>   <fct>   <fct>          <int>            <dbl>
#>  1 KEN_07903     Nairobi   Nairobi Other … Large            401            773. 
#>  2 KEN_09225     Nairobi   Nairobi Other … Large            606            610  
#>  3 KEN_00608     Kiambu    Central Other … Large            323            297. 
#>  4 KEN_02072     Mombasa   Coast   Other … Large            101            214. 
#>  5 KEN_04451     Nairobi   Nairobi Constr… Large           1931           4759. 
#>  6 KEN_05997     Nairobi   Nairobi Food    Large           1408           1572. 
#>  7 KEN_02575     Mombasa   Coast   Other … Large            257            173. 
#>  8 KEN_01568     Kwale     Coast   Hotels… Large            136             63.5
#>  9 KEN_00311     Nyandarua Central Food    Large            984           5880. 
#> 10 KEN_09182     Nairobi   Nairobi Other … Large           2000           2602. 
#> # ℹ 1,320 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
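The weight summaries printed in the two sampling examples (which appear to show the mean weight followed by the [min, max] range) can be checked by hand from the frame counts returned by `table(ken_enterprises$size_class)`. In a stratified design each unit in stratum h gets weight N_h / n_h, so the weights sum to N and their mean is N / n. This sketch assumes n_h is obtained by rounding `frac * N_h`; with these fractions the products are whole numbers anyway.

```r
# Frame counts by size class, as printed by table(ken_enterprises$size_class)
N_h <- c(Small = 11100, Medium = 4610, Large = 1294)
N <- sum(N_h) # 17004

# Disproportionate design: n_h = frac_h * N_h, so each weight N_h / n_h = 1 / frac_h
frac <- c(Small = 0.02, Medium = 0.10, Large = 0.50)
n_h <- round(frac * N_h) # 222, 461, 647
w_h <- N_h / n_h         # 50, 10, 2: the printed range [2, 50]
sum(n_h)                 # 1330 rows
N / sum(n_h)             # mean weight, about 12.78

# Stratified draw(n = 3): 7 sectors x 3 size classes = 21 strata, 63 units
N / (21 * 3)             # mean weight, about 269.9
```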