A synthetic business establishment frame covering 47 counties, 6 regions, 7 sectors, and 3 size classes. Its population structure is calibrated to the Republic of Kenya 2025 World Bank Enterprise Survey universe table (17,004 KRA-registered establishments), with county-level disaggregation from the KNBS 2017 Census of Establishments.
Format
A tibble with 17,004 rows and 9 columns:
- enterprise_id
Character. Unique establishment identifier
- county
Factor. County name (47 counties)
- region
Factor. Region (6 regions: Central, Coast, East and Northeastern, Nairobi, Nyanza and Western, Rift Valley)
- sector
Factor. Business sector (7 sectors: Food, Chemicals & Chemical Products, Other Manufacturing, Construction, Retail, Hotels and Restaurants, Other Services)
- size_class
Factor. Size class by number of employees (Small: 5-19, Medium: 20-99, Large: 100+)
- employees
Integer. Number of employees; usable as the measure of size for PPS sampling
- revenue_millions
Numeric. Annual revenue in millions of Kenyan shillings (KES)
- year_established
Integer. Year the enterprise was established
- exporter
Logical. Whether the enterprise exports
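The size classes follow fixed employee thresholds. As an illustration only, the hypothetical helper `classify_size()` below reproduces those cut points with base R's `cut()`. It is not part of the package, and the dataset already ships with `size_class` precomputed; values below 5 employees fall outside the breaks and return NA.

```r
# Hypothetical helper (not part of the package): map employee counts to the
# documented size classes Small (5-19), Medium (20-99) and Large (100+).
classify_size <- function(employees) {
  cut(
    employees,
    breaks = c(5, 20, 100, Inf),   # intervals [5, 20), [20, 100), [100, Inf)
    right  = FALSE,                # closed on the left, open on the right
    labels = c("Small", "Medium", "Large")
  )
}

# Employee counts taken from the head() output in the Examples
classify_size(c(8, 43, 179, 2000))
```
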
Details
This dataset is designed to demonstrate:
- Enterprise and business surveys
- Stratification by sector and size class
- PPS sampling using employment or revenue as the measure of size
- Disproportionate sampling (oversampling large enterprises)
- Sample coordination across survey waves using permanent random numbers (PRNs)
- Panel partitioning with execute(..., panels = k)
- Bernoulli and Poisson sampling
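The PPS, Poisson, Bernoulli, and PRN items in this list fit together as follows. The sketch below is a base-R illustration under stated assumptions, not the package's implementation: `pps_inclusion_probs()` is a hypothetical helper, the frame is a toy one with lognormal employee counts, and units whose probability would reach 1 are taken with certainty (a common convention). Each unit keeps one uniform permanent random number u_i; Poisson sampling selects it when u_i < pi_i, and reusing the same u_i in the next wave makes the two samples overlap heavily.

```r
# Hypothetical helper (not the package's API): PPS inclusion probabilities
# pi_i = n * x_i / sum(x); units whose pi_i would reach 1 are taken with
# certainty and the remaining sample size is spread over the other units.
pps_inclusion_probs <- function(size, n) {
  stopifnot(all(size > 0), n > 0, n <= length(size))
  certain <- rep(FALSE, length(size))
  pik <- numeric(length(size))
  repeat {
    n_rest <- n - sum(certain)
    pik[certain] <- 1
    pik[!certain] <- n_rest * size[!certain] / sum(size[!certain])
    newly_certain <- !certain & pik >= 1
    if (!any(newly_certain)) break
    certain <- certain | newly_certain
  }
  pik
}

set.seed(42)
N <- 1000
employees <- round(rlnorm(N, meanlog = 3, sdlog = 1)) + 5  # toy sizes, all >= 5
prn <- runif(N)  # permanent random numbers: drawn once, reused in every wave

# Wave 1: Poisson PPS sampling with expected sample size n = 100
pik1  <- pps_inclusion_probs(employees, n = 100)
wave1 <- prn < pik1        # unit i is selected iff u_i < pi_i
w1    <- 1 / pik1[wave1]   # Horvitz-Thompson design weights

# Wave 2: sizes drift by up to +/-10%, but the same PRNs are reused
employees2 <- round(employees * runif(N, 0.9, 1.1))
pik2  <- pps_inclusion_probs(employees2, n = 100)
wave2 <- prn < pik2
overlap <- sum(wave1 & wave2)  # units selected in both waves

# Bernoulli sampling is the equal-probability special case of Poisson sampling
bernoulli <- prn < 0.10
```

Because wave 2 reuses the wave-1 PRNs, most wave-1 units stay in the sample; drawing fresh uniforms for wave 2 would instead give roughly the small overlap expected from independent samples.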
Examples
# Explore the data
head(ken_enterprises)
#> # A tibble: 6 × 9
#> enterprise_id county region sector size_class employees revenue_millions
#> <chr> <fct> <fct> <fct> <fct> <int> <dbl>
#> 1 KEN_00001 Kiambu Central Chemica… Large 179 266.
#> 2 KEN_00002 Kiambu Central Chemica… Medium 43 108
#> 3 KEN_00003 Kiambu Central Chemica… Small 13 25.6
#> 4 KEN_00004 Kiambu Central Chemica… Small 9 32.9
#> 5 KEN_00005 Kiambu Central Chemica… Small 15 45.7
#> 6 KEN_00006 Kirinyaga Central Chemica… Small 8 15
#> # ℹ 2 more variables: year_established <int>, exporter <lgl>
table(ken_enterprises$size_class)
#>
#> Small Medium Large
#> 11100 4610 1294
table(ken_enterprises$sector)
#>
#> Food Chemicals & Chemical Products
#> 856 282
#> Other Manufacturing Construction
#> 2113 2343
#> Retail Hotels and Restaurants
#> 2158 3065
#> Other Services
#> 6187
# Stratified sample by sector and size class
sampling_design() |>
  stratify_by(sector, size_class) |>
  draw(n = 3) |>
  execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 63 × 14
#> # Weights: 269.9 [14.33, 1473.33]
#> enterprise_id county region sector size_class employees revenue_millions
#> * <chr> <fct> <fct> <fct> <fct> <int> <dbl>
#> 1 KEN_04216 Nairobi Nairobi Chemi… Large 2000 1334.
#> 2 KEN_00001 Kiambu Central Chemi… Large 179 266.
#> 3 KEN_04204 Nairobi Nairobi Chemi… Large 410 611
#> 4 KEN_04277 Nairobi Nairobi Chemi… Medium 37 94
#> 5 KEN_04303 Nairobi Nairobi Chemi… Medium 37 82.6
#> 6 KEN_04221 Nairobi Nairobi Chemi… Medium 57 83.5
#> 7 KEN_04411 Nairobi Nairobi Chemi… Small 14 24.7
#> 8 KEN_04338 Nairobi Nairobi Chemi… Small 6 5.5
#> 9 KEN_15177 Baringo Rift Vall… Chemi… Small 11 61.4
#> 10 KEN_04449 Nairobi Nairobi Const… Large 1281 2142.
#> # ℹ 53 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
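Under stratified simple random sampling, each selected unit in stratum h gets the design weight N_h / n_h, so the weights sum to the frame size. The weight summary above is consistent with this: 17,004 / 63 ≈ 269.9. A minimal base-R sketch, assuming SRS without replacement and a fixed number of units per stratum (the helper `stratified_srs()` and the toy frame are hypothetical, not the package's code):

```r
# Hypothetical helper (not the package's API): stratified simple random
# sampling without replacement, drawing n_per_stratum units per stratum
# (or the whole stratum if it is smaller), with design weight N_h / n_h.
stratified_srs <- function(frame, strata, n_per_stratum, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  key <- interaction(frame[strata], drop = TRUE)   # one level per stratum
  idx <- split(seq_len(nrow(frame)), key)          # row numbers by stratum
  picked <- lapply(idx, function(i) {
    n_h <- min(n_per_stratum, length(i))
    i[sample.int(length(i), n_h)]                  # SRSWOR within the stratum
  })
  rows <- unlist(picked, use.names = FALSE)
  N_h <- rep(lengths(idx), lengths(picked))        # stratum frame size per unit
  n_h <- rep(lengths(picked), lengths(picked))     # stratum sample size per unit
  out <- frame[rows, , drop = FALSE]
  out$.weight <- N_h / n_h
  out
}

# Toy frame: 4 strata of sizes 20, 10, 9 and 3 (42 units in total)
toy <- data.frame(
  sector = rep(c("A", "B"), times = c(30, 12)),
  size   = rep(c("Small", "Large", "Small", "Large"), times = c(20, 10, 9, 3))
)
s <- stratified_srs(toy, c("sector", "size"), n_per_stratum = 3, seed = 42)
nrow(s)         # 12 units: 3 from each of the 4 strata
sum(s$.weight)  # 42, the frame size
```
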
# Disproportionate sampling: oversample large enterprises
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Small = 0.02, Medium = 0.10, Large = 0.50)) |>
  execute(ken_enterprises, seed = 1960)
#> # A tbl_sample: 1330 × 14
#> # Weights: 12.78 [2, 50]
#> enterprise_id county region sector size_class employees revenue_millions
#> * <chr> <fct> <fct> <fct> <fct> <int> <dbl>
#> 1 KEN_07903 Nairobi Nairobi Other … Large 401 773.
#> 2 KEN_09225 Nairobi Nairobi Other … Large 606 610
#> 3 KEN_00608 Kiambu Central Other … Large 323 297.
#> 4 KEN_02072 Mombasa Coast Other … Large 101 214.
#> 5 KEN_04451 Nairobi Nairobi Constr… Large 1931 4759.
#> 6 KEN_05997 Nairobi Nairobi Food Large 1408 1572.
#> 7 KEN_02575 Mombasa Coast Other … Large 257 173.
#> 8 KEN_01568 Kwale Coast Hotels… Large 136 63.5
#> 9 KEN_00311 Nyandarua Central Food Large 984 5880.
#> 10 KEN_09182 Nairobi Nairobi Other … Large 2000 2602.
#> # ℹ 1,320 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
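The sample size and weights in this example follow directly from the size-class counts printed by table(ken_enterprises$size_class): n_h = f_h × N_h and w_h = N_h / n_h. The arithmetic below reproduces the 1,330 sampled units and the weight range [2, 50]. Using round() to turn sampling fractions into counts is an assumption about how fractions are converted, although here every product is already a whole number.

```r
# Frame counts by size class, from table(ken_enterprises$size_class)
N_h  <- c(Small = 11100, Medium = 4610, Large = 1294)
# Sampling fractions used in the example above
frac <- c(Small = 0.02, Medium = 0.10, Large = 0.50)

n_h <- round(N_h * frac)  # stratum sample sizes: 222, 461, 647
w_h <- N_h / n_h          # design weights: 50, 10, 2

sum(n_h)                   # 1330 sampled units in total
sum(n_h * w_h) / sum(n_h)  # mean weight N / n, about 12.78
```
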