A synthetic business establishment frame inspired by the KNBS 2017 Census of Establishments and the World Bank Enterprise Survey (WBES) 2018 Kenya design. Covers 47 counties, 11 regions, 7 sectors, and 3 size classes.
Format
A tibble with 6,823 rows and 9 columns:
- enterprise_id
Character. Unique establishment identifier
- county
Factor. County name (47 counties)
- region
Factor. Region (11 regions: 10 WBES regions + Rest of Kenya)
- sector
Factor. Business sector (7 sectors)
- size_class
Factor. Size class by number of employees (Small: 5-19, Medium: 20-99, Large: 100+)
- employees
Integer. Number of employees (measure of size)
- revenue_millions
Numeric. Annual revenue in millions of Kenyan shillings (KES)
- year_established
Integer. Year the enterprise was established
- exporter
Logical. Whether the enterprise exports
Details
This dataset is designed for demonstrating:
- Enterprise/business surveys
- Stratification by sector and size class
- PPS sampling using employment or revenue
- Disproportionate sampling (oversampling large enterprises)
- PRN-based sample coordination across survey waves
- Panel partitioning with execute(..., panels = k)
- Bernoulli and Poisson sampling
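The idea behind PRN-based coordination can be illustrated with a minimal base R sketch. It does not use this package's functions and is not its implementation: the toy frame, the `prn` column, and the helper `prn_sample()` are hypothetical names introduced only for this illustration. Each unit receives a permanent random number (PRN) once, and a wave selects the units whose PRN falls in a window of width `frac`. Shifting the window start between waves controls how many units are retained. Sample sizes are random, as in Bernoulli sampling.

```r
# Toy frame standing in for ken_enterprises (hypothetical data)
set.seed(1)
frame <- data.frame(
  id  = sprintf("E%03d", 1:200),
  prn = runif(200)  # permanent random number, kept fixed across waves
)

# Hypothetical helper: select units whose PRN lies in
# [start, start + frac), wrapping around at 1
prn_sample <- function(frame, frac, start = 0) {
  shifted <- (frame$prn - start) %% 1
  frame[shifted < frac, , drop = FALSE]
}

wave1 <- prn_sample(frame, frac = 0.10, start = 0)
wave2 <- prn_sample(frame, frac = 0.10, start = 0.05)  # shift by half the window

# Units retained from wave 1 to wave 2: those with PRN in [0.05, 0.10)
overlap <- intersect(wave1$id, wave2$id)
length(overlap)
```

With a window of width 0.10 shifted by 0.05, about half of the first wave is expected to stay in the second wave. A shift of 0.10 or more would give disjoint samples.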
Examples
# Explore the data
head(ken_enterprises)
#> # A tibble: 6 × 9
#> enterprise_id county region sector size_class employees revenue_millions
#> <chr> <fct> <fct> <fct> <fct> <int> <dbl>
#> 1 KEN_00001 Kiambu Kiambu Chemicals &… Small 13 25.1
#> 2 KEN_00002 Kiambu Kiambu Chemicals &… Medium 54 57.8
#> 3 KEN_00003 Kiambu Kiambu Chemicals &… Medium 37 51.7
#> 4 KEN_00004 Kiambu Kiambu Chemicals &… Medium 24 40.5
#> 5 KEN_00005 Kiambu Kiambu Chemicals &… Large 107 73.7
#> 6 KEN_00006 Kiambu Kiambu Chemicals &… Large 136 508.
#> # ℹ 2 more variables: year_established <int>, exporter <lgl>
table(ken_enterprises$size_class)
#>
#> Small Medium Large
#> 3290 2687 846
table(ken_enterprises$sector)
#>
#> Food & Beverages Textiles & Garments Chemicals & Plastics
#> 464 160 210
#> Other Manufacturing Retail Tourism
#> 503 1921 664
#> Other Services
#> 2901
# Stratified sample by sector and size class
sampling_design() |>
  stratify_by(sector, size_class) |>
  draw(n = 3) |>
  execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 63 × 14
#> # Weights: 108.3 [13.33, 513]
#> enterprise_id county region sector size_class employees revenue_millions
#> * <chr> <fct> <fct> <fct> <fct> <int> <dbl>
#> 1 KEN_03741 Turkana Rest … Chemi… Small 14 21.5
#> 2 KEN_03709 Migori Rest … Chemi… Small 15 16.4
#> 3 KEN_00001 Kiambu Kiambu Chemi… Small 13 25.1
#> 4 KEN_01042 Nairobi Nairo… Chemi… Medium 24 43.1
#> 5 KEN_03719 Narok Rest … Chemi… Medium 67 48.2
#> 6 KEN_01035 Nairobi Nairo… Chemi… Medium 40 153.
#> 7 KEN_03688 Kitui Rest … Chemi… Large 375 266.
#> 8 KEN_03683 Kisii Rest … Chemi… Large 664 1025.
#> 9 KEN_01074 Nairobi Nairo… Chemi… Large 195 268.
#> 10 KEN_06522 Trans Nzoia Trans… Food … Small 14 11.8
#> # ℹ 53 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Disproportionate sampling: oversample large enterprises
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Small = 0.02, Medium = 0.10, Large = 0.50)) |>
  execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 758 × 14
#> # Weights: 9 [2, 49.85]
#> enterprise_id county region sector size_class employees revenue_millions
#> * <chr> <fct> <fct> <fct> <fct> <int> <dbl>
#> 1 KEN_05526 Homa Bay Rest … Retail Small 8 18.8
#> 2 KEN_05090 Nyamira Rest … Other… Small 6 4.7
#> 3 KEN_02534 Nairobi Nairo… Retail Small 14 38.4
#> 4 KEN_02455 Nairobi Nairo… Retail Small 11 7.5
#> 5 KEN_02609 Nairobi Nairo… Retail Small 5 8.4
#> 6 KEN_06669 Uasin Gishu Uasin… Other… Small 11 16.1
#> 7 KEN_01498 Nairobi Nairo… Other… Small 8 13.4
#> 8 KEN_04590 Kisii Rest … Other… Small 10 11.2
#> 9 KEN_02509 Nairobi Nairo… Retail Small 8 19.3
#> 10 KEN_02684 Nairobi Nairo… Retail Small 16 23.7
#> # ℹ 748 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
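The sample size and weight range printed for the disproportionate design can be checked by hand. The base R sketch below uses only the stratum counts from table(ken_enterprises$size_class) and the fractions passed to draw(). The rounding rule is an assumption: rounding up is used here, and ordinary rounding gives the same sizes for these numbers, so the sketch does not show which rule the package applies.

```r
# Stratum population sizes, copied from table(ken_enterprises$size_class)
N_h <- c(Small = 3290, Medium = 2687, Large = 846)

# Sampling fractions passed to draw(frac = ...)
f_h <- c(Small = 0.02, Medium = 0.10, Large = 0.50)

# Assumption: stratum sample sizes are frac * N_h rounded up to whole units
n_h <- ceiling(f_h * N_h)
n_h       # Small 66, Medium 269, Large 423
sum(n_h)  # 758

# Design weight within a stratum = population size / sample size
w_h <- N_h / n_h
round(w_h, 2)

# Average weight over the sampled units
sum(N_h) / sum(n_h)
```

The total of 758 matches the number of rows in the printed tbl_sample, and the smallest and largest stratum weights (2 for Large, about 49.85 for Small) agree with the range in its Weights line. The average weight of about 9 is consistent with the first figure on that line, assuming it reports the mean.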