A synthetic frame of business establishments, inspired by the KNBS 2017 Census of Establishments and the World Bank Enterprise Survey (WBES) 2018 Kenya design. It covers 47 counties, 11 regions, 7 sectors, and 3 size classes.

Usage

ken_enterprises

Format

A tibble with 6,823 rows and 9 columns:

enterprise_id

Character. Unique establishment identifier

county

Factor. County name (47 counties)

region

Factor. Region (11 regions: 10 WBES regions + Rest of Kenya)

sector

Factor. Business sector (7 sectors)

size_class

Factor. Size classification (Small: 5-19, Medium: 20-99, Large: 100+)

employees

Integer. Number of employees (usable as a measure of size, e.g. for PPS sampling)

revenue_millions

Numeric. Annual revenue in millions of Kenyan shillings (KES)

year_established

Integer. Year the enterprise was established

exporter

Logical. Whether the enterprise exports

Details

This dataset is designed for demonstrating:

  • Enterprise/business surveys

  • Stratification by sector and size class

  • PPS sampling using employment or revenue

  • Disproportionate sampling (oversampling large enterprises)

  • PRN-based sample coordination across survey waves

  • Panel partitioning with execute(..., panels = k)

  • Bernoulli and Poisson sampling
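
To make the PRN and Poisson items above concrete, here is a minimal base R sketch of PRN-based Poisson sampling with employment as the measure of size. It does not use this package's interface. The toy frame, the helper `poisson_prn()`, and the shift of 0.5 are all hypothetical choices made for illustration.

```r
# Toy frame (hypothetical values, not taken from ken_enterprises)
set.seed(1)
frame <- data.frame(
  id = sprintf("E%02d", 1:20),
  employees = c(5, 8, 12, 15, 19, 22, 30, 41, 55, 60,
                75, 90, 110, 150, 200, 260, 340, 420, 510, 700)
)

# Permanent random number: drawn once, stored with the unit, reused every wave
frame$prn <- runif(nrow(frame))

# Inclusion probabilities proportional to employment.
# n_expected is chosen small enough that no probability exceeds 1.
n_expected <- 4
frame$pi <- n_expected * frame$employees / sum(frame$employees)

# Hypothetical helper: a unit is selected when its (shifted) PRN falls below pi
poisson_prn <- function(prn, pi, start = 0) {
  ((prn - start) %% 1) < pi
}

wave1     <- poisson_prn(frame$prn, frame$pi)               # first wave
wave2_pos <- poisson_prn(frame$prn, frame$pi)               # same start: same units
wave2_neg <- poisson_prn(frame$prn, frame$pi, start = 0.5)  # shifted start: less overlap

# Design weights for the selected units are 1 / pi
data.frame(id = frame$id[wave1], weight = 1 / frame$pi[wave1])
```

Reusing the same start reproduces the wave 1 sample exactly (positive coordination). With a shift of 0.5, no unit whose inclusion probability is at most 0.5 can appear in both waves, which spreads response burden across small enterprises (negative coordination).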

Examples

# Explore the data
head(ken_enterprises)
#> # A tibble: 6 × 9
#>   enterprise_id county region sector       size_class employees revenue_millions
#>   <chr>         <fct>  <fct>  <fct>        <fct>          <int>            <dbl>
#> 1 KEN_00001     Kiambu Kiambu Chemicals &… Small             13             25.1
#> 2 KEN_00002     Kiambu Kiambu Chemicals &… Medium            54             57.8
#> 3 KEN_00003     Kiambu Kiambu Chemicals &… Medium            37             51.7
#> 4 KEN_00004     Kiambu Kiambu Chemicals &… Medium            24             40.5
#> 5 KEN_00005     Kiambu Kiambu Chemicals &… Large            107             73.7
#> 6 KEN_00006     Kiambu Kiambu Chemicals &… Large            136            508. 
#> # ℹ 2 more variables: year_established <int>, exporter <lgl>
table(ken_enterprises$size_class)
#> 
#>  Small Medium  Large 
#>   3290   2687    846 
table(ken_enterprises$sector)
#> 
#>     Food & Beverages  Textiles & Garments Chemicals & Plastics 
#>                  464                  160                  210 
#>  Other Manufacturing               Retail              Tourism 
#>                  503                 1921                  664 
#>       Other Services 
#>                 2901 

# Stratified sample by sector and size class
sampling_design() |>
  stratify_by(sector, size_class) |>
  draw(n = 3) |>
  execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 63 × 14
#> # Weights:      108.3 [13.33, 513]
#>    enterprise_id county      region sector size_class employees revenue_millions
#>  * <chr>         <fct>       <fct>  <fct>  <fct>          <int>            <dbl>
#>  1 KEN_03741     Turkana     Rest … Chemi… Small             14             21.5
#>  2 KEN_03709     Migori      Rest … Chemi… Small             15             16.4
#>  3 KEN_00001     Kiambu      Kiambu Chemi… Small             13             25.1
#>  4 KEN_01042     Nairobi     Nairo… Chemi… Medium            24             43.1
#>  5 KEN_03719     Narok       Rest … Chemi… Medium            67             48.2
#>  6 KEN_01035     Nairobi     Nairo… Chemi… Medium            40            153. 
#>  7 KEN_03688     Kitui       Rest … Chemi… Large            375            266. 
#>  8 KEN_03683     Kisii       Rest … Chemi… Large            664           1025. 
#>  9 KEN_01074     Nairobi     Nairo… Chemi… Large            195            268. 
#> 10 KEN_06522     Trans Nzoia Trans… Food … Small             14             11.8
#> # ℹ 53 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
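
The header of the printed sample can be checked by hand. The base R arithmetic below is a sketch that assumes each design weight equals the stratum size divided by the stratum sample size (N_h / n_h); the frame size and stratum counts come from the Format section above.

```r
N <- 6823            # rows in the frame
n_strata <- 7 * 3    # 7 sectors x 3 size classes
n_per_stratum <- 3   # draw(n = 3) within each stratum

n <- n_strata * n_per_stratum
n                    # 63, the number of rows in the sample
round(N / n, 1)      # 108.3, the first number in the Weights line

# Under the N_h / n_h assumption, the printed weight range [13.33, 513]
# implies these smallest and largest stratum sizes
round(c(13.33, 513) * n_per_stratum)   # 40 1539
```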

# Disproportionate sampling: oversample large enterprises
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Small = 0.02, Medium = 0.10, Large = 0.50)) |>
  execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 758 × 14
#> # Weights:      9 [2, 49.85]
#>    enterprise_id county      region sector size_class employees revenue_millions
#>  * <chr>         <fct>       <fct>  <fct>  <fct>          <int>            <dbl>
#>  1 KEN_05526     Homa Bay    Rest … Retail Small              8             18.8
#>  2 KEN_05090     Nyamira     Rest … Other… Small              6              4.7
#>  3 KEN_02534     Nairobi     Nairo… Retail Small             14             38.4
#>  4 KEN_02455     Nairobi     Nairo… Retail Small             11              7.5
#>  5 KEN_02609     Nairobi     Nairo… Retail Small              5              8.4
#>  6 KEN_06669     Uasin Gishu Uasin… Other… Small             11             16.1
#>  7 KEN_01498     Nairobi     Nairo… Other… Small              8             13.4
#>  8 KEN_04590     Kisii       Rest … Other… Small             10             11.2
#>  9 KEN_02509     Nairobi     Nairo… Retail Small              8             19.3
#> 10 KEN_02684     Nairobi     Nairo… Retail Small             16             23.7
#> # ℹ 748 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
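
The sample size and weights printed for the disproportionate design can be reproduced from the size-class counts shown earlier. This is a base R sketch that assumes stratum sample sizes are the sampling fractions times the stratum counts, rounded to whole numbers, and that weights are N_h / n_h. The package's exact rounding rule is not stated on this page.

```r
N_h  <- c(Small = 3290, Medium = 2687, Large = 846)   # from table(size_class)
frac <- c(Small = 0.02, Medium = 0.10, Large = 0.50)  # sampling fractions used above

n_h <- round(N_h * frac)   # assumed rounding to whole units
n_h                        # Small 66, Medium 269, Large 423
sum(n_h)                   # 758, the number of rows in the sample

w_h <- N_h / n_h           # assumed design weight per stratum
round(w_h, 2)              # Small 49.85, Medium 9.99, Large 2

round(sum(N_h) / sum(n_h)) # 9, the first number in the Weights line
```

Large enterprises carry a weight of 2 because half of them are sampled, while each sampled small enterprise stands in for roughly 50 units in the frame.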