A synthetic business establishment frame inspired by the World Bank Enterprise Surveys. It uses real Nigerian states and geopolitical zones, but all data values are fictional.
nigeria_business
A tibble with roughly 11,600 rows (11,617 in the example output below) and 7 columns:
enterprise_id: Character. Unique business identifier
zone: Factor. Geopolitical zone (North Central, North East, North West, South East, South South, South West)
state: Factor. State name (36 states + FCT)
sector: Factor. Business sector (Manufacturing, Retail Trade, Wholesale Trade, Services, Construction, Transport, Hospitality)
size_class: Factor. Size classification by number of employees (Micro: 1-4, Small: 5-19, Medium: 20-99, Large: 100+)
employees: Numeric, whole-valued. Number of employees (measure of size)
annual_turnover: Numeric. Annual turnover in Naira
This dataset is designed for demonstrating:
Business/enterprise surveys
Stratification by sector and size class
PPS sampling using employment
Geographic stratification by zone/state
The distribution reflects typical business demographics: most enterprises are micro or small, and they are concentrated in the South West zone (especially Lagos).
This is a synthetic dataset. States and zones are real but all data values are fictional.
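The PPS use case listed above can be sketched in base R without any sampling package. This is only an illustration under stated assumptions: the toy frame reuses the six employee counts printed by head(nigeria_business), the draws are made with replacement, and the weights follow the Hansen-Hurwitz form 1 / (n * p_i). It is not necessarily how a sampling package would implement PPS.

```r
# Toy frame: the six Benue records printed by head(nigeria_business)
frame <- data.frame(
  enterprise_id = sprintf("NG_01_%05d", 1:6),
  employees     = c(2, 1, 13, 3, 2, 1)
)

# Per-draw selection probability, proportional to employment
frame$p <- frame$employees / sum(frame$employees)

# PPS with replacement: two independent draws (assumed sample size)
set.seed(42)
n_draws <- 2
picked <- sample(nrow(frame), size = n_draws, replace = TRUE, prob = frame$p)

# Hansen-Hurwitz weight for each draw: 1 / (n * p_i)
sampled <- frame[picked, ]
sampled$weight <- 1 / (n_draws * sampled$p)
sampled

# Whatever units are drawn, the weighted employment total
# reproduces the frame total
sum(sampled$weight * sampled$employees)
sum(frame$employees)
```

Because the weights are inversely proportional to the size measure itself, the weighted total of employees always equals the frame total here; for other variables, such as annual_turnover, the estimate would vary from sample to sample.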
# Explore the data
head(nigeria_business)
#> # A tibble: 6 × 7
#> enterprise_id zone state sector size_class employees annual_turnover
#> <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 NG_01_00001 North Central Benue Transp… Micro 2 5148000
#> 2 NG_01_00002 North Central Benue Wholes… Micro 1 3387000
#> 3 NG_01_00003 North Central Benue Manufa… Small 13 44715000
#> 4 NG_01_00004 North Central Benue Manufa… Micro 3 7248000
#> 5 NG_01_00005 North Central Benue Retail… Micro 2 11436000
#> 6 NG_01_00006 North Central Benue Retail… Micro 1 3361000
table(nigeria_business$size_class)
#>
#> Micro Small Medium Large
#> 6445 3231 1366 575
table(nigeria_business$sector)
#>
#> Construction Hospitality Manufacturing Retail Trade Services
#> 959 1199 1309 3483 2536
#> Transport Wholesale Trade
#> 937 1194
# Stratified sample by sector and size class
sampling_design() |>
  stratify_by(sector, size_class) |>
  draw(n = 3) |>
  execute(nigeria_business, seed = 42)
#> == tbl_sample ==
#> Weights: 15.67 - 655.33 (mean: 138.3 )
#>
#> # A tibble: 84 × 12
#> sector size_class enterprise_id zone state employees annual_turnover .weight
#> * <fct> <fct> <chr> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 Const… Micro NG_32_00738 Sout… Ekiti 2 2535000 180
#> 2 Const… Micro NG_22_00327 Sout… Anam… 2 5446000 180
#> 3 Const… Micro NG_13_00086 Nort… Yobe 3 4232000 180
#> 4 Const… Small NG_35_00504 Sout… Ondo 11 18611000 87
#> 5 Const… Small NG_32_00432 Sout… Ekiti 15 24385000 87
#> 6 Const… Small NG_30_00088 Sout… Edo 12 20757000 87
#> 7 Const… Medium NG_26_00057 Sout… Akwa… 96 172466000 37
#> 8 Const… Medium NG_24_00215 Sout… Enugu 47 126948000 37
#> 9 Const… Medium NG_16_00154 Nort… Kano 94 125757000 37
#> 10 Const… Large NG_12_00049 Nort… Tara… 1901 3510676000 15.7
#> # ℹ 74 more rows
#> # ℹ 4 more variables: .sample_id <int>, .stage <int>, .weight_1 <dbl>,
#> # .fpc_1 <int>
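The weights printed above are consistent with equal-probability sampling within each stratum, where every sampled unit carries weight N_h / n_h. The base R sketch below checks this for the Construction strata. The stratum sizes are an assumption: they are back-calculated from the printed weights (for example 180 * 3 = 540), not read from the data.

```r
# Units drawn per stratum, as requested by draw(n = 3)
n_h <- 3

# Assumed Construction stratum sizes, back-calculated from the printed weights
N_h <- c(Micro = 540, Small = 261, Medium = 111, Large = 47)

# Design weight under equal-probability sampling within a stratum
w_h <- N_h / n_h
round(w_h, 2)

# Consistency checks against the printed output
sum(N_h)     # compare with the Construction count from table(): 959
7 * 4 * n_h  # sectors x size classes x n, compare with the 84 sampled rows
```

The Large weight, 47 / 3, also agrees with the minimum of the printed weight range (15.67), which suggests that Construction/Large is the smallest of the 28 strata.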
# Disproportionate sampling: oversample large enterprises
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Micro = 0.005, Small = 0.02, Medium = 0.10, Large = 0.50)) |>
  execute(nigeria_business, seed = 42)
#> == tbl_sample ==
#> Weights: 2 - 195.3 (mean: 22.21 )
#>
#> # A tibble: 523 × 12
#> size_class enterprise_id zone state sector employees annual_turnover .weight
#> * <fct> <chr> <fct> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 Micro NG_27_00194 Sout… Baye… Manuf… 3 7576000 195.
#> 2 Micro NG_33_00528 Sout… Lagos Hospi… 2 4142000 195.
#> 3 Micro NG_26_00035 Sout… Akwa… Servi… 4 7047000 195.
#> 4 Micro NG_34_00596 Sout… Ogun Retai… 4 9600000 195.
#> 5 Micro NG_15_00004 Nort… Kadu… Retai… 3 7316000 195.
#> 6 Micro NG_17_00008 Nort… Kats… Trans… 2 5477000 195.
#> 7 Micro NG_06_00045 Nort… Plat… Hospi… 4 10406000 195.
#> 8 Micro NG_24_00141 Sout… Enugu Trans… 2 3140000 195.
#> 9 Micro NG_34_00549 Sout… Ogun Manuf… 3 5295000 195.
#> 10 Micro NG_35_00120 Sout… Ondo Hospi… 1 2816000 195.
#> # ℹ 513 more rows
#> # ℹ 4 more variables: .sample_id <int>, .stage <int>, .weight_1 <dbl>,
#> # .fpc_1 <int>
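The sample size and weight range in this output can be reproduced by hand from the size-class counts shown earlier. The sketch below uses base R only. The rounding rule (rounding fractional sample sizes up) is an inference from the printed 523 rows, not something the output itself confirms.

```r
# Stratum sizes copied from table(nigeria_business$size_class) above
N_h <- c(Micro = 6445, Small = 3231, Medium = 1366, Large = 575)

# Sampling fractions passed to draw(frac = ...)
frac <- c(Micro = 0.005, Small = 0.02, Medium = 0.10, Large = 0.50)

# Assumption: fractional sample sizes are rounded up
n_h <- ceiling(N_h * frac)
n_h
sum(n_h)  # compare with the 523 sampled rows

# Design weights: stratum size over stratum sample size
w_h <- N_h / n_h
round(w_h, 2)

# Mean weight over sampled units equals frame size / sample size
round(sum(N_h) / sum(n_h), 2)  # compare with the printed mean of 22.21
```

Oversampling large enterprises gives them small weights (about 2) and micro enterprises large ones (about 195), which is the usual trade-off in business surveys where a few large units account for much of the employment and turnover.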