cluster_by() specifies the sampling units (PSUs/clusters) for cluster
or multi-stage sampling designs. Unlike stratify_by(), which defines
subgroups to sample within, cluster_by() defines units to sample
as a whole.
cluster_by(.data, ...)
.data: A sampling_design object (piped from sampling_design(),
stratify_by(), or stage()).
...: <tidy-select> Clustering variable(s)
that identify the sampling units. In most cases this is a single variable
(e.g., school_id, household_id).
Returns a modified sampling_design object with clustering specified.
cluster_by() is purely structural: it defines what to sample, not how.
The selection method and sample size are specified in draw().
Stratification (stratify_by()): Sample within each group; every
group is represented in the sample.
Clustering (cluster_by()): Sample groups as units; only the selected
groups appear in the sample.
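The contrast can be sketched in base R on a toy frame. This is only an illustration of the two ideas, not how the package implements them; the data frame and every object name below are hypothetical.

```r
# Hypothetical toy frame: 4 schools with 3 students each
set.seed(1)
frame <- data.frame(
  school_id  = rep(c("A", "B", "C", "D"), each = 3),
  student_id = 1:12
)

# Stratification: draw 1 student *within* every school,
# so all 4 schools show up in the sample
rows_by_school <- split(seq_len(nrow(frame)), frame$school_id)
picked_rows <- unlist(lapply(
  rows_by_school,
  function(idx) idx[sample.int(length(idx), 1)]
))
stratified <- frame[picked_rows, ]

# Clustering: draw 2 schools *as units* and keep all their students,
# so only the 2 selected schools show up in the sample
picked_schools <- sample(unique(frame$school_id), 2)
clustered <- frame[frame$school_id %in% picked_schools, ]

length(unique(stratified$school_id))  # number of schools in the stratified sample
length(unique(clustered$school_id))   # number of schools in the cluster sample
```

Whatever the seed, the stratified sample contains one row from each of the four schools, while the cluster sample contains all six rows from exactly two schools.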
In multi-stage designs, each stage typically has its own clustering variable:
Stage 1: Select schools (cluster_by(school_id))
Stage 2: Select classrooms within schools (cluster_by(classroom_id))
Stage 3: Select students within classrooms (no clustering, sample individuals)
The nesting structure (classrooms within schools) is validated at execution time.
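A nesting check of this kind can be approximated in base R. The helper below, is_nested_in(), is a hypothetical name introduced for illustration and is not part of the package; it only shows what "classrooms within schools" means for a frame: each classroom identifier must map to exactly one school.

```r
# Hypothetical helper: TRUE if every child cluster belongs to one parent only
is_nested_in <- function(child, parent) {
  pairs <- unique(data.frame(child = child, parent = parent))
  # After de-duplicating (child, parent) pairs, a repeated child
  # means that child appears under more than one parent
  !any(duplicated(pairs$child))
}

# Hypothetical frame: one row per student
frame <- data.frame(
  school_id    = c("S1", "S1", "S1", "S2", "S2"),
  classroom_id = c("C1", "C1", "C2", "C3", "C3")
)
nested_ok <- is_nested_in(frame$classroom_id, frame$school_id)

# Reusing classroom "C1" under school "S2" breaks the nesting
bad <- rbind(frame, data.frame(school_id = "S2", classroom_id = "C1"))
nested_bad <- is_nested_in(bad$classroom_id, bad$school_id)
```

If classroom identifiers are only unique within a school (for example, every school has a classroom "1"), a check like this fails, and a combined identifier would be needed before clustering on it.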
In a single stage, the typical order is:
stratify_by() (optional) - define strata
cluster_by() (optional) - define sampling units
draw() (required) - specify selection parameters
Both stratify_by() and cluster_by() are optional, but draw() is required.
sampling_design() for creating designs,
stratify_by() for stratification,
draw() for specifying selection,
stage() for multi-stage designs
# Simple cluster sample: select 30 schools
sampling_design() |>
  cluster_by(school_id) |>
  draw(n = 30) |>
  execute(tanzania_schools, seed = 123)
#> == tbl_sample ==
#> Weights: 81.13 - 81.13 (mean: 81.13 )
#>
#> # A tibble: 30 × 14
#> school_id region district school_level ownership enrollment n_teachers
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 TZ_05_0094 Arusha Arusha … Primary Private 348 9
#> 2 TZ_06_0010 Arusha Meru Primary Governme… 358 8
#> 3 TZ_07_0068 Arusha Monduli Primary Governme… 477 13
#> 4 TZ_01_0017 Dar es Sala… Ilala Primary Private 213 5
#> 5 TZ_01_0170 Dar es Sala… Ilala Secondary Governme… 106 3
#> 6 TZ_02_0023 Dar es Sala… Kinondo… Primary Governme… 259 6
#> 7 TZ_03_0003 Dar es Sala… Temeke Primary Governme… 520 13
#> 8 TZ_08_0018 Dodoma Dodoma … Secondary Governme… 141 4
#> 9 TZ_08_0066 Dodoma Dodoma … Primary Governme… 276 6
#> 10 TZ_10_0056 Dodoma Kondoa Secondary Governme… 194 5
#> # ℹ 20 more rows
#> # ℹ 7 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified cluster sample: 10 schools per education level
sampling_design() |>
  stratify_by(school_level) |>
  cluster_by(school_id) |>
  draw(n = 10) |>
  execute(tanzania_schools, seed = 1)
#> == tbl_sample ==
#> Weights: 65.6 - 177.8 (mean: 121.7 )
#>
#> # A tibble: 20 × 14
#> school_id region district school_level ownership enrollment n_teachers
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 TZ_05_0062 Arusha Arusha … Secondary Governme… 184 5
#> 2 TZ_05_0066 Arusha Arusha … Primary Governme… 596 15
#> 3 TZ_07_0063 Arusha Monduli Primary Governme… 275 8
#> 4 TZ_01_0025 Dar es Sala… Ilala Primary Governme… 384 11
#> 5 TZ_01_0102 Dar es Sala… Ilala Secondary Governme… 605 14
#> 6 TZ_02_0097 Dar es Sala… Kinondo… Primary Governme… 742 21
#> 7 TZ_08_0034 Dodoma Dodoma … Primary Governme… 302 8
#> 8 TZ_11_0031 Dodoma Mpwapwa Secondary Governme… 864 21
#> 9 TZ_13_0058 Kilimanjaro Moshi R… Secondary Private 144 3
#> 10 TZ_12_0018 Kilimanjaro Moshi U… Primary Governme… 350 8
#> 11 TZ_15_0055 Kilimanjaro Rombo Primary Governme… 490 10
#> 12 TZ_15_0078 Kilimanjaro Rombo Secondary Governme… 131 3
#> 13 TZ_24_0063 Morogoro Morogor… Primary Governme… 713 19
#> 14 TZ_17_0057 Mwanza Ilemela Secondary Governme… 215 5
#> 15 TZ_17_0076 Mwanza Ilemela Primary Governme… 799 17
#> 16 TZ_18_0004 Mwanza Magu Secondary Governme… 286 8
#> 17 TZ_19_0056 Mwanza Sengere… Primary Private 770 18
#> 18 TZ_21_0058 Tanga Korogwe Secondary Governme… 290 8
#> 19 TZ_22_0024 Tanga Lushoto Secondary Private 88 3
#> 20 TZ_22_0039 Tanga Lushoto Secondary Governme… 253 5
#> # ℹ 7 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# PPS cluster sample using enrollment as measure of size
sampling_design() |>
  cluster_by(school_id) |>
  draw(n = 50, method = "pps_brewer", mos = enrollment) |>
  execute(tanzania_schools, seed = 2026)
#> == tbl_sample ==
#> Weights: 10.28 - 184.32 (mean: 52.59 )
#>
#> # A tibble: 50 × 15
#> school_id region district school_level ownership enrollment n_teachers
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 TZ_04_0004 Arusha Arusha City Primary Governme… 455 11
#> 2 TZ_04_0014 Arusha Arusha City Primary Governme… 438 9
#> 3 TZ_04_0015 Arusha Arusha City Primary Governme… 576 13
#> 4 TZ_04_0038 Arusha Arusha City Primary Governme… 686 15
#> 5 TZ_04_0041 Arusha Arusha City Secondary Governme… 280 6
#> 6 TZ_04_0061 Arusha Arusha City Primary Governme… 247 7
#> 7 TZ_04_0082 Arusha Arusha City Primary Governme… 706 20
#> 8 TZ_05_0046 Arusha Arusha Distri… Primary Governme… 664 15
#> 9 TZ_05_0092 Arusha Arusha Distri… Primary Governme… 477 13
#> 10 TZ_07_0044 Arusha Monduli Primary Governme… 503 14
#> # ℹ 40 more rows
#> # ℹ 8 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>
# Two-stage cluster sample
sampling_design() |>
  stage(label = "Schools") |>
    cluster_by(school_id) |>
    draw(n = 30, method = "pps_brewer", mos = enrollment) |>
  stage(label = "Students") |>
    draw(n = 15) |>
  execute(tanzania_schools, seed = 1234)
#> == tbl_sample ==
#> Weights: 26.24 - 281.39 (mean: 83.24 )
#>
#> # A tibble: 30 × 17
#> school_id region district school_level ownership enrollment n_teachers
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 TZ_01_0019 Dar es Sala… Ilala Primary Governme… 381 11
#> 2 TZ_01_0084 Dar es Sala… Ilala Primary Governme… 344 9
#> 3 TZ_01_0163 Dar es Sala… Ilala Primary Governme… 521 14
#> 4 TZ_02_0016 Dar es Sala… Kinondo… Secondary Governme… 119 3
#> 5 TZ_02_0021 Dar es Sala… Kinondo… Secondary Governme… 495 10
#> 6 TZ_02_0103 Dar es Sala… Kinondo… Primary Governme… 636 18
#> 7 TZ_03_0017 Dar es Sala… Temeke Primary Governme… 164 4
#> 8 TZ_03_0022 Dar es Sala… Temeke Primary Governme… 408 11
#> 9 TZ_03_0040 Dar es Sala… Temeke Primary Governme… 801 20
#> 10 TZ_03_0067 Dar es Sala… Temeke Secondary Governme… 217 5
#> # ℹ 20 more rows
#> # ℹ 10 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> # .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>