serp() implements hierarchic serpentine sorting (also called "snake" sorting),
transforming a multi-dimensional hierarchy into a one-dimensional path that
preserves spatial contiguity. This is the algorithm used by SAS PROC SURVEYSELECT
with SORT=SERP.
Serpentine sorting alternates direction at each hierarchy level:
- First variable: ascending
- Second variable: ascending in odd-numbered groups of the first variable, descending in even-numbered groups
- Third variable: direction alternates with each successive combination of the first two variables
- And so on for further variables
This provides implicit stratification when combined with systematic or sequential sampling, ensuring samples spread evenly across geographic/administrative hierarchies.
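As a rough illustration of the two-variable rule, the sketch below builds a toy 3 × 3 frame and orders it by hand in base R. The data and the names `grid`, `region_rank`, and `district_key` are made up for this illustration and are not part of the package.

```r
# Toy frame: three regions, each with districts 1..3 (hypothetical data)
grid <- expand.grid(district = 1:3, region = c("A", "B", "C"),
                    stringsAsFactors = FALSE)

# Integer rank of the first variable: A = 1, B = 2, C = 3
region_rank <- match(grid$region, c("A", "B", "C"))

# Negate district inside even-numbered regions so it sorts descending there
district_key <- ifelse(region_rank %% 2 == 0, -grid$district, grid$district)

path <- grid[order(region_rank, district_key), ]
path_labels <- paste0(path$region, path$district)
paste(path_labels, collapse = " ")
#> [1] "A1 A2 A3 B3 B2 B1 C1 C2 C3"

# A systematic sample of every third unit, starting at the second
path_labels[seq(2, 9, by = 3)]
#> [1] "A2" "B2" "C2"
```

The path never jumps from the last district of one region to the first district of the next, which is why a fixed sampling interval spreads the selected units across regions.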
Arguments
- ...
Columns to sort by, in hierarchical order (e.g., region, district, commune). Used inside
dplyr::arrange(), similar to dplyr::desc().
Value
A numeric vector (sort key) for use by dplyr::arrange().
Details
Algorithm
The algorithm builds a composite sort key by:
1. Converting each variable to integer ranks
2. For variable i, determining group membership from variables 1..(i-1)
3. Flipping the ranks (descending order) when the cumulative group number is even
4. Using multi-column ordering to produce the final sort positions
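The steps above can be sketched in base R roughly as follows. This is a minimal illustration written for this page, not the package's actual source. The function name `serp_key_sketch` is hypothetical, and the sketch assumes that groups are numbered in the order the serpentine path visits them.

```r
# Hypothetical helper: returns an integer sort key for the given vectors
serp_key_sketch <- function(...) {
  vars <- list(...)
  key <- rep(1L, length(vars[[1]]))      # before any variable: one group, numbered 1

  for (v in vars) {
    r <- match(v, sort(unique(v)))       # step 1: dense integer ranks
    k <- max(r)
    flip <- key %% 2L == 0L              # steps 2-3: even-numbered groups descend
    r[flip] <- k + 1L - r[flip]
    combined <- (key - 1L) * k + r       # step 4: combine group number and rank
    key <- match(combined, sort(unique(combined)))  # renumber groups 1, 2, 3, ...
  }
  key
}

ord <- order(serp_key_sketch(mtcars$cyl, mtcars$gear))
sorted <- mtcars[ord, c("cyl", "gear")]

# gear ascends within cyl == 4 and descends within cyl == 6
sorted$gear[sorted$cyl == 4]
sorted$gear[sorted$cyl == 6]
```

Because `order()` is stable, rows that tie on the key keep their original relative order. Renumbering the groups after each variable is what makes the direction of the next variable depend on the cumulative group count rather than on the previous variable alone.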
References
Chromy, J. R. (1979). Sequential sample selection methods. Proceedings of the Survey Research Methods Section, ASA, 401-406.
Chromy, J. R., & Williams, R. L. (1980). SAS sample selection macros. Proceedings of the Fifth Annual SAS Users Group International, 392-396.
Examples
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
# Basic serpentine sorting with mtcars
mtcars |>
  arrange(serp(cyl, gear, carb)) |>
  select(cyl, gear, carb) |>
  head(15)
#> cyl gear carb
#> Toyota Corona 4 3 1
#> Merc 240D 4 4 2
#> Merc 230 4 4 2
#> Honda Civic 4 4 2
#> Volvo 142E 4 4 2
#> Datsun 710 4 4 1
#> Fiat 128 4 4 1
#> Toyota Corolla 4 4 1
#> Fiat X1-9 4 4 1
#> Porsche 914-2 4 5 2
#> Lotus Europa 4 5 2
#> Ferrari Dino 6 5 6
#> Mazda RX4 6 4 4
#> Mazda RX4 Wag 6 4 4
#> Merc 280 6 4 4
# Compare nested vs serpentine sorting
# Nested: gear always ascending within cyl
mtcars |>
  arrange(cyl, gear) |>
  select(cyl, gear) |>
  head(12)
#> cyl gear
#> Toyota Corona 4 3
#> Datsun 710 4 4
#> Merc 240D 4 4
#> Merc 230 4 4
#> Fiat 128 4 4
#> Honda Civic 4 4
#> Toyota Corolla 4 4
#> Fiat X1-9 4 4
#> Volvo 142E 4 4
#> Porsche 914-2 4 5
#> Lotus Europa 4 5
#> Hornet 4 Drive 6 3
# Serpentine: gear direction alternates by cyl group
mtcars |>
  arrange(serp(cyl, gear)) |>
  select(cyl, gear) |>
  head(12)
#> cyl gear
#> Toyota Corona 4 3
#> Datsun 710 4 4
#> Merc 240D 4 4
#> Merc 230 4 4
#> Fiat 128 4 4
#> Honda Civic 4 4
#> Toyota Corolla 4 4
#> Fiat X1-9 4 4
#> Volvo 142E 4 4
#> Porsche 914-2 4 5
#> Lotus Europa 4 5
#> Ferrari Dino 6 5
# Implicit stratification with systematic sampling
# Sort BFA EAs in serpentine order, then draw systematic sample
sampling_design() |>
  draw(n = 100, method = "systematic") |>
  execute(arrange(bfa_eas, serp(region, province)),
          seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights: 149.34 [149.34, 149.34]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_00461 Boucle … Bale Bana Rural 1348 176 7.3
#> 2 EA_11130 Boucle … Bale Poura Urban 1660 267 19.0
#> 3 EA_12367 Boucle … Banwa Solenzo Rural 984 132 23.2
#> 4 EA_12925 Boucle … Banwa Tansila Rural 824 99 18.1
#> 5 EA_03758 Boucle … Kossi Dokui Rural 1277 158 25.8
#> 6 EA_12543 Boucle … Kossi Sono Rural 1246 188 78.8
#> 7 EA_03215 Boucle … Mouhoun Dedoug… Rural 1320 234 0.88
#> 8 EA_12950 Boucle … Mouhoun Tcheri… Rural 1188 190 22.3
#> 9 EA_14040 Boucle … Nayala Yaba Rural 1351 178 6.95
#> 10 EA_07211 Boucle … Sourou Lanfie… Rural 972 131 17.6
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Combine explicit stratification with serpentine sorting
# Stratify by urban/rural, use serpentine within strata
sampling_design() |>
  stratify_by(urban_rural) |>
  draw(n = 100, method = "systematic") |>
  execute(arrange(bfa_eas, urban_rural, serp(region, province)),
          seed = 1234)
#> # A tbl_sample: 200 × 17
#> # Weights: 74.67 [26.56, 122.78]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_00258 Boucle … Bale Bagassi Rural 1744 210 30.2
#> 2 EA_10360 Boucle … Bale Ouri Rural 909 116 56.5
#> 3 EA_06883 Boucle … Banwa Kouka Rural 3778 520 3.12
#> 4 EA_12420 Boucle … Banwa Solenzo Rural 79 11 5.68
#> 5 EA_00756 Boucle … Kossi Barani Rural 1121 147 9.28
#> 6 EA_03757 Boucle … Kossi Dokui Rural 1078 134 24.4
#> 7 EA_08706 Boucle … Kossi Nouna Rural 1634 241 7.99
#> 8 EA_03162 Boucle … Mouhoun Dedoug… Rural 1393 247 12.1
#> 9 EA_10171 Boucle … Mouhoun Ouarko… Rural 1044 143 35.8
#> 10 EA_04538 Boucle … Nayala Gassan Rural 1900 210 124.
#> # ℹ 190 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>