serp() implements hierarchic serpentine sorting (also called "snake" sorting),
transforming a multi-dimensional hierarchy into a one-dimensional path that
preserves spatial contiguity. This is the algorithm used by SAS PROC SURVEYSELECT
with SORT=SERP.
Serpentine sorting alternates direction at each hierarchy level:
First variable: ascending
Second variable: ascending within odd-numbered groups of the first variable, descending within even-numbered groups
Third variable: direction alternates across the combined groups of the first two variables
And so on for any further variables.
This provides implicit stratification when combined with systematic or sequential sampling, ensuring samples spread evenly across geographic/administrative hierarchies.
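To make the contiguity claim concrete, the following base R sketch compares a nested and a hand-rolled serpentine ordering on a hypothetical 3 x 3 grid. It does not call serp(); the names grid, nested, snake and path_length are illustrative only, and the even-row reversal is written out by hand for the two-variable case.

```r
# Toy 3 x 3 grid of cells; `row` and `col` stand in for two hierarchy levels.
grid <- expand.grid(col = 1:3, row = 1:3)

# Nested order: col always ascending within row.
nested <- grid[order(grid$row, grid$col), ]

# Serpentine order written by hand: reverse col within even-numbered rows.
snake <- grid[order(grid$row,
                    ifelse(grid$row %% 2 == 0, -grid$col, grid$col)), ]

# Total Manhattan distance travelled between consecutive cells on the path.
path_length <- function(d) sum(abs(diff(d$row)) + abs(diff(d$col)))

path_length(nested)
path_length(snake)
```

In the nested path, each change of row jumps back to the first column, whereas every step of the serpentine path moves to an adjacent cell, so its total path length is shorter.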
serp(...)
...: Columns to sort by, in hierarchical order (e.g., region, district, commune). Used inside dplyr::arrange(), similar to dplyr::desc().
Returns a numeric vector (the sort key) for use by dplyr::arrange().
The algorithm builds a composite sort key by:
Converting each variable to integer ranks
For variable i, determining group membership from variables 1..(i-1)
If the cumulative group number is even, flipping ranks (descending)
Using multi-column ordering to produce final sort positions
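The four steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's actual source: serp_key is a hypothetical name, "cumulative group number" is read as the position of each group along the serpentine path built so far, and missing values and zero-length input are not handled.

```r
# Hypothetical re-implementation for illustration; not the source of serp().
serp_key <- function(...) {
  cols <- unname(list(...))
  n <- length(cols[[1]])

  # Step 1: convert each variable to dense integer ranks.
  ranks <- lapply(cols, function(x) match(x, sort(unique(x))))

  keys <- ranks[1]  # the first variable is always ascending
  for (i in seq_along(ranks)[-1]) {
    # Step 2: number the groups defined by the already-oriented keys 1..(i-1),
    # in the order in which the serpentine path visits them.
    ord <- do.call(order, keys)
    sorted <- do.call(cbind, keys)[ord, , drop = FALSE]
    changed <- rowSums(sorted[-1, , drop = FALSE] !=
                         sorted[-n, , drop = FALSE]) > 0
    group <- integer(n)
    group[ord] <- cumsum(c(TRUE, changed))

    # Step 3: flip the ranks (descending) in even-numbered groups.
    r <- ranks[[i]]
    keys[[i]] <- ifelse(group %% 2 == 0, max(r) + 1L - r, r)
  }

  # Step 4: multi-column ordering gives each row its final sort position.
  key <- integer(n)
  key[do.call(order, keys)] <- seq_len(n)
  key
}

# Two groups of `a`; `b` ascends in the first group and descends in the second.
serp_key(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

Because the result is an ordinary vector of positions, it can be passed to order() or dplyr::arrange() like any other sort column.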
Chromy, J. R. (1979). Sequential sample selection methods. Proceedings of the Survey Research Methods Section, ASA, 401-406.
Williams, R. L., & Chromy, J. R. (1980). SAS sample selection macros. Proceedings of the Fifth Annual SAS Users Group International, 392-396.
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
# Basic serpentine sorting with mtcars
mtcars |>
  arrange(serp(cyl, gear, carb)) |>
  select(cyl, gear, carb) |>
  head(15)
#>                cyl gear carb
#> Toyota Corona    4    3    1
#> Merc 240D        4    4    2
#> Merc 230         4    4    2
#> Honda Civic      4    4    2
#> Volvo 142E       4    4    2
#> Datsun 710       4    4    1
#> Fiat 128         4    4    1
#> Toyota Corolla   4    4    1
#> Fiat X1-9        4    4    1
#> Porsche 914-2    4    5    2
#> Lotus Europa     4    5    2
#> Ferrari Dino     6    5    6
#> Mazda RX4        6    4    4
#> Mazda RX4 Wag    6    4    4
#> Merc 280         6    4    4
# Compare nested vs serpentine sorting
# Nested: gear always ascending within cyl
mtcars |>
  arrange(cyl, gear) |>
  select(cyl, gear) |>
  head(12)
#>                cyl gear
#> Toyota Corona    4    3
#> Datsun 710       4    4
#> Merc 240D        4    4
#> Merc 230         4    4
#> Fiat 128         4    4
#> Honda Civic      4    4
#> Toyota Corolla   4    4
#> Fiat X1-9        4    4
#> Volvo 142E       4    4
#> Porsche 914-2    4    5
#> Lotus Europa     4    5
#> Hornet 4 Drive   6    3
# Serpentine: gear direction alternates by cyl group
mtcars |>
  arrange(serp(cyl, gear)) |>
  select(cyl, gear) |>
  head(12)
#>                cyl gear
#> Toyota Corona    4    3
#> Datsun 710       4    4
#> Merc 240D        4    4
#> Merc 230         4    4
#> Fiat 128         4    4
#> Honda Civic      4    4
#> Toyota Corolla   4    4
#> Fiat X1-9        4    4
#> Volvo 142E       4    4
#> Porsche 914-2    4    5
#> Lotus Europa     4    5
#> Ferrari Dino     6    5
# Implicit stratification with systematic sampling
# Sort Niger enumeration areas (EAs) in serpentine order, then draw a systematic sample
sampling_design() |>
  draw(n = 100, method = "systematic") |>
  execute(arrange(niger_eas, serp(region, department)),
          seed = 42)
#> == tbl_sample ==
#> Weights: 15.36 - 15.36 (mean: 15.36 )
#>
#> # A tibble: 100 × 11
#> ea_id region department strata hh_count pop_estimate .weight .prob
#> * <chr> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 Aga_02_0002 Agadez Arlit Rural 63 315 15.4 0.0651
#> 2 Aga_03_0003 Agadez Bilma Urban 154 770 15.4 0.0651
#> 3 Aga_04_0008 Agadez Tchirozérine Rural 76 456 15.4 0.0651
#> 4 Dif_07_0010 Diffa N'Guigmi Rural 65 325 15.4 0.0651
#> 5 Dif_06_0008 Diffa Mainé-Soroa Rural 46 322 15.4 0.0651
#> 6 Dif_05_0008 Diffa Diffa Rural 36 288 15.4 0.0651
#> 7 Dif_08_0006 Diffa Bosso Urban 185 1480 15.4 0.0651
#> 8 Dos_10_0006 Dosso Boboye Rural 39 312 15.4 0.0651
#> 9 Dos_10_0021 Dosso Boboye Urban 162 810 15.4 0.0651
#> 10 Dos_11_0002 Dosso Dogondoutchi Rural 43 258 15.4 0.0651
#> # ℹ 90 more rows
#> # ℹ 3 more variables: .sample_id <int>, .stage <int>, .prob_1 <dbl>
# Combine explicit stratification with serpentine sorting
# Stratify by urban/rural, use serpentine sorting within strata
sampling_design() |>
  stratify_by(strata) |>
  draw(n = 100, method = "systematic") |>
  execute(arrange(niger_eas, strata, serp(region, department)),
          seed = 1234)
#> == tbl_sample ==
#> Weights: 3.31 - 12.05 (mean: 7.68 )
#>
#> # A tibble: 200 × 11
#> strata ea_id region department hh_count pop_estimate .weight .prob .sample_id
#> * <fct> <chr> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 Urban Aga_… Agadez Agadez 157 942 3.31 0.302 1
#> 2 Urban Aga_… Agadez Agadez 54 432 3.31 0.302 2
#> 3 Urban Aga_… Agadez Arlit 128 640 3.31 0.302 3
#> 4 Urban Aga_… Agadez Arlit 142 710 3.31 0.302 4
#> 5 Urban Aga_… Agadez Bilma 154 770 3.31 0.302 5
#> 6 Urban Aga_… Agadez Tchirozér… 352 2464 3.31 0.302 6
#> 7 Urban Aga_… Agadez Tchirozér… 181 1267 3.31 0.302 7
#> 8 Urban Aga_… Agadez Tchirozér… 190 1330 3.31 0.302 8
#> 9 Urban Dif_… Diffa Mainé-Sor… 134 938 3.31 0.302 9
#> 10 Urban Dos_… Dosso Boboye 329 2632 3.31 0.302 10
#> # ℹ 190 more rows
#> # ℹ 2 more variables: .stage <int>, .prob_1 <dbl>