serp() implements hierarchic serpentine sorting (also called "snake" sorting), transforming a multi-dimensional hierarchy into a one-dimensional path that preserves spatial contiguity. This is the algorithm used by SAS PROC SURVEYSELECT with SORT=SERP.

Serpentine sorting alternates direction at each hierarchy level:

  • First variable: ascending

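  • Second variable: ascending within odd-numbered groups of the first variable, descending within even-numbered groups

  • Third variable: direction alternates in the same way, based on the combined groups of the first two variables, numbered along the sort path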

  • And so on...

This ordering provides implicit stratification when combined with systematic or sequential sampling, spreading the sample evenly across geographic or administrative hierarchies.

serp(...)

Arguments

...

Columns to sort by, in hierarchical order (e.g., region, district, commune). Call serp() inside dplyr::arrange(), in the same way as dplyr::desc().

Value

A numeric vector (sort key) for use by dplyr::arrange().

Details

Algorithm

The algorithm builds a composite sort key by:

  1. Converting each variable to integer ranks

  2. For variable i, determining group membership from variables 1..(i-1)

  3. If the cumulative group number is even, flipping ranks (descending)

  4. Using multi-column ordering to produce final sort positions
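
The four steps above can be sketched in base R. This is an illustrative reading of the description, not the package's actual source: the name serp_sketch() and the helper dense_rank() are hypothetical, and the sketch assumes equal-length inputs with no missing values.

```r
# Hypothetical sketch of the four steps; not the package's serp() source.
serp_sketch <- function(...) {
  vars <- list(...)
  n <- length(vars[[1]])
  # Step 1: dense integer ranks (1, 2, 3, ...) over the distinct values
  dense_rank <- function(x) match(x, sort(unique(x)))

  keys <- vector("list", length(vars))
  keys[[1]] <- dense_rank(vars[[1]])  # first variable: always ascending
  group <- keys[[1]]                  # cumulative group number along the path

  for (i in seq_along(vars)[-1]) {
    r <- dense_rank(vars[[i]])
    top <- max(r)
    # Steps 2-3: flip ranks inside even-numbered groups of variables 1..(i-1)
    keys[[i]] <- ifelse(group %% 2 == 0, top + 1L - r, r)
    # Renumber groups so they count non-empty combinations in path order
    group <- dense_rank((group - 1) * top + keys[[i]])
  }

  # Step 4: multi-column ordering, inverted into one sort position per row
  ord <- do.call(order, keys)
  pos <- integer(n)
  pos[ord] <- seq_len(n)
  pos
}

# Toy frame: 2 regions x 3 districts
region   <- c(1, 1, 1, 2, 2, 2)
district <- c(1, 2, 3, 1, 2, 3)
key <- serp_sketch(region, district)
district[order(key)]
#> [1] 1 2 3 3 2 1
```

Renumbering the groups after each variable means that empty combinations do not count, so the direction flips at every group actually present in the data.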

Use with Systematic Sampling

Serpentine sorting is particularly effective with systematic sampling. Because the frame is ordered in a snake-like pattern, units that are close in the hierarchy are also close in the list, so a fixed-interval sample spreads across regions and sub-regions roughly in proportion to their size.
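
As a rough illustration, the base R sketch below draws a fixed-interval systematic sample from a toy frame that is already in serpentine order. The helper systematic_positions() is hypothetical and uses a generic textbook selection rule; it is not the code behind draw(method = "systematic").

```r
# Hypothetical helper: fixed-interval systematic selection of n rows out of N.
systematic_positions <- function(N, n, start = runif(1, 0, N / n)) {
  k <- N / n                               # sampling interval
  floor(start + k * (seq_len(n) - 1)) + 1  # 1-based row positions
}

# Toy frame already in serpentine order: 4 regions x 5 districts
frame <- data.frame(
  region   = rep(1:4, each = 5),
  district = c(1:5, 5:1, 1:5, 5:1)
)

# Fixed start so the result is reproducible; omit it for a random start
picked <- systematic_positions(nrow(frame), n = 4, start = 1.2)
frame[picked, ]
```

With an interval of 5 rows and 5 rows per region, every region contributes exactly one unit, whatever the start value.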

Comparison with Nested Sorting

Standard nested sorting restarts each variable at its lowest value whenever a higher-level group changes, which creates large "jumps" at hierarchy boundaries. Serpentine sorting minimizes these by reversing direction: the last district of region 1 is followed by the last district of region 2.
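
The difference can be checked on a small grid. The sketch below uses only base R and builds the serpentine order by hand for two variables, so it does not depend on serp(). The 3 x 3 grid and the object names are made up for illustration.

```r
# Toy grid: 3 regions x 3 districts
grid <- expand.grid(district = 1:3, region = 1:3)

# Nested order: district always ascending within region
nested <- grid[order(grid$region, grid$district), ]

# Serpentine order: district ranks flipped in even-numbered regions
flipped <- ifelse(grid$region %% 2 == 0,
                  max(grid$district) + 1 - grid$district,
                  grid$district)
snake <- grid[order(grid$region, flipped), ]

# Largest change in district between consecutive rows
max(abs(diff(nested$district)))
#> [1] 2
max(abs(diff(snake$district)))
#> [1] 1
```

In the nested order the district index drops from 3 back to 1 at each region boundary. In the serpentine order it stays at the same value across the boundary.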

References

Chromy, J. R. (1979). Sequential sample selection methods. Proceedings of the Survey Research Methods Section, ASA, 401-406.

Williams, R. L., & Chromy, J. R. (1980). SAS sample selection macros. Proceedings of the Fifth Annual SAS Users Group International, 392-396.

Examples

library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union

# Basic serpentine sorting with mtcars
mtcars |>
  arrange(serp(cyl, gear, carb)) |>
  select(cyl, gear, carb) |>
  head(15)
#>                cyl gear carb
#> Toyota Corona    4    3    1
#> Merc 240D        4    4    2
#> Merc 230         4    4    2
#> Honda Civic      4    4    2
#> Volvo 142E       4    4    2
#> Datsun 710       4    4    1
#> Fiat 128         4    4    1
#> Toyota Corolla   4    4    1
#> Fiat X1-9        4    4    1
#> Porsche 914-2    4    5    2
#> Lotus Europa     4    5    2
#> Ferrari Dino     6    5    6
#> Mazda RX4        6    4    4
#> Mazda RX4 Wag    6    4    4
#> Merc 280         6    4    4

# Compare nested vs serpentine sorting
# Nested: gear always ascending within cyl
mtcars |>
  arrange(cyl, gear) |>
  select(cyl, gear) |>
  head(12)
#>                cyl gear
#> Toyota Corona    4    3
#> Datsun 710       4    4
#> Merc 240D        4    4
#> Merc 230         4    4
#> Fiat 128         4    4
#> Honda Civic      4    4
#> Toyota Corolla   4    4
#> Fiat X1-9        4    4
#> Volvo 142E       4    4
#> Porsche 914-2    4    5
#> Lotus Europa     4    5
#> Hornet 4 Drive   6    3

# Serpentine: gear direction alternates by cyl group
mtcars |>
  arrange(serp(cyl, gear)) |>
  select(cyl, gear) |>
  head(12)
#>                cyl gear
#> Toyota Corona    4    3
#> Datsun 710       4    4
#> Merc 240D        4    4
#> Merc 230         4    4
#> Fiat 128         4    4
#> Honda Civic      4    4
#> Toyota Corolla   4    4
#> Fiat X1-9        4    4
#> Volvo 142E       4    4
#> Porsche 914-2    4    5
#> Lotus Europa     4    5
#> Ferrari Dino     6    5

# Implicit stratification with systematic sampling
# Sort Niger EAs in serpentine order, then draw systematic sample
sampling_design() |>
  draw(n = 100, method = "systematic") |>
  execute(arrange(niger_eas, serp(region, department)),
          seed = 42)
#> == tbl_sample ==
#> Weights: 15.36 - 15.36 (mean: 15.36 )
#> 
#> # A tibble: 100 × 11
#>    ea_id       region department   strata hh_count pop_estimate .weight  .prob
#>  * <chr>       <fct>  <fct>        <fct>     <dbl>        <dbl>   <dbl>  <dbl>
#>  1 Aga_02_0002 Agadez Arlit        Rural        63          315    15.4 0.0651
#>  2 Aga_03_0003 Agadez Bilma        Urban       154          770    15.4 0.0651
#>  3 Aga_04_0008 Agadez Tchirozérine Rural        76          456    15.4 0.0651
#>  4 Dif_07_0010 Diffa  N'Guigmi     Rural        65          325    15.4 0.0651
#>  5 Dif_06_0008 Diffa  Mainé-Soroa  Rural        46          322    15.4 0.0651
#>  6 Dif_05_0008 Diffa  Diffa        Rural        36          288    15.4 0.0651
#>  7 Dif_08_0006 Diffa  Bosso        Urban       185         1480    15.4 0.0651
#>  8 Dos_10_0006 Dosso  Boboye       Rural        39          312    15.4 0.0651
#>  9 Dos_10_0021 Dosso  Boboye       Urban       162          810    15.4 0.0651
#> 10 Dos_11_0002 Dosso  Dogondoutchi Rural        43          258    15.4 0.0651
#> # ℹ 90 more rows
#> # ℹ 3 more variables: .sample_id <int>, .stage <int>, .prob_1 <dbl>

# Combine explicit stratification with serpentine sorting
# Stratify by urban/rural, use serpentine within strata
sampling_design() |>
  stratify_by(strata) |>
  draw(n = 100, method = "systematic") |>
  execute(arrange(niger_eas, strata, serp(region, department)),
          seed = 1234)
#> == tbl_sample ==
#> Weights: 3.31 - 12.05 (mean: 7.68 )
#> 
#> # A tibble: 200 × 11
#>    strata ea_id region department hh_count pop_estimate .weight .prob .sample_id
#>  * <fct>  <chr> <fct>  <fct>         <dbl>        <dbl>   <dbl> <dbl>      <int>
#>  1 Urban  Aga_… Agadez Agadez          157          942    3.31 0.302          1
#>  2 Urban  Aga_… Agadez Agadez           54          432    3.31 0.302          2
#>  3 Urban  Aga_… Agadez Arlit           128          640    3.31 0.302          3
#>  4 Urban  Aga_… Agadez Arlit           142          710    3.31 0.302          4
#>  5 Urban  Aga_… Agadez Bilma           154          770    3.31 0.302          5
#>  6 Urban  Aga_… Agadez Tchirozér…      352         2464    3.31 0.302          6
#>  7 Urban  Aga_… Agadez Tchirozér…      181         1267    3.31 0.302          7
#>  8 Urban  Aga_… Agadez Tchirozér…      190         1330    3.31 0.302          8
#>  9 Urban  Dif_… Diffa  Mainé-Sor…      134          938    3.31 0.302          9
#> 10 Urban  Dos_… Dosso  Boboye          329         2632    3.31 0.302         10
#> # ℹ 190 more rows
#> # ℹ 2 more variables: .stage <int>, .prob_1 <dbl>