
serp() implements hierarchic serpentine sorting (also called "snake" sorting), transforming a multi-dimensional hierarchy into a one-dimensional path that preserves spatial contiguity. This is the algorithm used by SAS PROC SURVEYSELECT with SORT=SERP.

Serpentine sorting alternates direction at each hierarchy level:

  • First variable: ascending

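  • Second variable: ascending within odd-numbered groups of the first variable, descending within even-numbered groups

  • Third variable: alternates across the groups formed by the first two variables combined

  • Each further variable: alternates in the same way across the groups formed by all preceding variables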

This provides implicit stratification when combined with systematic or sequential sampling, helping the sample spread evenly across the levels of a geographic or administrative hierarchy.

Usage

serp(...)

Arguments

...

Columns to sort by, in hierarchical order (e.g., region, district, commune). Used inside dplyr::arrange(), similar to dplyr::desc().

Value

A numeric vector (sort key) for use by dplyr::arrange().

Details

Algorithm

The algorithm builds a composite sort key by:

  1. Converting each variable to integer ranks

  2. For variable i, determining group membership from variables 1..(i-1)

  3. If the cumulative group number (the group's position among all groups in the order built so far) is even, flipping the ranks so the variable sorts in descending order

  4. Using multi-column ordering to produce final sort positions
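The four steps above can be sketched in base R. This is a minimal illustration based on one reading of those steps, not the package's own serp() source: `serp_key_sketch()` is a hypothetical helper, groups are assumed to be numbered in the order the partially built path visits them, and ties keep their original row order because base `order()` is stable. The sketch returns each row's position along the serpentine path.

```r
# Hypothetical helper: a minimal sketch of the four algorithm steps,
# NOT the package's serp() implementation. Returns serpentine positions.
serp_key_sketch <- function(...) {
  vars <- list(...)
  n <- length(vars[[1]])
  # Step 1: convert each variable to dense integer ranks
  ranks <- lapply(vars, function(v) as.integer(factor(v)))
  keys <- vector("list", length(vars))
  keys[[1]] <- ranks[[1]]  # the first variable is always ascending
  for (i in seq_along(vars)[-1]) {
    # Step 2: number the groups formed by variables 1..(i-1) in the
    # order they are visited by the serpentine ordering built so far
    prev <- keys[seq_len(i - 1)]
    ord <- do.call(order, unname(prev))
    starts <- Reduce(`|`, lapply(prev, function(k) {
      ks <- k[ord]
      c(TRUE, ks[-1] != ks[-n])  # TRUE where a new group begins
    }))
    grp <- integer(n)
    grp[ord] <- cumsum(starts)
    # Step 3: flip the ranks (descending) inside even-numbered groups
    keys[[i]] <- ifelse(grp %% 2L == 0L, -ranks[[i]], ranks[[i]])
  }
  # Step 4: multi-column ordering gives the final sort positions
  pos <- integer(n)
  pos[do.call(order, unname(keys))] <- seq_len(n)
  pos
}

o <- order(serp_key_sketch(mtcars$cyl, mtcars$gear, mtcars$carb))
head(mtcars[o, c("cyl", "gear", "carb")], 15)
```

On mtcars this reproduces the first 15 rows of the `serp(cyl, gear, carb)` example in the Examples section. Because the sketch returns an ordinary integer vector, it could also be passed to `dplyr::arrange()` the same way, e.g. `arrange(mtcars, serp_key_sketch(cyl, gear, carb))`.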

Use with Systematic Sampling

Serpentine sorting is particularly effective with systematic sampling. Once the frame is ordered in a snake-like pattern, a systematic sample spreads across regions and sub-regions roughly in proportion to their size, without the need for explicit strata.
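To illustrate the interaction with systematic sampling, the sketch below applies a two-level serpentine order to a small hypothetical frame (`frame`, with made-up `region`, `district`, and `ea` codes) and then draws an equal-probability systematic sample with a fixed interval k = N/n and a single random start. It uses plain base R rather than the package's draw(), so the selection rule here is an assumption chosen for illustration.

```r
# Hypothetical toy frame: 4 regions x 5 districts x 6 EAs (120 units)
frame <- expand.grid(ea = 1:6, district = 1:5, region = 1:4)

# Two-level serpentine order: districts ascend in odd regions and descend
# in even ones; EAs keep their frame order within each district
frame <- frame[order(frame$region,
                     ifelse(frame$region %% 2L == 0L,
                            -frame$district, frame$district)), ]

# Systematic sample: interval k = N / n and one random start in (0, k)
set.seed(1)
n <- 12
k <- nrow(frame) / n
start <- runif(1, 0, k)
picks <- floor(start + k * (0:(n - 1))) + 1
smp <- frame[picks, ]

# Selections per region
table(smp$region)
```

In this toy frame each region occupies a contiguous run of 30 rows and k = 10, so every region receives exactly three selections whatever the random start.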

Comparison with Nested Sorting

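Standard nested sorting creates large "jumps" at hierarchy boundaries: after the last district of region 1, the path restarts at the first district of region 2. Serpentine sorting reduces these jumps by reversing direction, so the last district of region 1 is immediately followed by the last district of region 2.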
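A toy comparison of the two orderings in base R. The frame is hypothetical, and district codes are assumed to be numbered so that close codes are geographically close; the sum of absolute differences in district code along the path is used here only as a rough measure of the boundary "jumps".

```r
# Hypothetical toy frame: 2 regions x 3 districts
frame <- expand.grid(district = 1:3, region = 1:2)

# Nested: districts always ascending within region
nested <- frame[order(frame$region, frame$district), ]
# Serpentine: districts descend in the even-numbered region
snake <- frame[order(frame$region,
                     ifelse(frame$region %% 2L == 0L,
                            -frame$district, frame$district)), ]

paste(nested$region, nested$district, sep = ".")
#> [1] "1.1" "1.2" "1.3" "2.1" "2.2" "2.3"
paste(snake$region, snake$district, sep = ".")
#> [1] "1.1" "1.2" "1.3" "2.3" "2.2" "2.1"

# Total size of the jumps in district code along each path
sum(abs(diff(nested$district)))
#> [1] 6
sum(abs(diff(snake$district)))
#> [1] 4
```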

References

Chromy, J. R. (1979). Sequential sample selection methods. Proceedings of the Survey Research Methods Section, ASA, 401-406.

Chromy, J. R., & Williams, R. L. (1980). SAS sample selection macros. Proceedings of the Fifth Annual SAS Users Group International, 392-396.

Examples

library(dplyr, warn.conflicts = FALSE)

# Basic serpentine sorting with mtcars
mtcars |>
  arrange(serp(cyl, gear, carb)) |>
  select(cyl, gear, carb) |>
  head(15)
#>                cyl gear carb
#> Toyota Corona    4    3    1
#> Merc 240D        4    4    2
#> Merc 230         4    4    2
#> Honda Civic      4    4    2
#> Volvo 142E       4    4    2
#> Datsun 710       4    4    1
#> Fiat 128         4    4    1
#> Toyota Corolla   4    4    1
#> Fiat X1-9        4    4    1
#> Porsche 914-2    4    5    2
#> Lotus Europa     4    5    2
#> Ferrari Dino     6    5    6
#> Mazda RX4        6    4    4
#> Mazda RX4 Wag    6    4    4
#> Merc 280         6    4    4

# Compare nested vs serpentine sorting
# Nested: gear always ascending within cyl
mtcars |>
  arrange(cyl, gear) |>
  select(cyl, gear) |>
  head(12)
#>                cyl gear
#> Toyota Corona    4    3
#> Datsun 710       4    4
#> Merc 240D        4    4
#> Merc 230         4    4
#> Fiat 128         4    4
#> Honda Civic      4    4
#> Toyota Corolla   4    4
#> Fiat X1-9        4    4
#> Volvo 142E       4    4
#> Porsche 914-2    4    5
#> Lotus Europa     4    5
#> Hornet 4 Drive   6    3

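# Serpentine: gear direction alternates by cyl group (descending for cyl = 6),
# so the two orders first differ at row 12, where the cyl = 6 block begins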
mtcars |>
  arrange(serp(cyl, gear)) |>
  select(cyl, gear) |>
  head(12)
#>                cyl gear
#> Toyota Corona    4    3
#> Datsun 710       4    4
#> Merc 240D        4    4
#> Merc 230         4    4
#> Fiat 128         4    4
#> Honda Civic      4    4
#> Toyota Corolla   4    4
#> Fiat X1-9        4    4
#> Volvo 142E       4    4
#> Porsche 914-2    4    5
#> Lotus Europa     4    5
#> Ferrari Dino     6    5

# Implicit stratification with systematic sampling
# Sort the Burkina Faso enumeration areas (bfa_eas) in serpentine order by
# region and province, then draw a systematic sample of 100 EAs
sampling_design() |>
  draw(n = 100, method = "systematic") |>
  execute(arrange(bfa_eas, serp(region, province)),
          seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights:      149.34 [149.34, 149.34]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00461 Boucle … Bale     Bana    Rural             1348        176     7.3 
#>  2 EA_11130 Boucle … Bale     Poura   Urban             1660        267    19.0 
#>  3 EA_12367 Boucle … Banwa    Solenzo Rural              984        132    23.2 
#>  4 EA_12925 Boucle … Banwa    Tansila Rural              824         99    18.1 
#>  5 EA_03758 Boucle … Kossi    Dokui   Rural             1277        158    25.8 
#>  6 EA_12543 Boucle … Kossi    Sono    Rural             1246        188    78.8 
#>  7 EA_03215 Boucle … Mouhoun  Dedoug… Rural             1320        234     0.88
#>  8 EA_12950 Boucle … Mouhoun  Tcheri… Rural             1188        190    22.3 
#>  9 EA_14040 Boucle … Nayala   Yaba    Rural             1351        178     6.95
#> 10 EA_07211 Boucle … Sourou   Lanfie… Rural              972        131    17.6 
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Combine explicit stratification with serpentine sorting
# Stratify by urban_rural (100 EAs drawn systematically per stratum) and use
# serpentine order within each stratum
sampling_design() |>
  stratify_by(urban_rural) |>
  draw(n = 100, method = "systematic") |>
  execute(arrange(bfa_eas, urban_rural, serp(region, province)),
          seed = 1234)
#> # A tbl_sample: 200 × 17
#> # Weights:      74.67 [26.56, 122.78]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00258 Boucle … Bale     Bagassi Rural             1744        210    30.2 
#>  2 EA_10360 Boucle … Bale     Ouri    Rural              909        116    56.5 
#>  3 EA_06883 Boucle … Banwa    Kouka   Rural             3778        520     3.12
#>  4 EA_12420 Boucle … Banwa    Solenzo Rural               79         11     5.68
#>  5 EA_00756 Boucle … Kossi    Barani  Rural             1121        147     9.28
#>  6 EA_03757 Boucle … Kossi    Dokui   Rural             1078        134    24.4 
#>  7 EA_08706 Boucle … Kossi    Nouna   Rural             1634        241     7.99
#>  8 EA_03162 Boucle … Mouhoun  Dedoug… Rural             1393        247    12.1 
#>  9 EA_10171 Boucle … Mouhoun  Ouarko… Rural             1044        143    35.8 
#> 10 EA_04538 Boucle … Nayala   Gassan  Rural             1900        210   124.  
#> # ℹ 190 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>