Overview

sondage ships with 12 built-in sampling methods, but researchers and agencies often need algorithms that are not included. The registration API lets you plug a custom unequal probability sampling method into the existing dispatchers and generics without modifying the package source.

After registering a method, it works everywhere a built-in method works: unequal_prob_wor(), joint_inclusion_prob(), sampling_cov(), batch mode (nrep), etc.

The registration API

A single call to register_method() adds a new method:

library(sondage)
register_method(
  name,                   # unique method name (character)
  type = "wor",           # "wor" or "wr"
  sample_fn,              # function(pik, n, prn, ...) -> integer indices
  joint_fn = NULL,        # function(pik, sample_idx, ...) -> matrix (optional)
  fixed_size = TRUE,      # does the method always return exactly n units?
  supports_prn = FALSE    # does the method support permanent random numbers?
)

Two contracts must be respected:

  • sample_fn(pik, n = NULL, prn = NULL, ...) receives the inclusion probability vector and returns an integer vector of selected unit indices (1-based).
  • joint_fn(pik, sample_idx = NULL, ...) (optional) returns an \(N \times N\) joint inclusion probability matrix when sample_idx is NULL, or a submatrix for the given indices when it is not.
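A minimal sample_fn that satisfies this contract can be written in a few lines of base R. The sketch below implements systematic \(\pi\)ps sampling; the function name is illustrative and not part of sondage:

```r
# Illustrative sample_fn: systematic pps sampling in base R.
# Satisfies the register_method() contract: takes (pik, n, prn, ...)
# and returns 1-based indices of the selected units.
sys_pps_sample <- function(pik, n = NULL, prn = NULL, ...) {
  if (is.null(n)) n <- as.integer(round(sum(pik)))
  u <- if (is.null(prn)) runif(1) else prn[1]  # random (or permanent) start in (0, 1)
  points <- u + seq_len(n) - 1                 # n equally spaced points in (0, n)
  findInterval(points, cumsum(pik)) + 1L       # unit whose cumulative interval holds each point
}

sys_pps_sample(rep(0.5, 8), prn = 0.25)
#> [1] 1 3 5 7
```

Because the start point u fully determines the sample, this sampler also supports permanent random numbers via prn, so it could be registered with supports_prn = TRUE.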

Helper functions registered_methods(), is_registered_method(), and unregister_method() manage the registry.

Example 1: Sampford’s method with high-entropy joint probabilities

Sampford’s (1967) rejection procedure draws exact \(\pi\)ps samples. The first unit is drawn with probability proportional to \(\pi_k\); the remaining \(n - 1\) units are drawn independently with replacement, each with probability proportional to \(\pi_k / (1 - \pi_k)\). The draw is accepted only when all \(n\) units are distinct.

sampford_sample <- function(pik, n = NULL, prn = NULL, ...) {
  N <- length(pik)
  if (is.null(n)) n <- as.integer(round(sum(pik)))
  stopifnot(all(pik > 0), all(pik < 1))  # q below is finite only for pik < 1
  if (n == 1L) {
    return(sample.int(N, 1L, prob = pik))
  }
  q <- pik / (1 - pik)
  repeat {
    first <- sample.int(N, 1L, prob = pik)
    rest  <- sample.int(N, n - 1L, replace = TRUE, prob = q)
    s <- c(first, rest)
    if (anyDuplicated(s) == 0L)
      return(sort(s))
  }
}

The acceptance rate decreases as \(n / N\) grows, so this implementation is best suited for moderate sampling fractions.
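To gauge that cost for a given design, the rejection loop can be instrumented to count attempts. A self-contained sketch (duplicating the loop body above; the helper name is illustrative, and pik is rebuilt in base R to match the example below):

```r
# Count rejection-loop iterations until Sampford's procedure accepts a draw.
sampford_attempts <- function(pik, n) {
  N <- length(pik)
  q <- pik / (1 - pik)
  attempts <- 0L
  repeat {
    attempts <- attempts + 1L
    s <- c(sample.int(N, 1L, prob = pik),
           sample.int(N, n - 1L, replace = TRUE, prob = q))
    if (anyDuplicated(s) == 0L) return(attempts)
  }
}

set.seed(1)
pik <- 4 * (2:9) / sum(2:9)  # same probabilities as the example below
mean(replicate(1000, sampford_attempts(pik, 4)))  # mean draws per accepted sample
```

A mean well above 1 signals that a sequential method may be cheaper for that design.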

Joint probabilities via he_jip()

Sampford’s design is a high-entropy design, so the high-entropy (HE) approximation of Brewer and Donadio (2003) applies. Rather than reimplementing the formula, we use he_jip(), which sondage exports for exactly this purpose. It calls the same optimised C code used internally for Brewer, SPS, Pareto, and cube methods, and it already matches the joint_fn signature expected by register_method().

For designs closer to conditional Poisson (rejective) sampling, hajek_jip() provides a lighter alternative based on Hájek (1964).
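The Hájek approximation is simple enough to sketch in base R for reference: \(\pi_{kl} \approx \pi_k \pi_l \{1 - (1 - \pi_k)(1 - \pi_l)/d\}\) with \(d = \sum_k \pi_k (1 - \pi_k)\). The sketch below is purely illustrative (the real hajek_jip() uses compiled code and should be preferred):

```r
# Base-R sketch of the Hajek (1964) joint-probability approximation,
# matching the (pik, sample_idx, ...) joint_fn signature.
hajek_jip_sketch <- function(pik, sample_idx = NULL, ...) {
  d <- sum(pik * (1 - pik))
  pikl <- outer(pik, pik) * (1 - outer(1 - pik, 1 - pik) / d)
  diag(pikl) <- pik  # diagonal holds first-order probabilities
  if (!is.null(sample_idx)) {
    pikl <- pikl[sample_idx, sample_idx, drop = FALSE]
  }
  pikl
}
```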

Registering and using Sampford’s method

library(sondage)

register_method(
  "sampford",
  type       = "wor",
  sample_fn  = sampford_sample,
  joint_fn   = he_jip,
  fixed_size = TRUE
)

Once registered, the method is available through the standard dispatcher:

pik <- inclusion_prob(c(2, 3, 4, 5, 6, 7, 8, 9), n = 4)
pik
#> [1] 0.1818182 0.2727273 0.3636364 0.4545455 0.5454545 0.6363636 0.7272727
#> [8] 0.8181818

s <- unequal_prob_wor(pik, method = "sampford")
s
#> Unequal prob WOR [sampford] (n=4, N=8): 2 6 7 8

All generics work transparently:

# Joint inclusion probabilities
pikl <- joint_inclusion_prob(s)
round(pikl, 4)
#>        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
#> [1,] 0.1818 0.0350 0.0480 0.0618 0.0765 0.0925 0.1098 0.1289
#> [2,] 0.0350 0.2727 0.0737 0.0948 0.1174 0.1417 0.1682 0.1972
#> [3,] 0.0480 0.0737 0.3636 0.1296 0.1604 0.1935 0.2294 0.2688
#> [4,] 0.0618 0.0948 0.1296 0.4545 0.2059 0.2482 0.2939 0.3440
#> [5,] 0.0765 0.1174 0.1604 0.2059 0.5455 0.3062 0.3624 0.4237
#> [6,] 0.0925 0.1417 0.1935 0.2482 0.3062 0.6364 0.4355 0.5087
#> [7,] 0.1098 0.1682 0.2294 0.2939 0.3624 0.4355 0.7273 0.5999
#> [8,] 0.1289 0.1972 0.2688 0.3440 0.4237 0.5087 0.5999 0.8182

# Sampling covariance (Sen-Yates-Grundy check quantities)
round(sampling_cov(s, weighted = TRUE), 4)
#>         [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]    [,8]
#> [1,]  0.8182 -0.4165 -0.3784 -0.3383 -0.2960 -0.2515 -0.2043 -0.1544
#> [2,] -0.4165  0.7273 -0.3457 -0.3075 -0.2671 -0.2245 -0.1793 -0.1314
#> [3,] -0.3784 -0.3457  0.6364 -0.2750 -0.2366 -0.1959 -0.1528 -0.1070
#> [4,] -0.3383 -0.3075 -0.2750  0.5455 -0.2042 -0.1656 -0.1246 -0.0810
#> [5,] -0.2960 -0.2671 -0.2366 -0.2042  0.4545 -0.1334 -0.0946 -0.0532
#> [6,] -0.2515 -0.2245 -0.1959 -0.1656 -0.1334  0.3636 -0.0626 -0.0236
#> [7,] -0.2043 -0.1793 -0.1528 -0.1246 -0.0946 -0.0626  0.2727  0.0081
#> [8,] -0.1544 -0.1314 -0.1070 -0.0810 -0.0532 -0.0236  0.0081  0.1818
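These check quantities feed the Sen-Yates-Grundy variance estimator. As an independent cross-check, that estimator can be computed directly from pik and pikl; a hypothetical standalone sketch (not a sondage function):

```r
# Sen-Yates-Grundy estimate of the variance of the Horvitz-Thompson
# total, computed directly from first- and second-order probabilities
# for a fixed-size design. s holds the sampled unit indices.
syg_var <- function(y, pik, pikl, s) {
  v <- 0
  for (i in s) for (j in s) if (i < j) {
    check <- (pik[i] * pik[j] - pikl[i, j]) / pikl[i, j]
    v <- v + check * (y[i] / pik[i] - y[j] / pik[j])^2
  }
  v
}
```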

# sampled_only for large populations
joint_inclusion_prob(s, sampled_only = TRUE)
#>           [,1]      [,2]      [,3]      [,4]
#> [1,] 0.2727273 0.1417381 0.1681894 0.1972218
#> [2,] 0.1417381 0.6363636 0.4355263 0.5086542
#> [3,] 0.1681894 0.4355263 0.7272727 0.5999281
#> [4,] 0.1972218 0.5086542 0.5999281 0.8181818

Verifying with simulation

We can check that the empirical inclusion frequencies match the target \(\pi_k\) and that the empirical pairwise frequencies match the HE approximation.

sim <- unequal_prob_wor(pik, method = "sampford", nrep = 5000)
dim(sim$sample) # n x nrep
#> [1]    4 5000

# First-order: empirical vs target
freq <- tabulate(sim$sample, nbins = length(pik)) / 5000
cbind(target = pik, empirical = freq)
#>         target empirical
#> [1,] 0.1818182    0.1826
#> [2,] 0.2727273    0.2724
#> [3,] 0.3636364    0.3634
#> [4,] 0.4545455    0.4528
#> [5,] 0.5454545    0.5464
#> [6,] 0.6363636    0.6278
#> [7,] 0.7272727    0.7294
#> [8,] 0.8181818    0.8252
# Second-order: empirical vs HE approximation
N <- length(pik)
co_occur <- matrix(0, N, N)
for (j in seq_len(5000)) {
  idx <- sim$sample[, j]
  co_occur[idx, idx] <- co_occur[idx, idx] + 1
}
diag(co_occur) <- 0
empirical_jip <- co_occur / 5000
diag(empirical_jip) <- freq

he_pikl <- he_jip(pik)

# Compare a few off-diagonal pairs
ii <- c(1, 2, 3, 5)
jj <- c(8, 7, 6, 8)
pairs <- data.frame(
  i         = ii,
  j         = jj,
  HE        = round(he_pikl[cbind(ii, jj)], 4),
  empirical = round(empirical_jip[cbind(ii, jj)], 4)
)
pairs
#>   i j     HE empirical
#> 1 1 8 0.1289    0.1382
#> 2 2 7 0.1682    0.1724
#> 3 3 6 0.1935    0.1834
#> 4 5 8 0.4237    0.4240

The HE approximation is close because Sampford’s design is itself a high-entropy design.

Example 2: Wrapping an external package (sampling::UPtille)

When an algorithm is already implemented in another package, the wrapper is minimal. Here we wrap UPtille and UPtillepi2 from the sampling package (Tillé 2006).

tille_sample <- function(pik, n = NULL, prn = NULL, ...) {
  which(as.logical(sampling::UPtille(pik)))
}

tille_joint <- function(pik, sample_idx = NULL, ...) {
  pikl <- sampling::UPtillepi2(pik)
  if (!is.null(sample_idx)) {
    pikl <- pikl[sample_idx, sample_idx, drop = FALSE]
  }
  pikl
}

register_method(
  "tille",
  type       = "wor",
  sample_fn  = tille_sample,
  joint_fn   = tille_joint,
  fixed_size = TRUE
)

pik <- inclusion_prob(c(2, 3, 4, 5, 6, 7, 8, 9), n = 4)
s <- unequal_prob_wor(pik, method = "tille")
s
#> Unequal prob WOR [tille] (n=4, N=8): 3 5 6 7

# Exact joint inclusion probabilities from UPtillepi2
round(joint_inclusion_prob(s), 4)
#>        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
#> [1,] 0.1818 0.0120 0.0361 0.0585 0.0791 0.0996 0.1199 0.1403
#> [2,] 0.0120 0.2727 0.0602 0.0877 0.1187 0.1494 0.1798 0.2104
#> [3,] 0.0361 0.0602 0.3636 0.1169 0.1582 0.1992 0.2397 0.2805
#> [4,] 0.0585 0.0877 0.1169 0.4545 0.2012 0.2490 0.2997 0.3506
#> [5,] 0.0791 0.1187 0.1582 0.2012 0.5455 0.2988 0.3596 0.4208
#> [6,] 0.0996 0.1494 0.1992 0.2490 0.2988 0.6364 0.4221 0.4909
#> [7,] 0.1199 0.1798 0.2397 0.2997 0.3596 0.4221 0.7273 0.5610
#> [8,] 0.1403 0.2104 0.2805 0.3506 0.4208 0.4909 0.5610 0.8182

# Full variance estimation chain
round(sampling_cov(s), 4)
#>         [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]    [,8]
#> [1,]  0.1488 -0.0375 -0.0300 -0.0242 -0.0201 -0.0161 -0.0124 -0.0085
#> [2,] -0.0375  0.1983 -0.0390 -0.0363 -0.0301 -0.0241 -0.0185 -0.0128
#> [3,] -0.0300 -0.0390  0.2314 -0.0484 -0.0401 -0.0322 -0.0247 -0.0170
#> [4,] -0.0242 -0.0363 -0.0484  0.2479 -0.0467 -0.0402 -0.0309 -0.0213
#> [5,] -0.0201 -0.0301 -0.0401 -0.0467  0.2479 -0.0483 -0.0371 -0.0255
#> [6,] -0.0161 -0.0241 -0.0322 -0.0402 -0.0483  0.2314 -0.0407 -0.0298
#> [7,] -0.0124 -0.0185 -0.0247 -0.0309 -0.0371 -0.0407  0.1983 -0.0340
#> [8,] -0.0085 -0.0128 -0.0170 -0.0213 -0.0255 -0.0298 -0.0340  0.1488

Writing a joint_fn

A joint_fn is optional but enables joint_inclusion_prob(), joint_expected_hits(), and sampling_cov(). Five common strategies:

  1. Exact formula. When the design has a known closed form (e.g., UPtillepi2 above).

  2. he_jip(). The high-entropy approximation (Brewer and Donadio 2003). Best for maximum-entropy and high-entropy designs (Sampford, Tillé, and most \(\pi\)ps procedures). Uses optimised C code internally. Pass it directly: joint_fn = he_jip.

  3. hajek_jip(). The Hájek (1964) approximation based on conditional Poisson (rejective) sampling theory. Simpler formula and slightly cheaper, but generally a bit less accurate than he_jip(). Best when the design is obtained by conditioning independent Poisson trials on the sample size. Pass it directly: joint_fn = hajek_jip.

  4. Other approximations. Other methods exist in the literature and can be implemented as custom joint_fn functions. Tillé (1996) reviews several alternatives, including approximations based on Poisson, rejective, and successive sampling theory, each with different accuracy–computation trade-offs. Any function that accepts (pik, sample_idx = NULL, ...) and returns a symmetric matrix is a valid joint_fn.

  5. Monte Carlo estimation. Resample \(B\) times and estimate \(\hat{\pi}_{ij} = B^{-1} \sum_{b=1}^B I(i \in S_b)\, I(j \in S_b)\). Slow but universal:

mc_joint <- function(pik, sample_idx = NULL, ..., B = 5000) {
  N <- length(pik)
  n <- as.integer(round(sum(pik)))
  co <- matrix(0, N, N)
  for (b in seq_len(B)) {
    s <- my_sampler(pik, n = n)
    co[s, s] <- co[s, s] + 1  # diagonal accumulates marginal hit counts
  }
  pikl <- co / B              # so diag(pikl) estimates first-order probabilities
  if (!is.null(sample_idx)) {
    pikl <- pikl[sample_idx, sample_idx, drop = FALSE]
  }
  pikl
}

The sample_idx argument enables the sampled_only = TRUE path in joint_inclusion_prob(). When sample_idx is non-NULL, you may either (a) compute the full matrix and subset as above, or (b) skip rows/columns not in sample_idx for efficiency.
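For formula-based approximations, option (b) is straightforward: evaluate the formula only over sample_idx. A sketch using the Hájek formula (the function name is illustrative), where only the constant \(d\) needs the full pik vector:

```r
# Hajek-style joint_fn that computes only the requested submatrix
# instead of building the full N x N matrix and subsetting it.
hajek_jip_sub <- function(pik, sample_idx = NULL, ...) {
  d <- sum(pik * (1 - pik))  # population-level constant: uses all of pik
  p <- if (is.null(sample_idx)) pik else pik[sample_idx]
  pikl <- outer(p, p) * (1 - outer(1 - p, 1 - p) / d)
  diag(pikl) <- p            # diagonal holds first-order probabilities
  pikl
}
```

For an n-unit sample from a large population this touches \(n^2\) cells rather than \(N^2\), which is exactly what the sampled_only = TRUE path is meant to exploit.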

Session persistence

The registry lives in the package namespace and resets when sondage is reloaded. To make a registration persistent across sessions, place the register_method() call in your .Rprofile or in a project-level setup script.
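Under the hood, such a registry is typically just a package-local environment; a minimal illustrative sketch (sondage's actual internals may differ, and these helper names are hypothetical):

```r
# Minimal sketch of an in-memory method registry (illustrative only).
.registry <- new.env(parent = emptyenv())

register      <- function(name, spec) assign(name, spec, envir = .registry)
is_registered <- function(name) exists(name, envir = .registry, inherits = FALSE)
registered    <- function() ls(.registry)
unregister    <- function(name) rm(list = name, envir = .registry)
```

Because the environment lives only in memory, its contents vanish with the R session, which is why register_method() calls belong in .Rprofile or a setup script.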

References

Brewer, K. R. W., and M. E. Donadio. 2003. “The High Entropy Variance of the Horvitz–Thompson Estimator.” Survey Methodology 29 (2): 189–96.
Hájek, Jaroslav. 1964. “Asymptotic Theory of Rejective Sampling with Varying Probabilities from a Finite Population.” The Annals of Mathematical Statistics 35 (4): 1491–1523. https://doi.org/10.1214/aoms/1177700375.
Sampford, M. R. 1967. “On Sampling Without Replacement with Unequal Probabilities of Selection.” Biometrika 54 (3–4): 499–513. https://doi.org/10.1093/biomet/54.3-4.499.
Tillé, Yves. 1996. “Some Remarks on Unequal Probability Sampling Designs Without Replacement.” Annales d’Économie et de Statistique, no. 44: 177–89. https://doi.org/10.2307/20076043.
———. 2006. Sampling Algorithms. Springer Series in Statistics. Springer. https://doi.org/10.1007/0-387-34240-0.