
Draws a sample with unequal inclusion probabilities, without replacement.

Usage

unequal_prob_wor(
  pik,
  method = c("cps", "brewer", "systematic", "poisson", "sps", "pareto"),
  nrep = 1L,
  prn = NULL,
  ...
)

Arguments

pik

A numeric vector of inclusion probabilities. For fixed-size methods, sum(pik) must be close to an integer.

method

The sampling method:

"cps"

Conditional Poisson Sampling (maximum entropy design; Chen, Dempster & Liu, 1994). Fixed sample size, exact first-order inclusion probabilities. Joint inclusion probabilities are exact (computed from the CPS design matrix); all \(\pi_{ij} > 0\). Complexity: O(N^2).

"brewer"

Brewer's (1963) draw-by-draw method. Fixed sample size, exact first-order inclusion probabilities. Joint inclusion probabilities are approximated via the high-entropy approximation (see joint_inclusion_prob()). Complexity: O(Nn).
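The draw-by-draw idea can be sketched in base R. This is an illustration, not the package's implementation: the helper name brewer_sketch is hypothetical, the selection weights are the ones usually given for Brewer's procedure (e.g. in Tille, 2006), and the sketch assumes 0 < pik < 1, an integer sum(pik), and more units than selections.

```r
# Hypothetical helper (not part of the package): Brewer's draw-by-draw scheme.
# Assumes 0 < pik < 1, sum(pik) is an integer n, and length(pik) > n.
brewer_sketch <- function(pik) {
  N <- length(pik)
  n <- round(sum(pik))
  selected <- logical(N)
  for (i in seq_len(n)) {
    # Inclusion probability mass not yet used up by earlier draws.
    rest <- n - sum(pik[selected])
    # Unnormalised selection weights for draw i.
    w <- pik * (rest - pik) / (rest - pik * (n - i + 1))
    # Units already in the sample cannot be drawn again.
    w[selected] <- 0
    k <- sample.int(N, size = 1L, prob = w)
    selected[k] <- TRUE
  }
  which(selected)
}

set.seed(1)
brewer_sketch(c(0.2, 0.4, 0.6, 0.8))  # two distinct unit indices
```

For n = 2 the first draw uses weights proportional to pik * (2 - pik) / (2 - 2 * pik) and the second draw is proportional to pik among the remaining units.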

"systematic"

Systematic PPS sampling. Fixed sample size, exact first-order inclusion probabilities. Joint inclusion probabilities are exact, but some may be zero (pairs of units that can never occur together in the same systematic sample). Because the Sen-Yates-Grundy variance estimator requires all \(\pi_{ij} > 0\), it is not applicable to this design; see sampling_cov(). Complexity: O(N).
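A minimal base R sketch shows why some pairs can never co-occur. The helper name systematic_pps_sketch is hypothetical, and the sketch assumes units are taken in file order with a single random start; the package may order or randomize units differently.

```r
# Hypothetical helper (not part of the package): systematic PPS selection
# with one random start u in [0, 1) and units kept in their given order.
systematic_pps_sketch <- function(pik, u = runif(1)) {
  n <- round(sum(pik))
  # Unit k owns the interval [upper[k] - pik[k], upper[k]).
  upper <- cumsum(pik)
  # Selection points u, u + 1, ..., u + n - 1.
  points <- u + seq_len(n) - 1
  # Index of the interval containing each point; pmin() guards against
  # floating-point overshoot at the last boundary.
  pmin(findInterval(points, upper) + 1L, length(pik))
}

pik <- c(0.2, 0.4, 0.6, 0.8)
systematic_pps_sketch(pik, u = 0.1)  # units 1 and 3
systematic_pps_sketch(pik, u = 0.5)  # units 2 and 4
systematic_pps_sketch(pik, u = 0.7)  # units 3 and 4
```

Under these assumptions only the samples {1, 3}, {2, 4} and {3, 4} can occur for this pik, so the pairs (1, 2), (1, 4) and (2, 3) have joint inclusion probability zero.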

"poisson"

Poisson sampling. Random sample size (expected size \(n = \sum \pi_k\)). Each unit is selected independently with probability \(\pi_k\), so joint inclusion probabilities are exact: \(\pi_{ij} = \pi_i \pi_j\) for \(i \neq j\). Supports PRN for sample coordination. Complexity: O(N). Note: the realized sample size varies across draws and may occasionally be zero, particularly when inclusion probabilities are small.
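The independent selection rule is short enough to sketch in base R. The helper name poisson_sketch is hypothetical and is only meant to illustrate the rule, not to reproduce the package's code.

```r
# Hypothetical helper (not part of the package): Poisson sampling.
# Unit k is selected when its random number falls below pik[k].
poisson_sketch <- function(pik, prn = NULL) {
  u <- if (is.null(prn)) runif(length(pik)) else prn
  which(u < pik)
}

pik <- c(0.2, 0.4, 0.6, 0.8)
prn <- c(0.15, 0.55, 0.35, 0.90)
poisson_sketch(pik, prn)           # units 1 and 3
poisson_sketch(pik, rep(0.95, 4))  # integer(0): an empty sample is possible
```

Supplying the same prn always returns the same units, which is what makes coordination across surveys possible.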

"sps"

Sequential Poisson Sampling (Ohlsson, 1998). Implemented as order sampling (Rosen, 1997) with ranking key \(\xi_k = u_k / \pi_k\): the \(n\) units with the smallest \(\xi_k\) are selected. This is equivalent to Ohlsson's sequential threshold adjustment. Fixed sample size, high-entropy design. Supports PRN for sample coordination. First-order inclusion probabilities are approximately (not exactly) equal to the target pik for finite populations; see inclusion_prob(). Joint inclusion probabilities are approximated via the high-entropy approximation. Complexity: O(N log N).
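The ranking-key description above translates directly into a few lines of base R. The helper name sps_sketch is hypothetical; the sketch assumes sum(pik) is an integer and ignores boundary handling such as units with pik equal to 1.

```r
# Hypothetical helper (not part of the package): order sampling with the
# sequential Poisson key xi_k = u_k / pik_k.
sps_sketch <- function(pik, prn = NULL) {
  n <- round(sum(pik))
  u <- if (is.null(prn)) runif(length(pik)) else prn
  xi <- u / pik
  # Keep the n units with the smallest keys, reported in index order.
  sort(order(xi)[seq_len(n)])
}

pik <- c(0.2, 0.4, 0.6, 0.8)
prn <- c(0.15, 0.55, 0.35, 0.90)
sps_sketch(pik, prn)  # keys 0.75, 1.375, 0.583, 1.125: units 1 and 3
```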

"pareto"

Pareto sampling (Rosen, 1997). Order sampling with odds-ratio ranking key \(\xi_k = [u_k/(1-u_k)] / [\pi_k/(1-\pi_k)]\). Fixed sample size, high-entropy design. Supports PRN for sample coordination. First-order inclusion probabilities are approximately (not exactly) equal to the target pik for finite populations; see inclusion_prob(). Joint inclusion probabilities are approximated via the high-entropy approximation. Complexity: O(N log N).
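The odds-ratio key can be sketched the same way in base R. The helper name pareto_sketch is hypothetical; the sketch assumes all pik and all random numbers lie strictly between 0 and 1 and that sum(pik) is an integer.

```r
# Hypothetical helper (not part of the package): order sampling with the
# Pareto key xi_k = [u_k / (1 - u_k)] / [pik_k / (1 - pik_k)].
pareto_sketch <- function(pik, prn = NULL) {
  n <- round(sum(pik))
  u <- if (is.null(prn)) runif(length(pik)) else prn
  xi <- (u / (1 - u)) / (pik / (1 - pik))
  # Keep the n units with the smallest keys, reported in index order.
  sort(order(xi)[seq_len(n)])
}

pik <- c(0.2, 0.4, 0.6, 0.8)
prn <- c(0.50, 0.10, 0.70, 0.30)
pareto_sketch(pik, prn)  # keys 4, 0.167, 1.556, 0.107: units 2 and 4
```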

nrep

Number of replicate samples (default 1). When nrep > 1, $sample holds a matrix (fixed-size) or list (random-size) of all replicates. The design object and all generics remain usable.

prn

Optional vector of permanent random numbers (length N, values in the open interval (0, 1)) for sample coordination. Supported by methods "sps", "pareto", and "poisson". When NULL, random numbers are generated internally. Cannot be used with nrep > 1 (identical PRN would produce identical replicates). Use a loop with different PRN vectors for coordinated repeated sampling.
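To illustrate what coordination means, the base R sketch below reuses one PRN vector across two occasions with Poisson selection. The helper name poisson_prn and the second-occasion probabilities are hypothetical, and the 0.5 shift used for negative coordination is a common convention rather than something this function is documented to do.

```r
# Hypothetical helper (not part of the package): Poisson selection driven
# entirely by permanent random numbers.
poisson_prn <- function(pik, prn) which(prn < pik)

prn  <- c(0.15, 0.55, 0.35, 0.90)  # one permanent random number per unit
pik1 <- c(0.2, 0.4, 0.6, 0.8)      # first occasion
pik2 <- c(0.3, 0.3, 0.6, 0.8)      # second occasion, updated size measures

# Positive coordination: same PRN, so the samples overlap as much as possible.
s1 <- poisson_prn(pik1, prn)       # units 1 and 3
s2 <- poisson_prn(pik2, prn)       # units 1 and 3
intersect(s1, s2)

# Negative coordination: shift the PRN by 0.5 (modulo 1) to favour other units.
# A PRN of exactly 0.5 would shift to 0, outside the open interval (0, 1).
s3 <- poisson_prn(pik1, (prn + 0.5) %% 1)  # units 2 and 4
intersect(s1, s3)                  # empty in this example
```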

...

Additional arguments passed to methods (e.g., eps for boundary tolerance).

Value

An object of class c("unequal_prob", "wor", "sondage_sample"). When nrep = 1, $sample is an integer vector of selected unit indices. When nrep > 1, $sample is a matrix (n x nrep) for fixed-size methods, or a list of integer vectors of varying lengths for random-size methods ("poisson").

References

Chen, S. X., Dempster, A. P., & Liu, J. S. (1994). Weighted finite population sampling to maximize entropy. Biometrika, 81(3), 457-469.

Brewer, K. R. W. (1963). A model of systematic sampling with unequal probabilities. Australian Journal of Statistics, 5, 5-13.

Ohlsson, E. (1998). Sequential Poisson sampling. Journal of Official Statistics, 14(2), 149-162.

Rosen, B. (1997). On sampling with probability proportional to size. Journal of Statistical Planning and Inference, 62(2), 159-191.

Tille, Y. (2006). Sampling Algorithms. Springer.

See also

unequal_prob_wr() for with-replacement designs, equal_prob_wor() for equal probability designs, inclusion_prob() to compute inclusion probabilities from size measures.

Examples

pik <- c(0.2, 0.4, 0.6, 0.8)

# Conditional Poisson Sampling
set.seed(123)
s <- unequal_prob_wor(pik, method = "cps")
s$sample
#> [1] 3 4

# Brewer's method
s <- unequal_prob_wor(pik, method = "brewer")
s$sample
#> [1] 3 4

# Sequential Poisson Sampling with PRN coordination
prn <- runif(4)
s <- unequal_prob_wor(pik, method = "sps", prn = prn)
s$sample
#> [1] 2 3

# Pareto sampling
s <- unequal_prob_wor(pik, method = "pareto", prn = prn)
s$sample
#> [1] 2 3

# \donttest{
# Batch mode for simulations
sim <- unequal_prob_wor(pik, method = "cps", nrep = 1000)
dim(sim$sample)  # 2 x 1000
#> [1]    2 1000
# }