Skip to contents

Computes the matrix of joint inclusion probabilities \(\pi_{ij} = P(i \in S \text{ and } j \in S)\) for a without-replacement sampling design.

Usage

joint_inclusion_prob(x, ...)

# S3 method for class 'wor'
joint_inclusion_prob(x, sampled_only = FALSE, eps = 1e-06, ...)

# S3 method for class 'wr'
joint_inclusion_prob(x, ...)

# Default S3 method
joint_inclusion_prob(x, ...)

Arguments

x

A without-replacement design object (class "wor").

...

Additional arguments passed to methods (e.g., eps for boundary tolerance).

sampled_only

If TRUE, return only the n x n submatrix for the sampled units (requires nrep = 1). Useful when N is large but n is manageable. Default FALSE.

eps

Boundary tolerance for CPS and systematic methods (default 1e-6).

Value

A symmetric N x N matrix (or n x n if sampled_only = TRUE) of joint inclusion probabilities. Diagonal entries are the first-order inclusion probabilities \(\pi_i\).

Details

The computation depends on the sampling method stored in the design object. Not all methods yield exact joint probabilities:

Exact (up to numerical precision)

cps (from the CPS design matrix via Aires' formula and elementary symmetric polynomials), systematic (combinatorial enumeration via C code), poisson (\(\pi_{ij} = \pi_i \pi_j\), independent selections), srs, and bernoulli.

Approximation

brewer, sps, pareto, and cube use the high-entropy approximation (Brewer & Donadio, 2003, eq. 18): \(\pi_{ij} \approx \pi_i \pi_j (c_i + c_j) / 2\) where \(c_k = (n-1) / (n - (2n-1)\pi_k/(n-1) + \sum_l \pi_l^2/(n-1))\). This approximation guarantees symmetry, \(0 \leq \pi_{ij} \leq \min(\pi_i, \pi_j)\), and correct diagonal (\(\pi_{ii} = \pi_i\)), but does not guarantee the fixed-size marginal identity \(\sum_{j \neq i} \pi_{ij} = (n-1)\pi_i\). The marginal defect is typically small for well-spread inclusion probabilities but can become non-trivial for highly skewed \(\pi_k\) vectors, especially when \(n\) is small (e.g. \(n = 2\)). A warning is issued when the defect exceeds 5\ numerical precision) second-order inclusion probabilities are required.

For systematic PPS sampling, some off-diagonal entries may be exactly zero (pairs of units that can never co-occur in the same systematic sample). This has consequences for variance estimation; see sampling_cov().

When sampled_only = TRUE, only the n x n submatrix for sampled units is returned. For methods using the high-entropy approximation (brewer, sps, pareto, cube), the \(c_k\) coefficients are computed from the full population \(\pi_k\) vector (O(N)), but only the sampled pairs are assembled into a matrix (O(n^2)), avoiding the O(N^2) allocation entirely. Similarly, poisson, bernoulli, and srs compute the n x n matrix directly. These methods can therefore handle arbitrarily large N (e.g. N = 50 000 with n = 200).

For systematic, the C code computes the interval structure from the full population (O(N log N)) but builds the indicator matrix only for sampled units (O(n x N) instead of O(N^2)) and the pair loop is O(n^2 x N) instead of O(N^3), so large N is feasible.

For cps, the Newton calibration and elementary symmetric polynomials are computed from the full population (O(N x n)), but only the sampled pairs are evaluated in the pair loop (O(n^2) with Aires' formula), so large N is feasible.

The marginal defect diagnostic is skipped when sampled_only = TRUE because the row-sum identity only holds for the full matrix.

See also

joint_expected_hits() for the with-replacement analogue, sampling_cov() for the covariance matrix.

Examples

pik <- c(0.2, 0.3, 0.5)
s <- unequal_prob_wor(pik, method = "cps")
joint_inclusion_prob(s)
#>      [,1] [,2] [,3]
#> [1,]  0.2  0.0  0.0
#> [2,]  0.0  0.3  0.0
#> [3,]  0.0  0.0  0.5

# Only the n x n submatrix for sampled units
joint_inclusion_prob(s, sampled_only = TRUE)
#>      [,1]
#> [1,]  0.3