Computes the matrix of joint inclusion probabilities \(\pi_{ij} = P(i \in S \text{ and } j \in S)\) for a without-replacement sampling design.
Usage
joint_inclusion_prob(x, ...)
# S3 method for class 'wor'
joint_inclusion_prob(x, sampled_only = FALSE, eps = 1e-06, ...)
# S3 method for class 'wr'
joint_inclusion_prob(x, ...)
# Default S3 method
joint_inclusion_prob(x, ...)Arguments
- x
A without-replacement design object (class
"wor").- ...
Additional arguments passed to methods (e.g.,
epsfor boundary tolerance).- sampled_only
If
TRUE, return only the n x n submatrix for the sampled units (requiresnrep = 1). Useful when N is large but n is manageable. DefaultFALSE.- eps
Boundary tolerance for CPS and systematic methods (default 1e-6).
Value
A symmetric N x N matrix (or n x n if sampled_only = TRUE)
of joint inclusion probabilities. Diagonal entries are the
first-order inclusion probabilities \(\pi_i\).
Details
The computation depends on the sampling method stored in the design object. Not all methods yield exact joint probabilities:
- Exact (up to numerical precision)
cps(from the CPS design matrix via Aires' formula and elementary symmetric polynomials),systematic(combinatorial enumeration via C code),poisson(\(\pi_{ij} = \pi_i \pi_j\), independent selections),srs, andbernoulli.- Approximation
brewer,sps,pareto, andcubeuse the high-entropy approximation (Brewer & Donadio, 2003, eq. 18): \(\pi_{ij} \approx \pi_i \pi_j (c_i + c_j) / 2\) where \(c_k = (n-1) / (n - (2n-1)\pi_k/(n-1) + \sum_l \pi_l^2/(n-1))\). This approximation guarantees symmetry, \(0 \leq \pi_{ij} \leq \min(\pi_i, \pi_j)\), and correct diagonal (\(\pi_{ii} = \pi_i\)), but does not guarantee the fixed-size marginal identity \(\sum_{j \neq i} \pi_{ij} = (n-1)\pi_i\). The marginal defect is typically small for well-spread inclusion probabilities but can become non-trivial for highly skewed \(\pi_k\) vectors, especially when \(n\) is small (e.g. \(n = 2\)). A warning is issued when the defect exceeds 5\ numerical precision) second-order inclusion probabilities are required.
For systematic PPS sampling, some off-diagonal entries may be
exactly zero (pairs of units that can never co-occur in the same
systematic sample). This has consequences for variance estimation;
see sampling_cov().
When sampled_only = TRUE, only the n x n submatrix for sampled
units is returned. For methods using the high-entropy approximation
(brewer, sps, pareto, cube), the \(c_k\) coefficients are
computed from the full population \(\pi_k\) vector (O(N)), but only
the sampled pairs are assembled into a matrix (O(n^2)), avoiding the
O(N^2) allocation entirely. Similarly, poisson, bernoulli, and
srs compute the n x n matrix directly. These methods can therefore
handle arbitrarily large N (e.g. N = 50 000 with n = 200).
For systematic, the C code computes the interval structure from
the full population (O(N log N)) but builds the indicator matrix
only for sampled units (O(n x N) instead of O(N^2)) and the pair
loop is O(n^2 x N) instead of O(N^3), so large N is feasible.
For cps, the Newton calibration and elementary symmetric
polynomials are computed from the full population (O(N x n)),
but only the sampled pairs are evaluated in the pair loop
(O(n^2) with Aires' formula), so large N is feasible.
The marginal defect diagnostic is skipped when sampled_only = TRUE
because the row-sum identity only holds for the full matrix.
See also
joint_expected_hits() for the with-replacement analogue,
sampling_cov() for the covariance matrix.
Examples
pik <- c(0.2, 0.3, 0.5)
s <- unequal_prob_wor(pik, method = "cps")
joint_inclusion_prob(s)
#> [,1] [,2] [,3]
#> [1,] 0.2 0.0 0.0
#> [2,] 0.0 0.3 0.0
#> [3,] 0.0 0.0 0.5
# Only the n x n submatrix for sampled units
joint_inclusion_prob(s, sampled_only = TRUE)
#> [,1]
#> [1,] 0.3