Joint Inclusion Probabilities

Computes the matrix of joint inclusion probabilities \(\pi_{ij} = P(i \in S \text{ and } j \in S)\) for a without-replacement sampling design.

Usage

joint_inclusion_prob(x, ...)

# S3 method for class 'wor'
joint_inclusion_prob(x, sampled_only = FALSE, eps = 1e-06, ...)

# S3 method for class 'wr'
joint_inclusion_prob(x, ...)

# Default S3 method
joint_inclusion_prob(x, ...)

Arguments

x: A without-replacement design object (class "wor").
...: Additional arguments passed to the joint_fn of methods registered via register_method().
sampled_only: If TRUE, return only the n x n submatrix for the sampled units (requires nrep = 1). Useful when N is large but n is manageable. Default FALSE.
eps: High-entropy approximation boundary tolerance (default 1e-6), used only by the HE-approximate methods (brewer, sps, pareto, cube); see he_jip(). The exact cps and systematic methods ignore it and treat only exact 0/1 as excluded/certainty units.

Value

A symmetric N x N matrix (or n x n if sampled_only = TRUE) of joint inclusion probabilities. Diagonal entries are the first-order inclusion probabilities \(\pi_i\). When the stored probability vector is named, those names are used as row and column names. An otherwise unnamed sampled-only matrix is labelled with the sampled population indices.

Details

The computation depends on the method stored in the design object:

Exact: cps (probability-domain Poisson-binomial recurrences), sampford (elementary-symmetric recurrences), systematic (circular interval overlap), poisson (\(\pi_{ij} = \pi_i \pi_j\)), srs, and bernoulli.
Approximate: brewer, sps, pareto, and cube use the high-entropy approximation (Brewer & Donadio, 2003, eq. 18): \(\pi_{ij} \approx \pi_i \pi_j (c_i + c_j) / 2\). This guarantees symmetry, \(0 \leq \pi_{ij} \leq \min(\pi_i, \pi_j)\), and correct diagonal, but not the marginal identity \(\sum_{j \neq i} \pi_{ij} = (n-1)\pi_i\). The defect is typically small but grows with skewed \(\pi_k\) and small \(n\). A warning is issued when it exceeds 5\ Use method = "cps" when exact second-order probabilities are needed. For sps and pareto, the approximation uses the stored target pik, not the exact finite-population probabilities.

For systematic PPS, some off-diagonal entries may be exactly zero (pairs that never co-occur). See sampling_cov().

When sampled_only = TRUE, only the n x n submatrix for sampled units is returned. All methods compute this directly without allocating the full N x N matrix, so large N is feasible (e.g. N = 50 000 with n = 200). The marginal defect diagnostic is skipped because the row-sum identity only holds for the full matrix.

For cps, the fixed-point calibration used to produce the joint probabilities can emit a "CPS calibration did not reach tolerance" warning for pik values very close to 0 or 1. The realized joint probabilities differ from their exact values by up to the reported max_diff; see unequal_prob_wor() for context and remediation.

Examples

pik <- c(0.2, 0.3, 0.5)
s <- unequal_prob_wor(pik, method = "cps")
joint_inclusion_prob(s)
#>      [,1] [,2] [,3]
#> [1,]  0.2  0.0  0.0
#> [2,]  0.0  0.3  0.0
#> [3,]  0.0  0.0  0.5

# Only the n x n submatrix for sampled units
joint_inclusion_prob(s, sampled_only = TRUE)
#>     3
#> 3 0.5

Usage

Arguments

Value

Details

See also

Examples