Computes the joint inclusion probability matrix using the high-entropy approximation of Brewer & Donadio (2003, eq. 18): $$\pi_{ij} \approx \pi_i \pi_j \frac{c_i + c_j}{2}$$ where \(c_k = \frac{n-1}{n - \frac{2n-1}{n-1}\pi_k + \frac{\sum_\ell \pi_\ell^2}{n-1}}\) and \(n = \sum_k \pi_k\).
Arguments
- pik
Numeric vector of inclusion probabilities (\(0 \le \pi_k \le 1\)).
- sample_idx
Integer vector of 1-based indices for the sampled units, or
NULL(default) for the full population matrix. When non-NULL, returns the submatrix for those units only, without allocating the full N x N matrix.- eps
Boundary tolerance (default 1e-6). Units with \(\pi_k \ge 1 - \varepsilon\) are treated as certainty selections; units with \(\pi_k \le \varepsilon\) are treated as zero.
- ...
Additional arguments (ignored). Present so that the function matches the
joint_fnsignature required byregister_method().
Value
A symmetric matrix of joint inclusion probabilities:
N x N when sample_idx is NULL, or
length(sample_idx) x length(sample_idx) otherwise.
Diagonal entries are \(\pi_i\).
Details
The high-entropy (HE) approximation is the recommended default for designs that are close to maximum entropy, which includes most common unequal-probability without-replacement designs: Brewer, Sampford, Tille, SPS, Pareto, cube, and CPS itself.
The approximation guarantees symmetry, \(0 \le \pi_{ij} \le \min(\pi_i, \pi_j)\), and correct diagonal (\(\pi_{ii} = \pi_i\)), but does not exactly satisfy the marginal identity \(\sum_{j \neq i} \pi_{ij} = (n-1)\pi_i\). The defect is typically small but grows with skewed \(\pi_k\) and small \(n\).
Internally, this calls the same C implementation used by
joint_inclusion_prob() for Brewer, SPS, Pareto, and cube
methods. Exported so that custom methods registered via
register_method() can use it directly as their joint_fn:
For a lighter alternative based on conditional Poisson theory,
see hajek_jip().
References
Brewer, K.R.W. and Donadio, M.E. (2003). The high entropy variance of the Horvitz-Thompson estimator. Survey Methodology, 29(2), 189–196.
See also
hajek_jip() for the Hajek approximation,
joint_inclusion_prob() for design-based dispatch,
register_method() for custom method registration.
Examples
pik <- inclusion_prob(c(2, 3, 4, 5, 6, 7, 8, 9), n = 4)
# Full N x N matrix
pikl <- he_jip(pik)
round(pikl, 4)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0.1818 0.0350 0.0480 0.0618 0.0765 0.0925 0.1098 0.1289
#> [2,] 0.0350 0.2727 0.0737 0.0948 0.1174 0.1417 0.1682 0.1972
#> [3,] 0.0480 0.0737 0.3636 0.1296 0.1604 0.1935 0.2294 0.2688
#> [4,] 0.0618 0.0948 0.1296 0.4545 0.2059 0.2482 0.2939 0.3440
#> [5,] 0.0765 0.1174 0.1604 0.2059 0.5455 0.3062 0.3624 0.4237
#> [6,] 0.0925 0.1417 0.1935 0.2482 0.3062 0.6364 0.4355 0.5087
#> [7,] 0.1098 0.1682 0.2294 0.2939 0.3624 0.4355 0.7273 0.5999
#> [8,] 0.1289 0.1972 0.2688 0.3440 0.4237 0.5087 0.5999 0.8182
# Submatrix for specific units
he_jip(pik, sample_idx = c(1, 3, 5))
#> [,1] [,2] [,3]
#> [1,] 0.18181818 0.04796609 0.07652019
#> [2,] 0.04796609 0.36363636 0.16040262
#> [3,] 0.07652019 0.16040262 0.54545455
# Use as joint_fn in register_method()
register_method("my_method", sample_fn = function(pik, n, prn, ...) {
sample.int(length(pik), n, prob = pik)
}, joint_fn = he_jip)
unregister_method("my_method")