Compute joint inclusion probabilities from a sample and its frame

Reconstructs the joint inclusion probabilities (π_kl) for PPS without replacement stages by replaying the design specification against the original sampling frame. This allows exact variance estimation via survey::ppsmat() instead of the default Brewer approximation used by as_survey_design().

joint_inclusion_prob(x, frame, stage = NULL)

Arguments

x: A tbl_sample object produced by execute().
frame: The data frame originally passed to execute(). Must contain the same columns used during sampling (strata variables, cluster variables, measure of size).
stage: An integer vector of stage numbers to compute, or NULL (default) to compute all PPS without replacement stages. Non-PPS stages produce NULL entries in the returned list.

Value

A named list of length equal to the number of executed stages. Each element is either:

A square matrix of joint inclusion probabilities for PPS WOR stages, with dimensions matching the number of sampled units at that stage (or sampled clusters for clustered stages). Rows and columns are ordered to match the sample.
NULL for non-PPS stages (SRS, systematic, WR methods, PMR methods) or stages not requested via the stage argument.

Details

For each PPS without replacement stage, the function:

Reconstructs the full-population inclusion probabilities (π_i) from the frame using the stage's method and measure of size
Dispatches to the appropriate sondage joint probability function
Extracts the submatrix corresponding to sampled units

For stratified stages, the target sample size per stratum (n_h) is reconstructed by replaying the same allocation logic used during execute() (proportional, Neyman, optimal, etc.) against the frame. This ensures π_i values match what was computed at sampling time, regardless of allocation method.

For stratified or conditional (within-cluster) stages, joint probabilities are computed independently within each group and assembled into a block-diagonal matrix.

Method dispatch

samplyr method	sondage function
`pps_brewer`	`up_brewer_jip(pik)`
`pps_systematic`	`up_systematic_jip(pik)`
`pps_maxent`	`up_maxent_jip(pik)`
`pps_poisson`	`up_poisson_jip(pik)`

Chromy's sequential PPS method (PMR)

pps_chromy is excluded from joint probability computation. Chromy's method is a Probability Minimum Replacement (PMR) method, not a standard WOR method. Units can receive multiple hits, so the relevant pairwise quantities are expected sample size products E(n_i · n_j), not inclusion probabilities π_ij.

While sondage provides up_chromy_pairexp() to estimate E(n_i · n_j) via Monte Carlo, the result cannot be passed to survey::ppsmat() because the survey package assumes the diagonal contains π_i, but for PMR it contains E(n_i²) ≠ E(n_i) when units receive multiple hits. Chromy (2009) recommends the Hansen-Hurwitz approximation for variance estimation, which is the default used by as_survey_design().

Limitations

Requires the original sampling frame. The frame must be unchanged from what was passed to execute().
Units in the frame must be uniquely identifiable within each stratum/cluster group by their column values.
For designs with certainty selections (π_i = 1), the joint matrix is decomposed: certainty units are separated from the stochastic part, the joint probabilities for non-certainty units are computed from the reduced π vector, and the full matrix is reassembled with π_ij = 1 for certainty pairs and π_ij = π_j for certainty × non-certainty pairs.

Examples

if (FALSE) { # \dontrun{
sample <- sampling_design() |>
  stage() |>
    stratify_by(region) |>
    cluster_by(ea_id) |>
    draw(n = 5, method = "pps_brewer", mos = hh_count) |>
  stage() |>
    draw(n = 12) |>
  execute(niger_eas, seed = 2025)

# Compute joint probabilities for stage 1
jip <- joint_inclusion_prob(sample, niger_eas, stage = 1)

# Use with survey package for exact variance
svy <- as_survey_design(sample, pps = survey::ppsmat(jip[[1]]))

# Compute all PPS stages at once
jip_all <- joint_inclusion_prob(sample, niger_eas)
} # }