likelihood#

Log-likelihood functions for conditional transformation models.

All censoring types (exact, left, right, interval) are supported. Optional linear regression shift via covariate matrix X. Optional per-observation weights and fixed offset are supported.

Mathematical convention#

Given a Bernstein basis B_k and coefficient vector theta_b (length p = order+1):

h(y)  = B_k(y) @ theta_b  [+ X @ beta  if covariates are present]
          [+ offset         if offset is provided]
h'(y) = B_k'(y) @ theta_b  (first derivative; beta and offset do not appear)

The target distribution Z follows one of:

"normal" — N(0, 1)
"logistic" — Logistic(0, 1)
"min_extreme_value" — standard minimum extreme value (reversed Gumbel),
the link that gives the Cox proportional hazards model, since log[-log S(t)] = h(t).
"max_extreme_value" — standard (right) Gumbel, the link that realises
the Lehmann / reverse-time proportional-hazards model: -log F(t) = h(t) + x'β.
"exponential" — standard exponential with rate 1. Support is
[0, ∞), enforced during optimisation by requiring h(y|x) >= 0. Without covariates this reduces to theta_b[0] >= 0; with covariates a per-row inequality theta_b[0] + X_i · β >= 0 is added for each training observation (see mltpy.constraints.build_constraints()).

Weighted log-likelihood#

With per-observation weights w_i ≥ 0 and offset o_i ∈ ℝ:

h_i = B_i · θ_b + x_i · β + o_i (offset enters as a constant) ℓ(θ) = Σ_i w_i ℓ_i(θ) ∇ℓ = Σ_i w_i s_i ∇²ℓ = Σ_i w_i H_i

The score matrix (estfun) convention follows R sandwich: row i = w_i · s_i, so column sums equal the full gradient. weights=None ≡ unit weights; offset=None ≡ zero offset.

Log-likelihood formulae#

NONE (exact):

ℓ = Σ_i [log f(h_i) + log h’_i]

RIGHT (exact + right-censored):

ℓ = Σ_{exact} [log f(h_i) + log h’_i]

Σ_{censored} log F̄(h_i)

LEFT (exact + left-censored):

ℓ = Σ_{exact} [log f(h_i) + log h’_i]

Σ_{censored} log F(h_i)

INTERVAL (interval-censored [l_i, u_i]):

ℓ = Σ_i log(F(h(u_i)) − F(h(l_i))) [+ exact terms if present]

Numerical stability#

All h values are clipped to [-30, 30] before distribution calls.
log CDF for normal is computed via scipy.special.log_ndtr, not log(cdf(…)).
log(1 − F) is computed via dist.logsf, not log(1 − cdf(…)).
log(F(b) − F(a)) uses a logsumexp trick with a Taylor fallback for very narrow intervals (see _log_diff_ndtr).

class mltpy.likelihood.DistOps(kind, scipy)[source]#

Bases: object

Typed wrapper around a scipy.stats distribution used by the likelihood.

Dispatch in the hot paths (analytical score, second derivative, log-CDF fast path) goes through kind — a plain string comparison against the BaseDistribution Literal — rather than dist is norm style identity checks. Identity-based dispatch is fragile: anything that reimports scipy, pickles the distribution, or wraps it for instrumentation can silently break the identity while leaving attribute access intact, which would previously fall through to the logistic branch in _neg_score() and _d2_logpdf() and produce wrong gradients.

The five methods logpdf(), logcdf(), logsf(), cdf(), and pdf() are defined as closed forms built on scipy.special ufuncs (log_ndtr, ndtr, expit, log_expit, expm1, log1p), dispatching on kind. This avoids the ~20× rv_continuous wrapper tax of the corresponding scipy.stats methods in the optimiser’s hot loop (issue #94). Inputs arrive clipped to ±_H_CLIP; results match the scipy objects to rtol≈1e-10 (the one intentional divergence is exponential logpdf(), which returns -h on the whole line rather than scipy’s -inf for h < 0 — see _ll_none()).

Every other method (ppf, sf, isf, …) is exposed unchanged through the __getattr__ forwarder to the underlying scipy distribution. Because methods defined on the class take priority over __getattr__, only those five names are intercepted; existing call sites that read e.g. dist.logpdf(h) get the fast path for free.

Parameters:

kind (Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy'])
scipy (Any)

cdf(h)[source]#

Closed-form CDF, dispatched on kind.

Parameters:: h (ndarray[tuple[Any, ...], dtype[double]])
Return type:: ndarray[tuple[Any, ...], dtype[double]]

kind: Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy']#

logcdf(h)[source]#

Closed-form log CDF, dispatched on kind.

Parameters:: h (ndarray[tuple[Any, ...], dtype[double]])
Return type:: ndarray[tuple[Any, ...], dtype[double]]

logpdf(h)[source]#

Closed-form log density, dispatched on kind.

Parameters:: h (ndarray[tuple[Any, ...], dtype[double]])
Return type:: ndarray[tuple[Any, ...], dtype[double]]

logsf(h)[source]#

Closed-form log survival function, dispatched on kind.

Parameters:: h (ndarray[tuple[Any, ...], dtype[double]])
Return type:: ndarray[tuple[Any, ...], dtype[double]]

pdf(h)[source]#

Closed-form density, exp(logpdf(h)) for every kind.

Parameters:: h (ndarray[tuple[Any, ...], dtype[double]])
Return type:: ndarray[tuple[Any, ...], dtype[double]]

scipy: Any#

exception mltpy.likelihood.InfeasibleParameterError[source]#

Bases: ValueError

Raised when the log-likelihood is non-finite at the given theta.

Subclass of ValueError so callers that pre-date this class keep working. The optimiser catches this specific subclass to distinguish “this parameter point is infeasible, retreat” from “something is genuinely wrong” (e.g. a shape bug or an unsupported base_distribution, which should propagate).

mltpy.likelihood.hessian(theta, basis, y, X=None, censoring=CensoringType.NONE, base_distribution='normal', weights=None, offset=None, scaling=None)[source]#

Analytical Hessian of the negative log-likelihood.

The returned matrix is the observed information ∂²(-ℓ)/∂θ∂θ' at theta. Inverting it yields the asymptotic covariance matrix of the maximum-likelihood estimator (see vcov()).

Parameters:

theta (ndarray[tuple[Any, ...], dtype[double]]) – Same as log_likelihood().
basis (BernsteinBasis | InteractionBasis) – Same as log_likelihood().
y (ndarray[tuple[Any, ...], dtype[double]] | CensoredData) – Same as log_likelihood().
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Same as log_likelihood().
censoring (CensoringType) – Same as log_likelihood().
base_distribution (Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy']) – Same as log_likelihood().
weights (ndarray[tuple[Any, ...], dtype[double]] | None) – Per-observation weights. See log_likelihood().
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Per-observation offset. See log_likelihood().
scaling (ndarray[tuple[Any, ...], dtype[double]] | None)

Returns:

Symmetric Hessian of -ℓ.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

InfeasibleParameterError – If any entry of the Hessian is non-finite.
ValueError – If base_distribution is not supported.

mltpy.likelihood.intercept_score(theta, basis, y, X=None, censoring=CensoringType.NONE, base_distribution='normal', weights=None, offset=None, scaling=None)[source]#

Per-observation score w.r.t. an artificial intercept on h(y|x).

For each observation i, returns ∂ℓ_i/∂α evaluated at α = 0, where α is a hypothetical intercept added to the transformation: h̃(y|x) = h(y|x) + α. When weights are provided, row i is multiplied by w_i (same R sandwich convention as score_matrix()).

The closed forms by censoring type are:

exact y_i: ψ(h_i) = d log f(h_i)/dh
right-censored: -f(h_i)/S(h_i) (negative hazard)
left-censored: f(h_i)/F(h_i) (inverse Mills ratio)
interval-censored [a,b]: (f(h_b) - f(h_a)) / (F(h_b) - F(h_a))

Under scaling (scaling not None) the closed forms are unchanged — the artificial intercept α is added to the final h (post-scaling and post-shift), so ∂h̃/∂α = 1 for every row regardless of γ. γ enters only through the value at which the score is evaluated: h_i = h_0(y_i) · exp(0.5·x_s_i·γ) + x_d_i·β + offset_i.

Parameters:

theta (ndarray[tuple[Any, ...], dtype[double]]) – Same as score_matrix().
basis (BernsteinBasis) – Same as score_matrix().
y (ndarray[tuple[Any, ...], dtype[double]] | CensoredData) – Same as score_matrix().
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Same as score_matrix().
censoring (CensoringType) – Same as score_matrix().
base_distribution (Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy']) – Same as score_matrix().
weights (ndarray[tuple[Any, ...], dtype[double]] | None) – Per-observation weights. See log_likelihood().
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Per-observation offset. See log_likelihood().
scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Scaling-design matrix (n, q_s). When provided, theta is split as [θ_b | β | γ] and h is evaluated at the heteroskedastic value h_0(y) · exp(0.5·X_s·γ) + Xβ per ADR 0002.

Returns:

Vector of length n.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

InfeasibleParameterError – If any entry is non-finite (typically because theta violates monotonicity).
ValueError – If base_distribution is not supported.

mltpy.likelihood.log_likelihood(theta, basis, y, X=None, censoring=CensoringType.NONE, base_distribution='normal', weights=None, offset=None, scaling=None)[source]#

Log-likelihood of a conditional transformation model.

Parameters:

theta (ndarray[tuple[Any, ...], dtype[double]]) – Parameter vector. First basis.order + 1 entries are Bernstein coefficients (must be non-decreasing for a valid model). Remaining entries are regression coefficients beta (only if X is given).
basis (BernsteinBasis | InteractionBasis) – BernsteinBasis instance defining the transformation.
y (ndarray[tuple[Any, ...], dtype[double]] | CensoredData) – Observations. If NDArray, treated as exact (ignores censoring). If CensoredData, the censoring type must be specified via censoring.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Covariate matrix of shape (n, q), or None for no regression.
censoring (CensoringType) – Censoring regime. Only used when y is a CensoredData object.
base_distribution (Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy']) – One of "normal" (default), "logistic", "min_extreme_value", "max_extreme_value", or "exponential". Selects the target distribution Z such that h(Y|X) ~ Z.
weights (ndarray[tuple[Any, ...], dtype[double]] | None) – Per-observation weights of shape (n,). Non-negative, finite. None is equivalent to unit weights.
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Per-observation fixed linear predictor added to h(y|x) before distribution calls. Shape (n,), finite. Not optimised. None is equivalent to zero offset.
scaling (ndarray[tuple[Any, ...], dtype[double]] | None)

Return type:

float

Raises:

InfeasibleParameterError – If the result is -inf or NaN.
ValueError – From _get_dist() if base_distribution is not supported, or from _validate_weights_offset() if weights/offset are invalid.

mltpy.likelihood.negative_log_likelihood(theta, basis, y, X=None, censoring=CensoringType.NONE, gradient=False, base_distribution='normal', weights=None, offset=None, scaling=None)[source]#

Negative log-likelihood (objective for minimisation) with optional gradient.

Parameters:

theta (ndarray[tuple[Any, ...], dtype[double]]) – Same as log_likelihood().
basis (BernsteinBasis | InteractionBasis) – Same as log_likelihood().
y (ndarray[tuple[Any, ...], dtype[double]] | CensoredData) – Same as log_likelihood().
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Same as log_likelihood().
censoring (CensoringType) – Same as log_likelihood().
gradient (bool) – If True, return a (nll, grad) tuple where grad is the analytical gradient of the negative log-likelihood w.r.t. theta. Computed analytically — no finite-difference approximation.
base_distribution (Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy']) – One of "normal" (default), "logistic", "min_extreme_value", "max_extreme_value", or "exponential".
weights (ndarray[tuple[Any, ...], dtype[double]] | None) – Per-observation weights. See log_likelihood().
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Per-observation offset. See log_likelihood().
scaling (ndarray[tuple[Any, ...], dtype[double]] | None)

Return type:

float | tuple[float, ndarray[tuple[Any, ...], dtype[double]]]

Returns:

float when gradient=False
(float, NDArray) when gradient=True

mltpy.likelihood.score_matrix(theta, basis, y, X=None, censoring=CensoringType.NONE, base_distribution='normal', weights=None, offset=None, scaling=None)[source]#

Per-observation score contributions ∂ℓ_i/∂θ.

Returns the (n, p+q) matrix of per-observation gradients of the positive log-likelihood, often referred to as estfun in the R sandwich package. When weights are provided, row i is w_i · ∂ℓ_i/∂θ so that score_matrix(...).sum(axis=0) equals the full weighted log-likelihood gradient.

Parameters:

theta (ndarray[tuple[Any, ...], dtype[double]]) – Same as log_likelihood().
basis (BernsteinBasis | InteractionBasis) – Same as log_likelihood().
y (ndarray[tuple[Any, ...], dtype[double]] | CensoredData) – Same as log_likelihood().
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Same as log_likelihood().
censoring (CensoringType) – Same as log_likelihood().
base_distribution (Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy']) – Same as log_likelihood().
weights (ndarray[tuple[Any, ...], dtype[double]] | None) – Per-observation weights. See log_likelihood().
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Per-observation offset. See log_likelihood().
scaling (ndarray[tuple[Any, ...], dtype[double]] | None)

Returns:

Per-observation score matrix of shape (n, p+q).

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

InfeasibleParameterError – If any entry of the score matrix is non-finite.
ValueError – If base_distribution is not supported.