likelihood#
Log-likelihood functions for conditional transformation models.
All censoring types (exact, left, right, interval) are supported. Optional linear regression shift via covariate matrix X. Optional per-observation weights and fixed offset are supported.
Mathematical convention#
Given a Bernstein basis B_k and coefficient vector theta_b (length p = order+1):
h(y) = B_k(y) @ theta_b [+ X @ beta if covariates are present]
[+ offset if offset is provided]
h'(y) = B_k'(y) @ theta_b (first derivative; beta and offset do not appear)
The target distribution Z follows one of:
"normal"— N(0, 1)"logistic"— Logistic(0, 1)"min_extreme_value"— standard minimum extreme value (reversed Gumbel),the link that gives the Cox proportional hazards model, since
log[-log S(t)] = h(t).
"max_extreme_value"— standard (right) Gumbel, the link that realisesthe Lehmann / reverse-time proportional-hazards model:
-log F(t) = h(t) + x'β.
"exponential"— standard exponential with rate 1. Support is[0, ∞), enforced during optimisation by requiringh(y|x) >= 0. Without covariates this reduces totheta_b[0] >= 0; with covariates a per-row inequalitytheta_b[0] + X_i · β >= 0is added for each training observation (seemltpy.constraints.build_constraints()).
Weighted log-likelihood#
With per-observation weights w_i ≥ 0 and offset o_i ∈ ℝ:
h_i = B_i · θ_b + x_i · β + o_i (offset enters as a constant) ℓ(θ) = Σ_i w_i ℓ_i(θ) ∇ℓ = Σ_i w_i s_i ∇²ℓ = Σ_i w_i H_i
The score matrix (estfun) convention follows R sandwich: row i = w_i · s_i,
so column sums equal the full gradient. weights=None ≡ unit weights;
offset=None ≡ zero offset.
Log-likelihood formulae#
- NONE (exact):
ℓ = Σ_i [log f(h_i) + log h’_i]
- RIGHT (exact + right-censored):
- ℓ = Σ_{exact} [log f(h_i) + log h’_i]
Σ_{censored} log F̄(h_i)
- LEFT (exact + left-censored):
- ℓ = Σ_{exact} [log f(h_i) + log h’_i]
Σ_{censored} log F(h_i)
- INTERVAL (interval-censored [l_i, u_i]):
ℓ = Σ_i log(F(h(u_i)) − F(h(l_i))) [+ exact terms if present]
Numerical stability#
All h values are clipped to [-30, 30] before distribution calls.
log CDF for normal is computed via scipy.special.log_ndtr, not log(cdf(…)).
log(1 − F) is computed via dist.logsf, not log(1 − cdf(…)).
log(F(b) − F(a)) uses a logsumexp trick with a Taylor fallback for very narrow intervals (see _log_diff_ndtr).
- class mltpy.likelihood.DistOps(kind, scipy)[source]#
Bases:
objectTyped wrapper around a scipy.stats distribution used by the likelihood.
Dispatch in the hot paths (analytical score, second derivative, log-CDF fast path) goes through
kind— a plain string comparison against theBaseDistributionLiteral — rather thandist is normstyle identity checks. Identity-based dispatch is fragile: anything that reimports scipy, pickles the distribution, or wraps it for instrumentation can silently break the identity while leaving attribute access intact, which would previously fall through to the logistic branch in_neg_score()and_d2_logpdf()and produce wrong gradients.The five methods
logpdf(),logcdf(),logsf(),cdf(), andpdf()are defined as closed forms built onscipy.specialufuncs (log_ndtr,ndtr,expit,log_expit,expm1,log1p), dispatching onkind. This avoids the ~20×rv_continuouswrapper tax of the correspondingscipy.statsmethods in the optimiser’s hot loop (issue #94). Inputs arrive clipped to±_H_CLIP; results match the scipy objects tortol≈1e-10(the one intentional divergence is exponentiallogpdf(), which returns-hon the whole line rather than scipy’s-infforh < 0— see_ll_none()).Every other method (
ppf,sf,isf, …) is exposed unchanged through the__getattr__forwarder to the underlying scipy distribution. Because methods defined on the class take priority over__getattr__, only those five names are intercepted; existing call sites that read e.g.dist.logpdf(h)get the fast path for free.- Parameters:
- kind: Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy']#
- exception mltpy.likelihood.InfeasibleParameterError[source]#
Bases:
ValueErrorRaised when the log-likelihood is non-finite at the given
theta.Subclass of
ValueErrorso callers that pre-date this class keep working. The optimiser catches this specific subclass to distinguish “this parameter point is infeasible, retreat” from “something is genuinely wrong” (e.g. a shape bug or an unsupportedbase_distribution, which should propagate).
- mltpy.likelihood.hessian(theta, basis, y, X=None, censoring=CensoringType.NONE, base_distribution='normal', weights=None, offset=None, scaling=None)[source]#
Analytical Hessian of the negative log-likelihood.
The returned matrix is the observed information
∂²(-ℓ)/∂θ∂θ'attheta. Inverting it yields the asymptotic covariance matrix of the maximum-likelihood estimator (seevcov()).- Parameters:
theta (
ndarray[tuple[Any,...],dtype[double]]) – Same aslog_likelihood().basis (
BernsteinBasis|InteractionBasis) – Same aslog_likelihood().y (
ndarray[tuple[Any,...],dtype[double]] |CensoredData) – Same aslog_likelihood().X (
ndarray[tuple[Any,...],dtype[double]] |None) – Same aslog_likelihood().censoring (
CensoringType) – Same aslog_likelihood().base_distribution (
Literal['normal','logistic','min_extreme_value','max_extreme_value','exponential','laplace','cauchy']) – Same aslog_likelihood().weights (
ndarray[tuple[Any,...],dtype[double]] |None) – Per-observation weights. Seelog_likelihood().offset (
ndarray[tuple[Any,...],dtype[double]] |None) – Per-observation offset. Seelog_likelihood().
- Returns:
Symmetric Hessian of
-ℓ.- Return type:
- Raises:
InfeasibleParameterError – If any entry of the Hessian is non-finite.
ValueError – If
base_distributionis not supported.
- mltpy.likelihood.intercept_score(theta, basis, y, X=None, censoring=CensoringType.NONE, base_distribution='normal', weights=None, offset=None, scaling=None)[source]#
Per-observation score w.r.t. an artificial intercept on
h(y|x).For each observation
i, returns∂ℓ_i/∂αevaluated atα = 0, whereαis a hypothetical intercept added to the transformation:h̃(y|x) = h(y|x) + α. Whenweightsare provided, rowiis multiplied byw_i(same Rsandwichconvention asscore_matrix()).The closed forms by censoring type are:
exact
y_i:ψ(h_i) = d log f(h_i)/dhright-censored:
-f(h_i)/S(h_i)(negative hazard)left-censored:
f(h_i)/F(h_i)(inverse Mills ratio)interval-censored [a,b]:
(f(h_b) - f(h_a)) / (F(h_b) - F(h_a))
Under scaling (
scalingnotNone) the closed forms are unchanged — the artificial interceptαis added to the final h (post-scaling and post-shift), so∂h̃/∂α = 1for every row regardless of γ. γ enters only through the value at which the score is evaluated:h_i = h_0(y_i) · exp(0.5·x_s_i·γ) + x_d_i·β + offset_i.- Parameters:
theta (
ndarray[tuple[Any,...],dtype[double]]) – Same asscore_matrix().basis (
BernsteinBasis) – Same asscore_matrix().y (
ndarray[tuple[Any,...],dtype[double]] |CensoredData) – Same asscore_matrix().X (
ndarray[tuple[Any,...],dtype[double]] |None) – Same asscore_matrix().censoring (
CensoringType) – Same asscore_matrix().base_distribution (
Literal['normal','logistic','min_extreme_value','max_extreme_value','exponential','laplace','cauchy']) – Same asscore_matrix().weights (
ndarray[tuple[Any,...],dtype[double]] |None) – Per-observation weights. Seelog_likelihood().offset (
ndarray[tuple[Any,...],dtype[double]] |None) – Per-observation offset. Seelog_likelihood().scaling (
ndarray[tuple[Any,...],dtype[double]] |None) – Scaling-design matrix(n, q_s). When provided,thetais split as[θ_b | β | γ]and h is evaluated at the heteroskedastic valueh_0(y) · exp(0.5·X_s·γ) + Xβper ADR 0002.
- Returns:
Vector of length
n.- Return type:
- Raises:
InfeasibleParameterError – If any entry is non-finite (typically because
thetaviolates monotonicity).ValueError – If
base_distributionis not supported.
- mltpy.likelihood.log_likelihood(theta, basis, y, X=None, censoring=CensoringType.NONE, base_distribution='normal', weights=None, offset=None, scaling=None)[source]#
Log-likelihood of a conditional transformation model.
- Parameters:
theta (
ndarray[tuple[Any,...],dtype[double]]) – Parameter vector. Firstbasis.order + 1entries are Bernstein coefficients (must be non-decreasing for a valid model). Remaining entries are regression coefficients beta (only if X is given).basis (
BernsteinBasis|InteractionBasis) –BernsteinBasisinstance defining the transformation.y (
ndarray[tuple[Any,...],dtype[double]] |CensoredData) – Observations. IfNDArray, treated as exact (ignorescensoring). IfCensoredData, the censoring type must be specified viacensoring.X (
ndarray[tuple[Any,...],dtype[double]] |None) – Covariate matrix of shape (n, q), or None for no regression.censoring (
CensoringType) – Censoring regime. Only used whenyis aCensoredDataobject.base_distribution (
Literal['normal','logistic','min_extreme_value','max_extreme_value','exponential','laplace','cauchy']) – One of"normal"(default),"logistic","min_extreme_value","max_extreme_value", or"exponential". Selects the target distribution Z such that h(Y|X) ~ Z.weights (
ndarray[tuple[Any,...],dtype[double]] |None) – Per-observation weights of shape(n,). Non-negative, finite.Noneis equivalent to unit weights.offset (
ndarray[tuple[Any,...],dtype[double]] |None) – Per-observation fixed linear predictor added toh(y|x)before distribution calls. Shape(n,), finite. Not optimised.Noneis equivalent to zero offset.
- Return type:
- Raises:
InfeasibleParameterError – If the result is
-inforNaN.ValueError – From
_get_dist()ifbase_distributionis not supported, or from_validate_weights_offset()ifweights/offsetare invalid.
- mltpy.likelihood.negative_log_likelihood(theta, basis, y, X=None, censoring=CensoringType.NONE, gradient=False, base_distribution='normal', weights=None, offset=None, scaling=None)[source]#
Negative log-likelihood (objective for minimisation) with optional gradient.
- Parameters:
theta (
ndarray[tuple[Any,...],dtype[double]]) – Same aslog_likelihood().basis (
BernsteinBasis|InteractionBasis) – Same aslog_likelihood().y (
ndarray[tuple[Any,...],dtype[double]] |CensoredData) – Same aslog_likelihood().X (
ndarray[tuple[Any,...],dtype[double]] |None) – Same aslog_likelihood().censoring (
CensoringType) – Same aslog_likelihood().gradient (
bool) – IfTrue, return a(nll, grad)tuple wheregradis the analytical gradient of the negative log-likelihood w.r.t.theta. Computed analytically — no finite-difference approximation.base_distribution (
Literal['normal','logistic','min_extreme_value','max_extreme_value','exponential','laplace','cauchy']) – One of"normal"(default),"logistic","min_extreme_value","max_extreme_value", or"exponential".weights (
ndarray[tuple[Any,...],dtype[double]] |None) – Per-observation weights. Seelog_likelihood().offset (
ndarray[tuple[Any,...],dtype[double]] |None) – Per-observation offset. Seelog_likelihood().
- Return type:
float|tuple[float,ndarray[tuple[Any,...],dtype[double]]]- Returns:
float when
gradient=False(float, NDArray) when
gradient=True
- mltpy.likelihood.score_matrix(theta, basis, y, X=None, censoring=CensoringType.NONE, base_distribution='normal', weights=None, offset=None, scaling=None)[source]#
Per-observation score contributions
∂ℓ_i/∂θ.Returns the
(n, p+q)matrix of per-observation gradients of the positive log-likelihood, often referred to asestfunin the Rsandwichpackage. Whenweightsare provided, rowiisw_i · ∂ℓ_i/∂θso thatscore_matrix(...).sum(axis=0)equals the full weighted log-likelihood gradient.- Parameters:
theta (
ndarray[tuple[Any,...],dtype[double]]) – Same aslog_likelihood().basis (
BernsteinBasis|InteractionBasis) – Same aslog_likelihood().y (
ndarray[tuple[Any,...],dtype[double]] |CensoredData) – Same aslog_likelihood().X (
ndarray[tuple[Any,...],dtype[double]] |None) – Same aslog_likelihood().censoring (
CensoringType) – Same aslog_likelihood().base_distribution (
Literal['normal','logistic','min_extreme_value','max_extreme_value','exponential','laplace','cauchy']) – Same aslog_likelihood().weights (
ndarray[tuple[Any,...],dtype[double]] |None) – Per-observation weights. Seelog_likelihood().offset (
ndarray[tuple[Any,...],dtype[double]] |None) – Per-observation offset. Seelog_likelihood().
- Returns:
Per-observation score matrix of shape
(n, p+q).- Return type:
- Raises:
InfeasibleParameterError – If any entry of the score matrix is non-finite.
ValueError – If
base_distributionis not supported.