model#

Public API for conditional transformation models.

Users import exclusively from this module:

import mltpy
model = mltpy.MLT(order=6, support=(0, 100))
model.fit(y)
cdf = model.predict(y_new, what="distribution")

Classes#

ConditionalTransformationModel: Base class for all transformation models.
MLT: Most Likely Transformation — convenience subclass with sensible defaults.

class mltpy.model.AnovaResult(model_names, n_params, log_lik, df, deviance, p_value)[source]#

Bases: object

Result of a likelihood-ratio test comparing nested models.

Models are sorted by n_params ascending (reduced → full). For each model after the first, the entry at the same index gives the LR statistic comparing it to the previous model in the sequence.

Parameters:

model_names (tuple[str, ...]) – Display names of the compared models, in the same order as the rows.
n_params (tuple[int, ...]) – Number of free parameters per model.
log_lik (tuple[float, ...]) – Maximised log-likelihood per model.
df (tuple[int | None, ...]) – Degrees of freedom for each pairwise test (None for the first row, which has no predecessor).
deviance (tuple[float | None, ...]) – Likelihood-ratio statistic D = 2·(loglik_full − loglik_reduced) for each pairwise test (None for the first row).
p_value (tuple[float | None, ...]) – Right-tail probability of the chi-squared distribution with the corresponding degrees of freedom (None for the first row).

deviance: tuple[float | None, ...]#

df: tuple[int | None, ...]#

log_lik: tuple[float, ...]#

model_names: tuple[str, ...]#

n_params: tuple[int, ...]#

p_value: tuple[float | None, ...]#

class mltpy.model.ConditionalTransformationModel(basis, censoring=CensoringType.NONE, optimizer_config=None, base_distribution='normal', scaling=None)[source]#

Bases: object

Base class for conditional transformation models.

Fits a monotone transformation h(y|x) parametrised as a Bernstein polynomial such that h(y|x) follows a standard distribution.

Parameters:

basis (BernsteinBasis | InteractionBasis) – BernsteinBasis defining the response transformation.
censoring (CensoringType | None) – Censoring type of the response data. Defaults to NONE.
optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, defaults from OptimizerConfig are used.
base_distribution (Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy'])
scaling (ndarray[tuple[Any, ...], dtype[double]] | None)

property Theta_: ndarray[tuple[Any, ...], dtype[float64]] | None#

Coefficient matrix Θ of shape (p, q) for interaction models.

None before fit() or for non-interaction models. theta_[i*q + j] = Θ[i, j] (row-major layout).

aic()[source]#

Akaike Information Criterion of the fitted model.

Returns:: AIC = -2 · loglik + 2 · k where k is the number of free parameters (n_free_params_) and loglik is the maximised log-likelihood.
Return type:: float
Raises:: NotFittedError – If called before fit().

Notes

Lower is better. The monotonicity inequality D @ theta_b >= 0 is not counted as a binding equality constraint, so k equals the full length of theta_ — matching R mlt::AIC.mlt, which uses length(coef(fit)).

Examples

>>> model = MLT(order=4, support=(0, 1)).fit(y)
>>> model.aic()

bic()[source]#

Bayesian Information Criterion of the fitted model.

Returns:: BIC = -2 · loglik + log(n) · k where n is the number of observations (n_obs_), k the number of free parameters (n_free_params_), and loglik the maximised log-likelihood.
Return type:: float
Raises:: NotFittedError – If called before fit().

Notes

Lower is better. Penalises additional parameters more heavily than aic() for n > 7. Matches R mlt::BIC.mlt which uses length(coef(fit)) for k.

Examples

>>> model = MLT(order=4, support=(0, 1)).fit(y)
>>> model.bic()

confband(y_grid, X=None, level=0.95, what='distribution', offset=None)[source]#

Pointwise delta-method confidence band for a predicted curve.

For each grid point y_i (with an optional covariate profile x), compute a “linear-predictor” scale η_i together with its asymptotic variance via the delta method

\[\eta_i = g(y_i, x;\,\theta),\qquad \mathrm{Var}(\eta_i) = J_i\,V\,J_i^\top,\quad J_i = \partial\eta_i/\partial\theta,\]

form the Wald interval η_i ± z · sqrt(Var(η_i)), and back-transform the endpoints to the requested what scale. The intervals are pointwise, not simultaneous.

The linear predictor and back-transform depend on what:

"trafo" — η = h; back-transform = identity
"distribution" — η = h; back-transform = F_base(·)
"survivor" — η = h; back-transform = 1 − F_base(·) (endpoints swapped, since 1 − F is decreasing)
"density" — η = log f(h) + log h'; back-transform = exp(·)
"hazard" — η = log f(h) + log h' − log S(h); back-transform = exp(·)

Parameters:

y_grid (ndarray[tuple[Any, ...], dtype[double]]) – Response values at which to evaluate the band. Must lie within basis.support.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Covariate profile for a single curve. Accepts a 1D array of length q or a 2D (1, q) array; broadcast across y_grid. Required when the model was fit with covariates; must be None when it was not.
level (float) – Confidence level in (0, 1). Defaults to 0.95.
what (Literal['trafo', 'distribution', 'survivor', 'density', 'hazard']) – One of "trafo", "distribution", "survivor", "density", "hazard". Defaults to "distribution".
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-grid-point offset of shape (len(y_grid),). Added to h before computing the band; does not affect the delta-method Jacobian (offset is constant w.r.t. theta).

Returns:

Array of shape (len(y_grid), 3) with columns [estimate, lower, upper] on the what scale.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().
ValueError – If level is outside (0, 1), what is not supported, or the shape/presence of X is inconsistent with the fitted model.
RuntimeError – Propagated from vcov() on singular Hessians, or if the fitted basis violates monotonicity at a grid point (h'(y) ≤ 0), which would make the density/hazard linear predictor ill-defined.

Notes

Working on the transformation scale before back-transforming keeps probability bands in [0, 1] and density/hazard bands positive. The reference R routine mlt::confband builds simultaneous bands via multivariate-normal quantiles; this implementation is pointwise to match the Wald construction used in most applied survival plots.

Examples

>>> model = Coxph(support=(0.01, t.max())).fit(cd, X=X)
>>> grid = np.linspace(0.1, t.max(), 100)
>>> band = model.confband(grid, X=X[:1], what="survivor")
>>> ax.fill_between(grid, band[:, 1], band[:, 2], alpha=0.2)
>>> ax.plot(grid, band[:, 0])

confint(level=0.95, parm=None, type='wald')[source]#

Confidence intervals for theta_.

Two interval types are supported:

type="wald" (default) — symmetric normal-approximation interval

\[\hat\theta_j \pm z_{1-\alpha/2}\,\sqrt{V_{jj}},\]

where \(V = \mathrm{vcov}()\) is the inverse observed information matrix and \(z_{1-\alpha/2}\) is the standard normal quantile for confidence level \(= 1-\alpha\). Matches R confint.default(mlt_fit, level=level).
type="profile" — profile-likelihood interval obtained by inverting the \(\chi^2_1\) likelihood-ratio test. For each requested parameter index \(j\) we solve

\[2\,(\hat\ell - \ell_p(v)) = \chi^2_{1,1-\alpha},\]

where \(\ell_p(v)\) is the maximised log-likelihood with \(\theta_j\) pinned to \(v\) and the remaining parameters re-optimised under the model constraints. Each parameter costs roughly ten constrained refits, so always pass parm= to restrict the work on larger models.

Robustness (issue #89): three inner-fit failure modes can occur per parameter — (i) the adaptive bracket fails to span a sign change, (ii) the pinned refit lands on a degenerate monotonicity active set so the equality theta[j] = v cannot be honoured (“boundary”), or (iii) the pinned refit does not converge to tolerance (“convergence”, KKT residual ≥ _PROFILE_INNER_KKT_THRESHOLD). When parm is None (you asked for every parameter) each failure emits a ConvergenceWarning naming the parameter and writes ±np.inf (bracket / boundary) or np.nan (convergence) to that row, so one un-identified parameter does not abort the whole call. When parm is an explicit sequence (you asked for those parameters specifically) the same failures re-raise as RuntimeError so you can debug the request.

Parameters:

level (float) – Confidence level in (0, 1). Defaults to 0.95.
parm (Sequence[int] | None) – Optional sequence of integer indices selecting a subset of parameters. None returns intervals for all entries of theta_.
type (Literal['wald', 'profile']) – Interval type. "wald" (default) preserves the existing normal-approximation behaviour; "profile" returns the likelihood-ratio interval.

Returns:

Array of shape (k, 2) with columns [lower, upper]; k equals len(theta_) when parm is None else len(parm). Row order matches the requested index order.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().
ValueError – If level is outside (0, 1), parm contains indices outside [0, len(theta_)), or type is not one of {"wald", "profile"}.
RuntimeError – Propagated from vcov() on singular Hessians (Wald), or from the profile-CI bracket search / inner-fit failure when an explicit parm was provided (profile). Under parm=None the same failures become ConvergenceWarning instead.

Examples

>>> model = MLT(order=4, support=(0, 1)).fit(y)
>>> ci = model.confint(level=0.95)  # shape (p, 2)
>>> ci_prof = model.confint(level=0.95, parm=[0], type="profile")

estfun()[source]#

Per-observation score contributions, (n, p+q).

Equivalent to R’s sandwich::estfun(mlt_fit): row i is ∂ℓ_i/∂θ evaluated at theta_. At the MLE the column sums are zero up to optimiser tolerance.

Returns:

Matrix of shape (n_obs_, p+q). Computed eagerly in fit() and cached; subsequent mutations of the original y/X cannot affect the returned matrix.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().
RuntimeError – If the cached score matrix is unexpectedly missing after fitting (e.g. a prior fit() call failed partway through).

feature_names_in_: list[str] | None#: Names of the covariate columns supplied to fit(), if any. Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...]. None when the model was fit without covariates.

fit(y, X=None, weights=None, offset=None)[source]#

Fit the transformation model by maximum likelihood.

Parameters:

y (ndarray[tuple[Any, ...], dtype[double]] | CensoredData) – Response observations. Must lie within basis.support. Accepts np.ndarray, pd.Series, or CensoredData.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix of shape (n, q). If given, the last q entries of theta_ are regression coefficients.
weights (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional non-negative per-observation weights of shape (n,). The weighted log-likelihood Σ w_i · ℓ_i is maximised; no normalisation is applied. None is equivalent to all-ones.
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation offset of shape (n,). Added to h(y|x) before distribution calls on every likelihood evaluation: h_eff = B·θ_b + X·β + offset. None is equivalent to all-zeros.

Returns:

Returns itself for method chaining:

cdf = model.fit(y).predict(y, what="distribution")

Return type:

ConditionalTransformationModel

Raises:

ValueError – If y contains values outside basis.support, or if weights/offset have the wrong shape or invalid values.

property gamma_coef_: ndarray[tuple[Any, ...], dtype[float64]] | None#

Scaling-block coefficients γ (length q_s).

None before fit() or when the model was constructed without scaling=. Sign-aligned with R tram::*(scale=...)’s scaling block (no flip needed for parity comparisons; see docs/adr/0002-scaling-terms.md, Decision 5).

hessian_: ndarray[tuple[Any, ...], dtype[float64]] | None#: Observed information matrix — analytical Hessian of the negative log-likelihood evaluated at theta_. Shape (p+q, p+q). Computed eagerly at the end of fit(). None before fit().

is_fitted_: bool#: Whether fit() has been called successfully.

n_free_params_: int | None#: Number of free parameters in the fitted model — equal to len(theta_) (Bernstein coefficients plus optional regression coefficients). The monotonicity constraint D @ theta_b >= 0 is an inequality and does not reduce the parameter count. None before fit().

n_obs_: int | None#: Number of observations used in fit(). For CensoredData, this is y.n; otherwise len(y). None before fit().

offset_: ndarray[tuple[Any, ...], dtype[float64]] | None#: Per-observation offset supplied to the last fit() call. None when no offset was used.

plot(y, X=None, ax=None)[source]#

Plot the estimated CDF and density.

For non-interacting models, draws a single CDF/density curve over y (covariates are ignored — the unconditional baseline is shown). For InteractionBasis models, draws one CDF curve and one density curve per row of X on a shared y-axis.

Parameters:

y (ndarray[tuple[Any, ...], dtype[double]]) – Response values at which to evaluate the model. Must lie within basis.support.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Required for InteractionBasis models: a 2-D matrix whose rows are the representative covariate values at which to draw the conditional curves. Ignored for non-interacting models.
ax (object) – Optional 2-tuple (ax_cdf, ax_pdf) of matplotlib.axes.Axes, or a single matplotlib.axes.Axes instance. If a single axes is given, only the CDF is plotted. If None, a new figure with two subplots is created automatically.

Returns:

[ax_cdf, ax_pdf] if two panels are plotted, otherwise the single ax_cdf.

Return type:

object

Raises:

NotFittedError – If called before fit().
ImportError – If matplotlib is not installed.
ValueError – If X is not provided for an interacting model, or if it cannot be interpreted as a 2-D array.
TypeError – If ax is provided but cannot be unpacked into two axes nor used as a single axes.

predict(y_new, X_new=None, what='distribution', offset_new=None, X_scale_new=None)[source]#

Compute model predictions at new observations.

Parameters:

y_new (ndarray[tuple[Any, ...], dtype[double]]) – For what="quantile": probabilities in (0, 1). For all other what: response values in basis.support.
X_new (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix of shape (m, q).
offset_new (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation offset of shape (m,). Added to h(y|x) before distribution calls.
X_scale_new (ndarray[tuple[Any, ...], dtype[double]] | None) – New-data scaling-design matrix of shape (m, q_s), required when the model was fitted with scaling=. Enters via h(y|x_d, x_s) = h_0(y) · exp(0.5 · x_s · γ) + x_d · β — same parameterisation as fit(). Pass None for non-scaling fits.
what (Literal['trafo', 'distribution', 'logdistribution', 'survivor', 'logsurvivor', 'density', 'logdensity', 'hazard', 'loghazard', 'cumhazard', 'logcumhazard', 'odds', 'logodds', 'quantile']) –
Type of prediction. Let h = h(y|x) and h' = ∂h/∂y; F, S, f denote the base distribution’s CDF, survivor, and PDF.
- "trafo" — Transformation h(y|x)
- "distribution" — CDF: F(h)
- "logdistribution" — log F(h)
- "survivor" — Survivor: S(h) = 1 − F(h)
- "logsurvivor" — log S(h)
- "density" — PDF: f(h) · h'
- "logdensity" — log f(h) + log h'
- "hazard" — Hazard: f(h) · h' / S(h)
- "loghazard" — log f(h) + log h' − log S(h)
- "cumhazard" — Cumulative hazard: −log S(h)
- "logcumhazard" — log(−log S(h))
- "odds" — F(h) / S(h)
- "logodds" — log F(h) − log S(h)
- "quantile" — Quantile via inversion; right-censored models use an R-compatible grid+spline inversion.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().
ValueError – If what is not one of the valid options.

Notes

Log-scale variants use scipy.special.log_ndtr for the normal distribution’s log-CDF (more accurate in the tails) and dist.logcdf/logsf/logpdf otherwise.

Examples

>>> model = MLT(order=4, support=(0, 1)).fit(y)
>>> cdf = model.predict(y_new, what="distribution")
>>> q50 = model.predict(np.array([0.5]), what="quantile")

residuals(type='score')[source]#

Per-observation residuals for model diagnostics.

Computed at the training data passed to fit(). Mirrors R mlt::residuals for type="score"; the Cox-Snell and deviance forms are derived from the fitted survivor function.

Parameters:

type (Literal['score', 'cox-snell', 'deviance']) –

Which residual to compute.

"score" (default) — score residual w.r.t. an artificial intercept added to h(y|x): for exact -ψ(h_i); for right-censored f(h)/S(h); for left-censored -f(h)/F(h); for interval -(f(h_b) - f(h_a)) / (F(h_b) - F(h_a)). Sign matches R mlt::residuals (the negative of the positive-log-likelihood score). At the MLE the sum is zero up to optimiser tolerance.
"cox-snell" — r_i = -log S(y_i|x_i). Under a correctly specified model these are approximately Exp(1). For censored observations y_i is the censoring threshold (lower for right-censored, upper for left-censored, the midpoint for interval-censored); the resulting residuals for those observations are themselves censored Exp(1) variates.
"deviance" — sign(r_i - 1) · sqrt(2·|r_i - log(r_i) - 1|) where r_i is the Cox-Snell residual. Under a correctly specified model these are approximately standard normal.

Returns:

Vector of length n_obs_.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().
ValueError – If type is not one of the supported residual kinds.

Notes

type="score" matches R mlt::residuals(mlt_fit) exactly, element-wise to rtol=1e-6.
type="cox-snell" uses the same evaluation point convention as R’s -log(predict(mlt_fit, type = "survivor")).
r_i is clipped at np.finfo(float).tiny before log(r_i) in the deviance formula to avoid -inf.

result_: OptimizationResult | None#: Full result object from the last fit() call. None before fit().

sandwich_se(regularize='active')[source]#

Sandwich (robust) standard errors for theta_.

Computed as sqrt(diag(sandwich_vcov(regularize=regularize))).

Parameters:

regularize (str | None) – Passed directly to sandwich_vcov(). See vcov() for details.

Returns:

Vector of length len(theta_).

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().
RuntimeError – Propagated from sandwich_vcov() on singular Hessians, or if the sandwich variance matrix has negative diagonal entries.

sandwich_vcov(regularize='active')[source]#

Sandwich (robust) variance–covariance matrix of theta_.

Computes the HC0 sandwich estimator

\[V_{\text{sand}} = B M B, \quad B = \mathrm{vcov}(\mathrm{regularize}), \quad M = \sum_i s_i s_i^\top,\]

where \(B\) is the bread — the inverse observed information computed by vcov() — and \(s_i\) is the per-observation score (row \(i\) of estfun()).

The regularize parameter is forwarded to vcov(), so the bread inherits the same penalty-augmented Hessian recovery as vcov(regularize='active') (the default).

Parameters:

regularize (str | None) – Passed directly to vcov(). See that method’s documentation for details.

Returns:

Symmetric (p+q, p+q) matrix.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().
ValueError – If regularize is not 'active' or None.
RuntimeError – If the Hessian is singular and regularize=None.

scaling_feature_names_in_: list[str] | None#

score(y, X=None, weights=None, offset=None)[source]#

Log-likelihood at the fitted parameters (sklearn-compatible).

Higher is better; this is NOT the negative log-likelihood.

Parameters:

y (ndarray[tuple[Any, ...], dtype[double]] | CensoredData) – Response observations.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix.
weights (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation weights.
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation offset added to h.

Return type:

float

Raises:

NotFittedError – If called before fit().

score_contributions()[source]#

Alias for estfun(). See that method for details.

Return type:: ndarray[tuple[Any, ...], dtype[double]]

simulate(n, X=None, random_state=None, X_scale=None)[source]#

Draw samples from the fitted model via the quantile transformation.

Samples u ~ Uniform(0, 1) and returns predict(u, X, X_scale_new=X_scale, what="quantile").

Parameters:

n (int) – Number of samples to draw.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Covariate matrix of shape (n, q). Each row yields one conditional draw; must be supplied when the model was fitted with covariates. Pass None only for covariate-free fits.
random_state (int | Generator | None) – Seed or numpy.random.Generator for reproducibility.
X_scale (ndarray[tuple[Any, ...], dtype[double]] | None) – Scaling-design matrix of shape (n, q_s). Required when the model was fitted with scaling=; ignored otherwise. Each row yields one heteroskedastic conditional draw via q_i = h_0⁻¹((Φ⁻¹(u_i) − x_d,i·β) / exp(0.5·x_s,i·γ)).

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().
ValueError – If X is provided but its number of rows does not equal n, or if X_scale shape is inconsistent with the fit.

standard_errors(regularize='active')[source]#

Vector of asymptotic standard errors for theta_.

Computed as sqrt(diag(vcov(regularize=regularize))). Length equals len(theta_).

Parameters:

regularize (str | None) – Passed directly to vcov(). See that method’s documentation for details.

Raises:

NotFittedError – If called before fit().
RuntimeError – Propagated from vcov() if the Hessian is singular and regularize=None.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

theta_: ndarray[tuple[Any, ...], dtype[float64]] | None#: Fitted parameter vector [theta_basis | beta]. None before fit().

vcov(regularize='active')[source]#

Asymptotic variance–covariance matrix of theta_.

Returns the inverse of the observed information matrix hessian_ (Hessian of the negative log-likelihood at the MLE). Under standard regularity conditions, this is a consistent estimator of the asymptotic covariance of the maximum-likelihood estimator.

Parameters:

regularize (str | None) –

Regularization strategy for near-singular Hessians.

'active' — if direct inversion fails, recover a finite covariance via the active-set-constrained form: the top-left block of the inverse of the bordered KKT matrix [[H, A_aᵀ], [A_a, 0]], where A_a is the sub-matrix of rows of _A_ineq_ whose KKT multiplier exceeds _ACTIVE_CONSTRAINT_TOL (see _constrained_vcov_active()). When auglag data are not available (SLSQP / trust-constr fits) the pseudoinverse is used as a fallback. This is the default because the Hessian can be singular at constrained MLEs, and on well-conditioned fits it reduces to bare H⁻¹ (R mlt::vcov.mlt behaves the same way in the cases where mltpy’s bare inv(H) already matches R — see tests/test_confidence.py).
'auglag' — always return the active-set-constrained covariance when active monotonicity rows exist, rather than waiting for bare inversion to fail. This is the ρ→∞ limit of the penalty form (H + ρ·A_aᵀA_a)⁻¹ and mirrors R mlt::vcov.mlt on the constrained branches that bare inv(H) misses (notably the scaled-baseline Coxph path, where bare inv(H) diverges from R’s vcov(as.mlt(fit)) by ~37× on the binding rows while the constrained form matches at rtol≈1e-4). Unlike the earlier penalty implementation it does not depend on the optimiser’s final penalty ρ (which the augmented-Lagrangian now freezes once feasible). Falls back to bare H when no constraint binds or auglag data are unavailable. Opt-in because it inflates standard errors along tied rows and consequently widens confint / confband outputs in cases where mltpy’s bare inv(H) already matches R.
None — raise RuntimeError on singular Hessian (original behaviour; useful when you need a diagnostic failure).

Returns:

Symmetric (p+q, p+q) matrix.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

ValueError – If regularize is not 'active', 'auglag', or None.
NotFittedError – If called before fit().
RuntimeError – If the Hessian is singular and regularize=None, or if hessian_ is unexpectedly missing after fitting.

wald_test(R, r=None, vcov='information', regularize='active')[source]#

Wald test for linear restrictions Rθ = r.

Computes the chi-squared Wald statistic

\[W = (R\hat\theta - r)^\top \bigl[R\,V\,R^\top\bigr]^{-1} (R\hat\theta - r) \;\sim\; \chi^2(k),\]

where \(k\) is the number of rows in \(R\) and \(V\) is either the inverse-information vcov() or the sandwich estimator sandwich_vcov().

Parameters:

R (ndarray[tuple[Any, ...], dtype[double]]) – Contrast matrix of shape (k, p+q). Each row encodes one linear restriction on theta_.
r (ndarray[tuple[Any, ...], dtype[double]] | None) – Null-hypothesis value vector of length k. Defaults to the zero vector (i.e. Rθ = 0).
vcov (Literal['information', 'sandwich']) – Which variance–covariance matrix to use. "information" (the default) uses the observed Fisher information vcov(); "sandwich" uses the HC0 sandwich estimator sandwich_vcov().
regularize (str | None) – Passed directly to vcov() (or sandwich_vcov()). See vcov() for the accepted values and their effect. Default "active" applies penalty-augmented Hessian recovery when inversion fails.

Returns:

Dataclass with fields statistic, df, p_value, and vcov_type.

Return type:

WaldTestResult

Raises:

NotFittedError – If called before fit().
ValueError – If R does not have len(theta_) columns or r has the wrong length.
RuntimeError – If R V R^T is singular (the restriction is degenerate or collinear).

weights_: ndarray[tuple[Any, ...], dtype[float64]] | None#: Observation weights supplied to the last fit() call. None when no weights were used.

exception mltpy.model.ConvergenceWarning[source]#

Bases: UserWarning

Raised when the optimiser fails to converge within the allowed restarts.

class mltpy.model.MLT(order=6, support=(0.0, 1.0), censoring=CensoringType.NONE, optimizer_config=None, base_distribution='normal', scaling=None)[source]#

Bases: ConditionalTransformationModel

Most Likely Transformation — convenience interface.

A ConditionalTransformationModel with an explicit order and support parameter instead of a pre-built BernsteinBasis.

Parameters:

order (int) – Polynomial degree of the Bernstein basis. Defaults to 6.
support (tuple[float, float]) – Closed interval (a, b) with a < b. Defaults to (0, 1).
censoring (CensoringType) – Censoring type of the response data.
optimizer_config (OptimizerConfig | None) – Optimisation settings.
base_distribution (BaseDistribution)
scaling (NDArray[np.float64] | None)

Examples

>>> model = MLT(order=6, support=(0, 100))
>>> model.fit(y)
>>> cdf = model.predict(y_new, what="distribution")

Parameters:

base_distribution (Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy'])
scaling (ndarray[tuple[Any, ...], dtype[double]] | None)
order (int)
support (tuple[float, float])
censoring (CensoringType)
optimizer_config (OptimizerConfig | None)

exception mltpy.model.NotFittedError[source]#

Bases: ValueError

Raised when a method that requires a fitted model is called before fit().

class mltpy.model.WaldTestResult(statistic, df, p_value, vcov_type)[source]#

Bases: object

Result of a Wald test for linear restrictions on model parameters.

Parameters:

statistic (float) – Wald chi-squared statistic W = (Rθ - r)^T [R V R^T]^{-1} (Rθ - r).
df (int) – Degrees of freedom (number of restrictions, i.e. number of rows in R).
p_value (float) – Right-tail probability Pr(χ²(df) > W).
vcov_type (str) – Which variance–covariance matrix was used: "information" or "sandwich".

df: int#

p_value: float#

statistic: float#

vcov_type: str#

mltpy.model.anova(*models)[source]#

Likelihood-ratio test for a sequence of nested transformation models.

Models are sorted internally by their number of free parameters (n_free_params_) in ascending order, and pairwise LR statistics are computed against the immediately smaller model. The user is responsible for ensuring the models are actually nested (fitted on the same data with the smaller’s parameter space contained in the larger’s). Sample size is checked; structural nesting is not.

Parameters:: *models (ConditionalTransformationModel) – Two or more fitted ConditionalTransformationModel instances.
Returns:: See AnovaResult for the column layout.
Return type:: AnovaResult
Raises:: ValueError – If fewer than two models are passed; if any model is not fitted; if the models were fitted on different sample sizes; or if two consecutive models (after sorting) have the same number of free parameters (cannot be nested).

Notes

The test statistic is D = 2·(loglik_full − loglik_reduced), which is asymptotically χ²_df with df = k_full − k_reduced under the null hypothesis that the reduced model is correct. Mirrors R’s anova.mlt.

Examples

>>> small = MLT(order=3, support=(0, 1)).fit(y)
>>> large = MLT(order=6, support=(0, 1)).fit(y)
>>> print(anova(small, large))