model
Public API for conditional transformation models.
Users import exclusively from this module:
import mltpy
model = mltpy.MLT(order=6, support=(0, 100))
model.fit(y)
cdf = model.predict(y_new, what="distribution")
Classes
- ConditionalTransformationModel
Base class for all transformation models.
- MLT
Most Likely Transformation — convenience subclass with sensible defaults.
- class mltpy.model.AnovaResult(model_names, n_params, log_lik, df, deviance, p_value)
Bases: object
Result of a likelihood-ratio test comparing nested models.
Models are sorted by n_params ascending (reduced → full). For each model after the first, the entry at the same index gives the LR statistic comparing it to the previous model in the sequence.
- Parameters:
model_names (tuple[str, ...]) – Display names of the compared models, in the same order as the rows.
n_params (tuple[int, ...]) – Number of free parameters per model.
log_lik (tuple[float, ...]) – Maximised log-likelihood per model.
df (tuple[int | None, ...]) – Degrees of freedom for each pairwise test (None for the first row, which has no predecessor).
deviance (tuple[float | None, ...]) – Likelihood-ratio statistic D = 2·(loglik_full − loglik_reduced) for each pairwise test (None for the first row).
p_value (tuple[float | None, ...]) – Right-tail probability of the chi-squared distribution with the corresponding degrees of freedom (None for the first row).
- class mltpy.model.ConditionalTransformationModel(basis, censoring=CensoringType.NONE, optimizer_config=None, base_distribution='normal', scaling=None)
Bases: object
Base class for conditional transformation models.
Fits a monotone transformation h(y|x), parametrised as a Bernstein polynomial, such that h(Y|x) follows the base distribution selected by base_distribution.
- Parameters:
basis (BernsteinBasis | InteractionBasis) – BernsteinBasis defining the response transformation.
censoring (CensoringType | None) – Censoring type of the response data. Defaults to NONE.
optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, defaults from OptimizerConfig are used.
base_distribution (Literal['normal', 'logistic', 'min_extreme_value', 'max_extreme_value', 'exponential', 'laplace', 'cauchy'])
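As a rough illustration of the model class described above (not mltpy's own implementation), the sketch below evaluates a Bernstein-polynomial transformation h(y) on a support (a, b) and maps it through the standard normal CDF. The helper names and the coefficient values are invented for the example; it relies on the standard fact that non-decreasing coefficients give a monotone Bernstein polynomial.

```python
from math import comb
from statistics import NormalDist


def bernstein_basis(y, order, support):
    """Evaluate the order + 1 Bernstein basis functions at a scalar y."""
    a, b = support
    t = (y - a) / (b - a)  # rescale the support to [0, 1]
    return [comb(order, k) * t**k * (1 - t) ** (order - k) for k in range(order + 1)]


def transform(y, theta, support):
    """h(y) = sum_k theta_k * B_k(y); non-decreasing theta gives monotone h."""
    basis = bernstein_basis(y, len(theta) - 1, support)
    return sum(c * b for c, b in zip(theta, basis))


# Hypothetical coefficients, chosen strictly increasing so that h is monotone.
theta = [-2.0, -1.0, 0.0, 0.5, 1.5, 2.5, 3.0]
support = (0.0, 100.0)

grid = [0.0, 25.0, 50.0, 75.0, 100.0]
h_values = [transform(y, theta, support) for y in grid]
cdf_values = [NormalDist().cdf(h) for h in h_values]  # F(y) = Phi(h(y))
```

At the ends of the support the transformation equals the first and last coefficient, which is one reason the coefficient ordering controls monotonicity so directly.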
- property Theta_: ndarray[tuple[Any, ...], dtype[float64]] | None
Coefficient matrix Θ of shape (p, q) for interaction models. None before fit() or for non-interaction models. theta_[i*q + j] = Θ[i, j] (row-major layout).
- aic()
Akaike Information Criterion of the fitted model.
- Returns:
AIC = -2 · loglik + 2 · k where k is the number of free parameters (n_free_params_) and loglik is the maximised log-likelihood.
- Return type:
- Raises:
NotFittedError – If called before fit().
Notes
Lower is better. The monotonicity inequality D @ theta_b >= 0 is not counted as a binding equality constraint, so k equals the full length of theta_. This matches R mlt::AIC.mlt, which uses length(coef(fit)).
Examples
>>> model = MLT(order=4, support=(0, 1)).fit(y)
>>> model.aic()
- bic()
Bayesian Information Criterion of the fitted model.
- Returns:
BIC = -2 · loglik + log(n) · k where n is the number of observations (n_obs_), k the number of free parameters (n_free_params_), and loglik the maximised log-likelihood.
- Return type:
- Raises:
NotFittedError – If called before fit().
Notes
Lower is better. Penalises additional parameters more heavily than aic() whenever log(n) > 2, i.e. for n ≥ 8. Matches R mlt::BIC.mlt, which uses length(coef(fit)) for k.
Examples
>>> model = MLT(order=4, support=(0, 1)).fit(y)
>>> model.bic()
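The two criteria differ only in the per-parameter penalty, which can be checked by hand. The sketch below recomputes both from a made-up log-likelihood and parameter count; the function names are illustrative and are not part of mltpy.

```python
from math import log


def aic(loglik, k):
    """AIC = -2 * loglik + 2 * k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """BIC = -2 * loglik + log(n) * k."""
    return -2.0 * loglik + log(n) * k


loglik, k = -123.4, 5  # made-up fit summary

# log(n) > 2 exactly when n > e^2 (about 7.39), i.e. from n = 8 onwards.
penalty_gap = {n: bic(loglik, k, n) - aic(loglik, k) for n in (7, 8, 100)}
```

A negative gap at n = 7 and a positive gap from n = 8 upwards is the crossover referred to in the BIC notes.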
- confband(y_grid, X=None, level=0.95, what='distribution', offset=None)
Pointwise delta-method confidence band for a predicted curve.
For each grid point y_i (with an optional covariate profile x), compute the value η_i on a “linear-predictor” scale together with its asymptotic variance via the delta method
\[\eta_i = g(y_i, x;\,\theta),\qquad \mathrm{Var}(\eta_i) = J_i\,V\,J_i^\top,\quad J_i = \partial\eta_i/\partial\theta,\]
form the Wald interval η_i ± z · sqrt(Var(η_i)), and back-transform the endpoints to the requested what scale. The intervals are pointwise, not simultaneous.
The linear predictor and back-transform depend on what:
- "trafo": η = h; back-transform = identity
- "distribution": η = h; back-transform = F_base(·)
- "survivor": η = h; back-transform = 1 − F_base(·) (endpoints swapped, since 1 − F is decreasing)
- "density": η = log f(h) + log h'; back-transform = exp(·)
- "hazard": η = log f(h) + log h' − log S(h); back-transform = exp(·)
- Parameters:
y_grid (ndarray[tuple[Any, ...], dtype[double]]) – Response values at which to evaluate the band. Must lie within basis.support.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Covariate profile for a single curve. Accepts a 1D array of length q or a 2D (1, q) array; broadcast across y_grid. Required when the model was fit with covariates; must be None when it was not.
level (float) – Confidence level in (0, 1). Defaults to 0.95.
what (Literal['trafo', 'distribution', 'survivor', 'density', 'hazard']) – One of "trafo", "distribution", "survivor", "density", "hazard". Defaults to "distribution".
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-grid-point offset of shape (len(y_grid),). Added to h before computing the band; does not affect the delta-method Jacobian (offset is constant w.r.t. theta).
- Returns:
Array of shape (len(y_grid), 3) with columns [estimate, lower, upper] on the what scale.
- Return type:
- Raises:
NotFittedError – If called before fit().
ValueError – If level is outside (0, 1), what is not supported, or the shape/presence of X is inconsistent with the fitted model.
RuntimeError – Propagated from vcov() on singular Hessians, or if the fitted basis violates monotonicity at a grid point (h'(y) ≤ 0), which would make the density/hazard linear predictor ill-defined.
Notes
Working on the transformation scale before back-transforming keeps probability bands in [0, 1] and density/hazard bands positive. The reference R routine mlt::confband builds simultaneous bands via multivariate-normal quantiles; this implementation is pointwise to match the Wald construction used in most applied survival plots.
Examples
>>> model = Coxph(support=(0.01, t.max())).fit(cd, X=X)
>>> grid = np.linspace(0.1, t.max(), 100)
>>> band = model.confband(grid, X=X[:1], what="survivor")
>>> ax.fill_between(grid, band[:, 1], band[:, 2], alpha=0.2)
>>> ax.plot(grid, band[:, 0])
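The delta-method construction can be sketched for a single grid point. The example assumes a normal base distribution and a transformation that is linear in the coefficients, so the Jacobian J_i is simply the basis row; the helper names and all numbers are made up and this is not mltpy's implementation.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def quad_form(j, v):
    """J V J^T for a row vector J and a square matrix V (lists of floats)."""
    return sum(j[a] * v[a][b] * j[b] for a in range(len(j)) for b in range(len(j)))


def band_at_point(j, theta, v, level=0.95, what="distribution"):
    """Return (estimate, lower, upper) at one grid point; h is linear in theta."""
    eta = sum(ji * ti for ji, ti in zip(j, theta))
    half = std_normal.inv_cdf(0.5 + level / 2.0) * sqrt(quad_form(j, v))
    lo, hi = eta - half, eta + half
    if what == "trafo":
        return eta, lo, hi
    if what == "distribution":
        return std_normal.cdf(eta), std_normal.cdf(lo), std_normal.cdf(hi)
    if what == "survivor":
        # 1 - F is decreasing, so the endpoints swap.
        return 1 - std_normal.cdf(eta), 1 - std_normal.cdf(hi), 1 - std_normal.cdf(lo)
    raise ValueError(f"unsupported what={what!r}")


# Made-up basis row, coefficients and covariance for a single grid point.
j_row = [0.25, 0.5, 0.25]
theta_hat = [-1.0, 0.2, 1.5]
vcov = [[0.04, 0.01, 0.0], [0.01, 0.03, 0.01], [0.0, 0.01, 0.05]]

dist_band = band_at_point(j_row, theta_hat, vcov, what="distribution")
surv_band = band_at_point(j_row, theta_hat, vcov, what="survivor")
```

Because the interval is formed on the η scale and only then mapped through the CDF, both bands stay inside (0, 1) by construction.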
- confint(level=0.95, parm=None, type='wald')
Confidence intervals for theta_.
Two interval types are supported:
- type="wald" (default): symmetric normal-approximation interval
\[\hat\theta_j \pm z_{1-\alpha/2}\,\sqrt{V_{jj}},\]
where \(V = \mathrm{vcov}()\) is the inverse observed information matrix and \(z_{1-\alpha/2}\) is the standard normal quantile for confidence level \(= 1-\alpha\). Matches R confint.default(mlt_fit, level=level).
- type="profile": profile-likelihood interval obtained by inverting the \(\chi^2_1\) likelihood-ratio test. For each requested parameter index \(j\) we solve
\[2\,(\hat\ell - \ell_p(v)) = \chi^2_{1,1-\alpha},\]
where \(\ell_p(v)\) is the maximised log-likelihood with \(\theta_j\) pinned to \(v\) and the remaining parameters re-optimised under the model constraints. Each parameter costs roughly ten constrained refits, so always pass parm= to restrict the work on larger models.
Robustness (issue #89): three inner-fit failure modes can occur per parameter: (i) the adaptive bracket fails to span a sign change, (ii) the pinned refit lands on a degenerate monotonicity active set so the equality theta[j] = v cannot be honoured (“boundary”), or (iii) the pinned refit does not converge to tolerance (“convergence”, KKT residual ≥ _PROFILE_INNER_KKT_THRESHOLD).
When parm is None (you asked for every parameter), each failure emits a ConvergenceWarning naming the parameter and writes ±np.inf (bracket / boundary) or np.nan (convergence) to that row, so one un-identified parameter does not abort the whole call. When parm is an explicit sequence (you asked for those parameters specifically), the same failures re-raise as RuntimeError so you can debug the request.
- Parameters:
level (float) – Confidence level in (0, 1). Defaults to 0.95.
parm (Sequence[int] | None) – Optional sequence of integer indices selecting a subset of parameters. None returns intervals for all entries of theta_.
type (Literal['wald', 'profile']) – Interval type. "wald" (default) preserves the existing normal-approximation behaviour; "profile" returns the likelihood-ratio interval.
- Returns:
Array of shape (k, 2) with columns [lower, upper]; k equals len(theta_) when parm is None, else len(parm). Row order matches the requested index order.
- Return type:
- Raises:
NotFittedError – If called before fit().
ValueError – If level is outside (0, 1), parm contains indices outside [0, len(theta_)), or type is not one of {"wald", "profile"}.
RuntimeError – Propagated from vcov() on singular Hessians (Wald), or from the profile-CI bracket search / inner-fit failure when an explicit parm was provided (profile). Under parm=None the same failures become ConvergenceWarning instead.
Examples
>>> model = MLT(order=4, support=(0, 1)).fit(y)
>>> ci = model.confint(level=0.95)  # shape (p, 2)
>>> ci_prof = model.confint(level=0.95, parm=[0], type="profile")
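The Wald formula and the profile cutoff above share the same normal quantile, since a chi-squared variable with one degree of freedom is a squared standard normal. The sketch below computes both for one coefficient; the function names and inputs are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(estimate, variance, level=0.95):
    """Symmetric normal-approximation interval for one coefficient."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * sqrt(variance)
    return estimate - half_width, estimate + half_width


def profile_cutoff(level=0.95):
    """chi^2_{1, level} quantile, using chi^2_1 = Z^2 for standard normal Z."""
    return NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0) ** 2


# Made-up estimate and diagonal vcov entry for a single parameter.
lower, upper = wald_interval(estimate=0.8, variance=0.04, level=0.95)
cutoff = profile_cutoff(0.95)
```

A profile interval would then collect every pinned value v whose likelihood-ratio statistic stays below `cutoff`; the root-finding and constrained refits are omitted here.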
- estfun()
Per-observation score contributions, shape (n, p+q).
Equivalent to R’s sandwich::estfun(mlt_fit): row i is ∂ℓ_i/∂θ evaluated at theta_. At the MLE the column sums are zero up to optimiser tolerance.
- Returns:
Matrix of shape (n_obs_, p+q). Computed eagerly in fit() and cached; subsequent mutations of the original y/X cannot affect the returned matrix.
- Return type:
- Raises:
NotFittedError – If called before fit().
RuntimeError – If the cached score matrix is unexpectedly missing after fitting (e.g. a prior fit() call failed partway through).
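The zero-column-sum property is generic to maximum-likelihood fits and can be seen on a much simpler model than a transformation model. The sketch below uses a plain normal model with closed-form MLE as a stand-in; the data are made up and nothing here calls mltpy.

```python
from math import sqrt

y = [1.2, 0.4, 2.9, 1.7, 0.8]  # made-up data
n = len(y)

# Closed-form MLE of a normal model: sample mean and (biased) standard deviation.
mu_hat = sum(y) / n
sigma_hat = sqrt(sum((yi - mu_hat) ** 2 for yi in y) / n)

# Row i holds the gradient of the i-th log-likelihood term w.r.t. (mu, sigma).
scores = [
    [(yi - mu_hat) / sigma_hat**2, -1.0 / sigma_hat + (yi - mu_hat) ** 2 / sigma_hat**3]
    for yi in y
]
column_sums = [sum(row[c] for row in scores) for c in range(2)]
```

Column sums that are far from zero after a fit are therefore a quick hint that the optimiser stopped short of the MLE or that a constraint is binding.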
- feature_names_in_: list[str] | None
Names of the covariate columns supplied to fit(), if any. Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...]. None when the model was fit without covariates.
- fit(y, X=None, weights=None, offset=None)
Fit the transformation model by maximum likelihood.
- Parameters:
y (ndarray[tuple[Any, ...], dtype[double]] | CensoredData) – Response observations. Must lie within basis.support. Accepts np.ndarray, pd.Series, or CensoredData.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix of shape (n, q). If given, the last q entries of theta_ are regression coefficients.
weights (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional non-negative per-observation weights of shape (n,). The weighted log-likelihood Σ w_i · ℓ_i is maximised; no normalisation is applied. None is equivalent to all-ones.
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation offset of shape (n,). Added to h(y|x) before distribution calls on every likelihood evaluation: h_eff = B·θ_b + X·β + offset. None is equivalent to all-zeros.
- Returns:
Returns itself for method chaining:
cdf = model.fit(y).predict(y, what="distribution")
- Return type:
- Raises:
ValueError – If y contains values outside basis.support, or if weights/offset have the wrong shape or invalid values.
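How weights and offsets enter the objective can be sketched with a toy transformation h(y) = a + b·y and a normal base, for which each exact observation contributes log φ(h) + log h'. The function below is a hypothetical stand-in for the likelihood, not mltpy's code.

```python
from math import log
from statistics import NormalDist

std_normal = NormalDist()


def weighted_loglik(y, a, b, weights=None, offset=None):
    """Sum of w_i * [log phi(h_i) + log h'_i] for the toy h(y) = a + b*y, b > 0."""
    n = len(y)
    weights = [1.0] * n if weights is None else weights  # None == all-ones
    offset = [0.0] * n if offset is None else offset     # None == all-zeros
    total = 0.0
    for yi, wi, oi in zip(y, weights, offset):
        h_eff = a + b * yi + oi  # the offset shifts h, not its derivative
        total += wi * (log(std_normal.pdf(h_eff)) + log(b))
    return total


y = [0.3, 1.1, 2.4]  # made-up responses
plain = weighted_loglik(y, a=-1.0, b=0.9)
doubled_last = weighted_loglik(y, a=-1.0, b=0.9, weights=[1.0, 1.0, 2.0])
duplicated = weighted_loglik(y + [2.4], a=-1.0, b=0.9)
```

Since no normalisation is applied, an integer weight of 2 gives the same objective as listing that observation twice, which is what the last two values compare.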
- property gamma_coef_: ndarray[tuple[Any, ...], dtype[float64]] | None
Scaling-block coefficients γ (length q_s). None before fit() or when the model was constructed without scaling=. Sign-aligned with R tram::*(scale=...)’s scaling block (no flip needed for parity comparisons; see docs/adr/0002-scaling-terms.md, Decision 5).
- hessian_: ndarray[tuple[Any, ...], dtype[float64]] | None
Observed information matrix: the analytical Hessian of the negative log-likelihood evaluated at theta_. Shape (p+q, p+q). Computed eagerly at the end of fit(). None before fit().
- n_free_params_: int | None
Number of free parameters in the fitted model, equal to len(theta_) (Bernstein coefficients plus optional regression coefficients). The monotonicity constraint D @ theta_b >= 0 is an inequality and does not reduce the parameter count. None before fit().
- n_obs_: int | None
Number of observations used in fit(). For CensoredData, this is y.n; otherwise len(y). None before fit().
- offset_: ndarray[tuple[Any, ...], dtype[float64]] | None
Per-observation offset supplied to the last fit() call. None when no offset was used.
- plot(y, X=None, ax=None)
Plot the estimated CDF and density.
For non-interacting models, draws a single CDF/density curve over y (covariates are ignored and the unconditional baseline is shown). For InteractionBasis models, draws one CDF curve and one density curve per row of X on a shared y-axis.
- Parameters:
y (ndarray[tuple[Any, ...], dtype[double]]) – Response values at which to evaluate the model. Must lie within basis.support.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Required for InteractionBasis models: a 2-D matrix whose rows are the representative covariate values at which to draw the conditional curves. Ignored for non-interacting models.
ax (object) – Optional 2-tuple (ax_cdf, ax_pdf) of matplotlib.axes.Axes, or a single matplotlib.axes.Axes instance. If a single axes is given, only the CDF is plotted. If None, a new figure with two subplots is created automatically.
- Returns:
[ax_cdf, ax_pdf] if two panels are plotted, otherwise the single ax_cdf.
- Return type:
- Raises:
NotFittedError – If called before fit().
ImportError – If matplotlib is not installed.
ValueError – If X is not provided for an interacting model, or if it cannot be interpreted as a 2-D array.
TypeError – If ax is provided but can neither be unpacked into two axes nor used as a single axes.
- predict(y_new, X_new=None, what='distribution', offset_new=None, X_scale_new=None)
Compute model predictions at new observations.
- Parameters:
y_new (ndarray[tuple[Any, ...], dtype[double]]) – For what="quantile": probabilities in (0, 1). For all other what: response values in basis.support.
X_new (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix of shape (m, q).
offset_new (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation offset of shape (m,). Added to h(y|x) before distribution calls.
X_scale_new (ndarray[tuple[Any, ...], dtype[double]] | None) – New-data scaling-design matrix of shape (m, q_s), required when the model was fitted with scaling=. Enters via h(y|x_d, x_s) = h_0(y) · exp(0.5 · x_s · γ) + x_d · β, the same parameterisation as fit(). Pass None for non-scaling fits.
what (Literal['trafo', 'distribution', 'logdistribution', 'survivor', 'logsurvivor', 'density', 'logdensity', 'hazard', 'loghazard', 'cumhazard', 'logcumhazard', 'odds', 'logodds', 'quantile']) – Type of prediction. Let h = h(y|x) and h' = ∂h/∂y; F, S, f denote the base distribution’s CDF, survivor, and PDF.
- "trafo": Transformation h(y|x)
- "distribution": CDF, F(h)
- "logdistribution": log F(h)
- "survivor": Survivor, S(h) = 1 − F(h)
- "logsurvivor": log S(h)
- "density": PDF, f(h) · h'
- "logdensity": log f(h) + log h'
- "hazard": Hazard, f(h) · h' / S(h)
- "loghazard": log f(h) + log h' − log S(h)
- "cumhazard": Cumulative hazard, −log S(h)
- "logcumhazard": log(−log S(h))
- "odds": F(h) / S(h)
- "logodds": log F(h) − log S(h)
- "quantile": Quantile via inversion; right-censored models use an R-compatible grid+spline inversion.
- Return type:
- Raises:
NotFittedError – If called before fit().
ValueError – If what is not one of the valid options.
Notes
Log-scale variants use scipy.special.log_ndtr for the normal distribution’s log-CDF (more accurate in the tails) and dist.logcdf/logsf/logpdf otherwise.
Examples
>>> model = MLT(order=4, support=(0, 1)).fit(y)
>>> cdf = model.predict(y_new, what="distribution")
>>> q50 = model.predict(np.array([0.5]), what="quantile")
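The prediction scales listed above are all functions of h and h'. The sketch below evaluates several of them for a normal base distribution at one made-up pair (h, h'); the helper is illustrative only and uses the naive formulas rather than the tail-stable log-scale routines mentioned in the notes.

```python
from math import exp, log
from statistics import NormalDist

std_normal = NormalDist()


def prediction_scales(h, h_prime):
    """Map h = h(y|x) and h' = dh/dy to several `what` scales (normal base)."""
    F = std_normal.cdf(h)
    S = 1.0 - F
    density = std_normal.pdf(h) * h_prime
    return {
        "trafo": h,
        "distribution": F,
        "survivor": S,
        "density": density,
        "hazard": density / S,
        "cumhazard": -log(S),
        "odds": F / S,
        "logodds": log(F) - log(S),
    }


scales = prediction_scales(h=0.3, h_prime=1.2)  # made-up values

# The cumulative hazard and the survivor function carry the same information.
survivor_from_cumhazard = exp(-scales["cumhazard"])
```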
- residuals(type='score')
Per-observation residuals for model diagnostics.
Computed at the training data passed to fit(). Mirrors R mlt::residuals for type="score"; the Cox-Snell and deviance forms are derived from the fitted survivor function.
- Parameters:
type (Literal['score', 'cox-snell', 'deviance']) – Which residual to compute.
- "score" (default): score residual w.r.t. an artificial intercept added to h(y|x). By observation type: exact, -ψ(h_i); right-censored, f(h)/S(h); left-censored, -f(h)/F(h); interval-censored, -(f(h_b) - f(h_a)) / (F(h_b) - F(h_a)). Sign matches R mlt::residuals (the negative of the positive-log-likelihood score). At the MLE the sum is zero up to optimiser tolerance.
- "cox-snell": r_i = -log S(y_i|x_i). Under a correctly specified model these are approximately Exp(1). For censored observations y_i is the censoring threshold (lower for right-censored, upper for left-censored, the midpoint for interval-censored); the resulting residuals for those observations are themselves censored Exp(1) variates.
- "deviance": sign(r_i - 1) · sqrt(2·|r_i - log(r_i) - 1|) where r_i is the Cox-Snell residual. Under a correctly specified model these are approximately standard normal.
- Returns:
Vector of length n_obs_.
- Return type:
- Raises:
NotFittedError – If called before fit().
ValueError – If type is not one of the supported residual kinds.
Notes
type="score" matches R mlt::residuals(mlt_fit) element-wise to rtol=1e-6. type="cox-snell" uses the same evaluation point convention as R’s -log(predict(mlt_fit, type = "survivor")). r_i is clipped at np.finfo(float).tiny before log(r_i) in the deviance formula to avoid -inf.
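The Cox-Snell and deviance formulas can be reproduced from survivor probabilities alone. The sketch below uses made-up probabilities and helper names; `sys.float_info.min` plays the role that np.finfo(float).tiny has in the notes.

```python
import sys
from math import log, sqrt


def cox_snell(survivor_prob):
    """r_i = -log S(y_i | x_i)."""
    return -log(survivor_prob)


def deviance_residual(r):
    """sign(r - 1) * sqrt(2 * |r - log(r) - 1|), with r clipped away from zero."""
    r = max(r, sys.float_info.min)  # avoids log(0) when S(y_i | x_i) == 1
    sign = (r > 1.0) - (r < 1.0)
    return sign * sqrt(2.0 * abs(r - log(r) - 1.0))


survivor_probs = [0.9, 0.5, 0.1]  # made-up fitted S(y_i | x_i)
cs = [cox_snell(s) for s in survivor_probs]
dev = [deviance_residual(r) for r in cs]
```

Observations with a Cox-Snell residual below 1 (the Exp(1) mean) get a negative deviance residual, and those above 1 a positive one.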
- result_: OptimizationResult | None
Full result object from the last fit() call. None before fit().
- sandwich_se(regularize='active')
Sandwich (robust) standard errors for theta_.
Computed as sqrt(diag(sandwich_vcov(regularize=regularize))).
- Parameters:
regularize (str | None) – Passed directly to sandwich_vcov(). See vcov() for details.
- Returns:
Vector of length len(theta_).
- Return type:
- Raises:
NotFittedError – If called before fit().
RuntimeError – Propagated from sandwich_vcov() on singular Hessians, or if the sandwich variance matrix has negative diagonal entries.
- sandwich_vcov(regularize='active')
Sandwich (robust) variance–covariance matrix of theta_.
Computes the HC0 sandwich estimator
\[V_{\text{sand}} = B M B, \quad B = \mathrm{vcov}(\mathrm{regularize}), \quad M = \sum_i s_i s_i^\top,\]
where \(B\) is the bread (the inverse observed information computed by vcov()) and \(s_i\) is the per-observation score (row \(i\) of estfun()).
The regularize parameter is forwarded to vcov(), so the bread inherits the same singular-Hessian recovery as vcov(regularize='active') (the default).
- Parameters:
regularize (str | None) – Passed directly to vcov(). See that method’s documentation for details.
- Returns:
Symmetric (p+q, p+q) matrix.
- Return type:
- Raises:
NotFittedError – If called before fit().
ValueError – If regularize is not 'active' or None.
RuntimeError – If the Hessian is singular and regularize=None.
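The bread-meat-bread product is small enough to write out with plain lists. In the sketch below the bread matrix and the score rows are made-up numbers standing in for vcov() and estfun(); the helper functions are not part of mltpy.

```python
def matmul(a, b):
    """Plain-Python matrix product for small lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def meat(scores):
    """M = sum_i s_i s_i^T from an (n, p) score matrix."""
    p = len(scores[0])
    return [[sum(s[a] * s[b] for s in scores) for b in range(p)] for a in range(p)]


# Made-up bread (inverse information) and per-observation scores.
bread = [[0.5, 0.1], [0.1, 0.4]]
scores = [[1.0, -0.5], [-0.4, 0.9], [-0.6, -0.4]]

sandwich = matmul(matmul(bread, meat(scores)), bread)
robust_se = [sandwich[j][j] ** 0.5 for j in range(2)]
```

The square roots of the diagonal are what a robust standard-error routine would report for each coefficient.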
- score(y, X=None, weights=None, offset=None)
Log-likelihood at the fitted parameters (sklearn-compatible).
Higher is better; this is NOT the negative log-likelihood.
- Parameters:
y (ndarray[tuple[Any, ...], dtype[double]] | CensoredData) – Response observations.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix.
weights (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation weights.
offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation offset added to h.
- Return type:
- Raises:
NotFittedError – If called before fit().
- simulate(n, X=None, random_state=None, X_scale=None)
Draw samples from the fitted model via the quantile transformation.
Samples u ~ Uniform(0, 1) and returns predict(u, X, X_scale_new=X_scale, what="quantile").
- Parameters:
n (int) – Number of samples to draw.
X (ndarray[tuple[Any, ...], dtype[double]] | None) – Covariate matrix of shape (n, q). Each row yields one conditional draw; must be supplied when the model was fitted with covariates. Pass None only for covariate-free fits.
random_state (int | Generator | None) – Seed or numpy.random.Generator for reproducibility.
X_scale (ndarray[tuple[Any, ...], dtype[double]] | None) – Scaling-design matrix of shape (n, q_s). Required when the model was fitted with scaling=; ignored otherwise. Each row yields one heteroskedastic conditional draw via q_i = h_0⁻¹((Φ⁻¹(u_i) − x_d,i·β) / exp(0.5·x_s,i·γ)).
- Return type:
- Raises:
NotFittedError – If called before fit().
ValueError – If X is provided but its number of rows does not equal n, or if X_scale shape is inconsistent with the fit.
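Sampling through the quantile function is ordinary inverse-transform sampling. The sketch below applies it to a toy transformation h(y) = a + b·y with a normal base, where the inverse is available in closed form; the function name and parameters are assumptions for the example, and a fitted Bernstein model would need numerical inversion instead.

```python
import random
from statistics import NormalDist, fmean

std_normal = NormalDist()


def simulate_linear(n, a, b, seed=None):
    """Draw n samples from the toy model h(y) = a + b*y with a normal base."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # inv_cdf needs u strictly inside (0, 1)
            u = rng.random()
        z = std_normal.inv_cdf(u)  # quantile on the transformation scale
        draws.append((z - a) / b)  # invert h: y = h^{-1}(z)
    return draws


samples = simulate_linear(5000, a=-2.0, b=0.5, seed=0)
sample_mean = fmean(samples)  # Y ~ N(-a/b, 1/b^2) = N(4, 2^2) in this toy case
```

Fixing the seed makes the draws reproducible, which is the purpose of the random_state argument described above.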
- standard_errors(regularize='active')
Vector of asymptotic standard errors for theta_.
Computed as sqrt(diag(vcov(regularize=regularize))). Length equals len(theta_).
- theta_: ndarray[tuple[Any, ...], dtype[float64]] | None
Fitted parameter vector [theta_basis | beta]. None before fit().
- vcov(regularize='active')
Asymptotic variance–covariance matrix of theta_.
Returns the inverse of the observed information matrix hessian_ (Hessian of the negative log-likelihood at the MLE). Under standard regularity conditions, this is a consistent estimator of the asymptotic covariance of the maximum-likelihood estimator.
- Parameters:
regularize (str | None) – Regularization strategy for near-singular Hessians.
- 'active': if direct inversion fails, recover a finite covariance via the active-set-constrained form: the top-left block of the inverse of the bordered KKT matrix [[H, A_aᵀ], [A_a, 0]], where A_a is the sub-matrix of rows of _A_ineq_ whose KKT multiplier exceeds _ACTIVE_CONSTRAINT_TOL (see _constrained_vcov_active()). When auglag data are not available (SLSQP / trust-constr fits) the pseudoinverse is used as a fallback. This is the default because the Hessian can be singular at constrained MLEs, and on well-conditioned fits it reduces to bare H⁻¹ (R mlt::vcov.mlt behaves the same way in the cases where mltpy’s bare inv(H) already matches R; see tests/test_confidence.py).
- 'auglag': always return the active-set-constrained covariance when active monotonicity rows exist, rather than waiting for bare inversion to fail. This is the ρ→∞ limit of the penalty form (H + ρ·A_aᵀA_a)⁻¹ and mirrors R mlt::vcov.mlt on the constrained branches that bare inv(H) misses (notably the scaled-baseline Coxph path, where bare inv(H) diverges from R’s vcov(as.mlt(fit)) by ~37× on the binding rows while the constrained form matches at rtol≈1e-4). Unlike the earlier penalty implementation it does not depend on the optimiser’s final penalty ρ (which the augmented-Lagrangian now freezes once feasible). Falls back to bare H⁻¹ when no constraint binds or auglag data are unavailable. Opt-in because it inflates standard errors along tied rows and consequently widens confint/confband outputs in cases where mltpy’s bare inv(H) already matches R.
- None: raise RuntimeError on singular Hessian (original behaviour; useful when you need a diagnostic failure).
- Returns:
Symmetric (p+q, p+q) matrix.
- Return type:
- Raises:
ValueError – If regularize is not 'active', 'auglag', or None.
NotFittedError – If called before fit().
RuntimeError – If the Hessian is singular and regularize=None, or if hessian_ is unexpectedly missing after fitting.
- wald_test(R, r=None, vcov='information', regularize='active')
Wald test for linear restrictions Rθ = r.
Computes the chi-squared Wald statistic
\[W = (R\hat\theta - r)^\top \bigl[R\,V\,R^\top\bigr]^{-1} (R\hat\theta - r) \;\sim\; \chi^2(k),\]
where \(k\) is the number of rows in \(R\) and \(V\) is either the inverse-information vcov() or the sandwich estimator sandwich_vcov().
- Parameters:
R (ndarray[tuple[Any, ...], dtype[double]]) – Contrast matrix of shape (k, p+q). Each row encodes one linear restriction on theta_.
r (ndarray[tuple[Any, ...], dtype[double]] | None) – Null-hypothesis value vector of length k. Defaults to the zero vector (i.e. Rθ = 0).
vcov (Literal['information', 'sandwich']) – Which variance–covariance matrix to use. "information" (the default) uses the inverse observed information vcov(); "sandwich" uses the HC0 sandwich estimator sandwich_vcov().
regularize (str | None) – Passed directly to vcov() (or sandwich_vcov()). See vcov() for the accepted values and their effect. The default "active" falls back to the active-set-constrained recovery when direct inversion fails.
- Returns:
Dataclass with fields statistic, df, p_value, and vcov_type.
- Return type:
- Raises:
NotFittedError – If called before fit().
ValueError – If R does not have len(theta_) columns or r has the wrong length.
RuntimeError – If R V R^T is singular (the restriction is degenerate or collinear).
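For a single restriction (k = 1) the Wald statistic reduces to a squared z-score, and the χ²(1) tail probability can be written with the complementary error function. The sketch below covers only that special case, with made-up estimates and a hypothetical helper name; the general case needs the matrix inverse of R V Rᵀ.

```python
from math import erfc, sqrt


def wald_single(R_row, theta, V, r=0.0):
    """Wald test of one restriction R_row . theta = r; returns (W, df, p_value)."""
    p = len(theta)
    estimate = sum(R_row[j] * theta[j] for j in range(p))
    variance = sum(R_row[a] * V[a][b] * R_row[b] for a in range(p) for b in range(p))
    if variance <= 0.0:
        raise RuntimeError("R V R^T is not positive; the restriction is degenerate")
    W = (estimate - r) ** 2 / variance
    p_value = erfc(sqrt(W / 2.0))  # right tail of chi-squared with 1 df
    return W, 1, p_value


# Made-up estimates: test whether the last coefficient is zero.
theta_hat = [0.4, 1.1, 0.6]
V = [[0.05, 0.0, 0.0], [0.0, 0.04, 0.01], [0.0, 0.01, 0.09]]
W, df, p_value = wald_single([0.0, 0.0, 1.0], theta_hat, V)
```

Here the coefficient is two standard errors from zero, so W = 4 and the p-value equals the two-sided normal tail beyond 2.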
- exception mltpy.model.ConvergenceWarning
Bases: UserWarning
Issued when the optimiser fails to converge within the allowed restarts.
- class mltpy.model.MLT(order=6, support=(0.0, 1.0), censoring=CensoringType.NONE, optimizer_config=None, base_distribution='normal', scaling=None)
Bases: ConditionalTransformationModel
Most Likely Transformation: convenience interface.
A ConditionalTransformationModel with explicit order and support parameters instead of a pre-built BernsteinBasis.
- Parameters:
order (int) – Polynomial degree of the Bernstein basis. Defaults to 6.
support (tuple[float, float]) – Closed interval (a, b) with a < b. Defaults to (0, 1).
censoring (CensoringType) – Censoring type of the response data.
optimizer_config (OptimizerConfig | None) – Optimisation settings.
base_distribution (BaseDistribution)
scaling (NDArray[np.float64] | None)
Examples
>>> model = MLT(order=6, support=(0, 100))
>>> model.fit(y)
>>> cdf = model.predict(y_new, what="distribution")
- exception mltpy.model.NotFittedError
Bases: ValueError
Raised when a method that requires a fitted model is called before fit().
- class mltpy.model.WaldTestResult(statistic, df, p_value, vcov_type)
Bases: object
Result of a Wald test for linear restrictions on model parameters.
- Parameters:
statistic (float) – Wald chi-squared statistic W = (Rθ - r)^T [R V R^T]^{-1} (Rθ - r).
df (int) – Degrees of freedom (number of restrictions, i.e. number of rows in R).
p_value (float) – Right-tail probability Pr(χ²(df) > W).
vcov_type (str) – Which variance–covariance matrix was used: "information" or "sandwich".
- mltpy.model.anova(*models)
Likelihood-ratio test for a sequence of nested transformation models.
Models are sorted internally by their number of free parameters (n_free_params_) in ascending order, and pairwise LR statistics are computed against the immediately smaller model. The user is responsible for ensuring the models are actually nested (fitted on the same data with the smaller’s parameter space contained in the larger’s). Sample size is checked; structural nesting is not.
- Parameters:
*models (ConditionalTransformationModel) – Two or more fitted ConditionalTransformationModel instances.
- Returns:
See AnovaResult for the column layout.
- Return type:
- Raises:
ValueError – If fewer than two models are passed; if any model is not fitted; if the models were fitted on different sample sizes; or if two consecutive models (after sorting) have the same number of free parameters (cannot be nested).
Notes
The test statistic is D = 2·(loglik_full − loglik_reduced), which is asymptotically χ²_df with df = k_full − k_reduced under the null hypothesis that the reduced model is correct. Mirrors R’s anova.mlt.
Examples
>>> small = MLT(order=3, support=(0, 1)).fit(y)
>>> large = MLT(order=6, support=(0, 1)).fit(y)
>>> print(anova(small, large))
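The pairwise table can be reproduced from the parameter counts and log-likelihoods alone. The sketch below is a standalone approximation of that bookkeeping, not mltpy's code: the function names are invented, the inputs are made up, the models are assumed to be already sorted with non-negative deviances, and the chi-squared tail is computed with the standard recurrences for integer degrees of freedom.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Right-tail probability of chi-squared(df) for integer df >= 1, x >= 0."""
    m = x / 2.0
    if df % 2 == 0:
        # Q(df/2, m) = exp(-m) * sum_{i < df/2} m^i / i!
        return exp(-m) * sum(m**i / gamma(i + 1) for i in range(df // 2))
    # Odd df: start from Q(1/2, m) = erfc(sqrt(m)) and step the shape up by one.
    tail = erfc(sqrt(m))
    for i in range((df - 1) // 2):
        tail += exp(-m) * m ** (i + 0.5) / gamma(i + 1.5)
    return tail


def lr_table(n_params, log_lik):
    """Pairwise LR tests for models already sorted by n_params (reduced -> full)."""
    rows = [(None, None, None)]  # the first model has no predecessor
    for i in range(1, len(n_params)):
        df = n_params[i] - n_params[i - 1]
        deviance = 2.0 * (log_lik[i] - log_lik[i - 1])
        rows.append((df, deviance, chi2_sf(deviance, df)))
    return rows


# Made-up summaries for three nested fits.
table = lr_table(n_params=[4, 7, 8], log_lik=[-210.0, -204.0, -203.5])
```

Each row holds (df, deviance, p_value), mirroring the None-padded first row described for AnovaResult.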