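tram
Convenience layer for common conditional transformation models.
This module provides pre-configured wrappers around ConditionalTransformationModel / MLT that mirror the R tram package (Hothorn). With the default configurations, users working with these classes do not need to import BernsteinBasis, CensoringType, or OptimizerConfig directly. Those types are only needed for options such as censoring= on BoxCox, interacting= on Coxph, or a custom optimizer_config=.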
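Classes
- BoxCox
Box-Cox transformation model for continuous outcomes (exact observations by default; censored responses via censoring=).
- Coxph
Cox proportional hazards model for right-censored survival data.
- Colr
Continuous outcome logistic regression using a logistic base distribution.
- Lm
Normal linear regression as a CTM (order=1 Bernstein, normal base).
- Lehmann
Lehmann (proportional reverse-time hazards) model for right-censored survival data.
- Polr
Proportional-odds ordinal regression (R tram::Polr).
- Survreg
Parametric survival model on the log-time scale (R tram::Survreg).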
- class mltpy.tram.BoxCox(support, order=6, optimizer_config=None, censoring=CensoringType.NONE, scaling=None)
Bases: _TramModel
Box-Cox transformation model for continuous outcomes.
Fits a flexible, monotone transformation h(y) that maps the response distribution to a standard normal. Useful as a non-parametric generalisation of the classical Box-Cox power transform when the normality assumption for linear regression is violated.
- Parameters:
  - support (tuple[float, float]) – Closed interval (a, b) covering all observed values.
  - order (int) – Polynomial degree of the Bernstein basis. Defaults to 6.
  - optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.
  - censoring (CensoringType) – Censoring type of the response data. Defaults to NONE. Pass RIGHT, LEFT, or INTERVAL together with a CensoredData y to fit the censored Box-Cox likelihood.
  - scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional scaling-design matrix of shape (n, q_s) mirroring R tram::BoxCox(..., scale=~x_s). Threads through to the scaled-baseline likelihood (issue #71) and the scaled-predict path (issue #72). When supplied, the fitted parameter vector gains a γ block (length q_s) exposed as gamma_, and predict() requires X_scale_new. Sign-aligned with the R scale= block (ADR 0002, Decision 5); see docs/adr/0002-scaling-terms.md.
Examples
>>> from mltpy.tram import BoxCox
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> y = rng.lognormal(size=200)
>>> model = BoxCox(support=(y.min(), y.max()))
>>> model.fit(y)
>>> cdf = model.predict(y, what="distribution")
>>> trafo = model.fitted_transformation(y)
- property feature_names_scaling_: list[str]
Column names of the scaling-design matrix supplied at fit time.
Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...].
- Raises:
  - NotFittedError – If accessed before fit().
  - ValueError – If the model was constructed without scaling=.
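The naming rule for feature_names_scaling_ can be mimicked with duck typing. This is a minimal sketch, not mltpy's code: the helper name scaling_feature_names is hypothetical, and it assumes a DataFrame is recognised by its columns attribute and a plain array by its shape.

```python
from typing import Any, List


def scaling_feature_names(X_scale: Any) -> List[str]:
    """Hypothetical helper mirroring the documented feature_names_scaling_ rule."""
    columns = getattr(X_scale, "columns", None)  # pandas.DataFrame exposes .columns
    if columns is not None:
        return [str(c) for c in columns]
    shape = getattr(X_scale, "shape", None)  # NumPy arrays expose .shape
    if shape is not None:
        n_cols = shape[1]
    else:  # nested Python lists: count entries in the first row
        n_cols = len(X_scale[0]) if len(X_scale) > 0 else 0
    # Fallback names are 1-based: X1, X2, ...
    return [f"X{j + 1}" for j in range(n_cols)]


print(scaling_feature_names([[0.1, 2.0], [0.3, 1.5]]))  # ['X1', 'X2']
```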
- fitted_transformation(y)
Evaluate the raw fitted transformation h(y) = B_k(y) @ theta_b.
This is the monotone function that maps the observed response scale to the latent standard-normal scale. Useful for visualising the shape of the estimated transformation.
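The transformation h(y) = B_k(y) @ theta_b can be sketched in plain Python. The helpers below are illustrative, not mltpy's implementation; they assume the usual Bernstein basis on the rescaled response u = (y - a) / (b - a) and show that non-decreasing coefficients give a monotone h, which BoxCox then maps through the standard normal CDF to obtain F(y).

```python
import math
from typing import List, Sequence, Tuple


def bernstein_basis(y: float, support: Tuple[float, float], order: int) -> List[float]:
    """Evaluate the Bernstein basis B_k(y) of degree `order` on [a, b]."""
    a, b = support
    u = (y - a) / (b - a)  # rescale the response to [0, 1]
    return [math.comb(order, j) * u**j * (1.0 - u) ** (order - j) for j in range(order + 1)]


def transformation(y: float, support: Tuple[float, float], theta: Sequence[float]) -> float:
    """h(y) = B_k(y) @ theta, with degree k = len(theta) - 1."""
    basis = bernstein_basis(y, support, len(theta) - 1)
    return sum(bj * tj for bj, tj in zip(basis, theta))


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


support = (0.0, 10.0)
theta = [-2.0, -1.0, -0.5, 0.5, 1.0, 1.5, 2.5]  # order 6; non-decreasing => h is monotone
grid = [support[0] + i * (support[1] - support[0]) / 50 for i in range(51)]
h_values = [transformation(y, support, theta) for y in grid]
cdf_values = [normal_cdf(h) for h in h_values]  # F(y) = Phi(h(y))
```

Monotonicity follows because the derivative of a Bernstein polynomial is again a Bernstein polynomial whose coefficients are the scaled differences of theta.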
- property gamma_: ndarray[tuple[Any, ...], dtype[float64]]
Fitted scaling-block coefficients γ (length q_s).
Sign-aligned with R tram::BoxCox(..., scale=~x_s)’s scaling block (ADR 0002, Decision 5).
- Raises:
  - NotFittedError – If accessed before fit().
  - ValueError – If the model was constructed without scaling=.
- class mltpy.tram.Colr(support, order=6, optimizer_config=None, scaling=None)
Bases: _TramModel
Continuous outcome logistic regression.
Fits a monotone transformation h(y) such that h(Y|X) follows a standard logistic distribution. Analogous to ordinal logistic regression but for continuous, fully observed outcomes. Produces a proportional-odds model when covariates are included.
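The proportional-odds property can be checked numerically. This sketch assumes the shift convention h(y) + x'β that mltpy documents for Lm and Polr, and uses an arbitrary increasing h chosen only for illustration; the log-odds gap between two covariate values then equals β(x1 - x2) at every y.

```python
import math


def logistic_cdf(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))


def h(y: float) -> float:
    # Arbitrary increasing transformation chosen for illustration only.
    return -1.0 + 0.8 * y + 0.1 * y**3


def conditional_cdf(y: float, x: float, beta: float) -> float:
    # Shift model P(Y <= y | x) = F(h(y) + x * beta) with a logistic F.
    return logistic_cdf(h(y) + x * beta)


beta, x1, x2 = 0.7, 1.0, -0.5
gaps = [
    log_odds(conditional_cdf(y, x1, beta)) - log_odds(conditional_cdf(y, x2, beta))
    for y in (-1.0, 0.0, 1.5, 3.0)
]
# Each gap equals beta * (x1 - x2), independent of y.
```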
- Parameters:
  - support (tuple[float, float]) – Closed interval (a, b) covering all observed values.
  - order (int) – Polynomial degree of the Bernstein basis. Defaults to 6.
  - optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.
  - scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional scaling-design matrix of shape (n, q_s) mirroring R tram::Colr(y ~ x_d | x_s). Threads through to the scaled-baseline likelihood (#71) and the scaled-predict path (#72). When supplied, the fit becomes a heteroskedastic continuous-outcome logistic regression with non-proportional log-odds: the log-odds gap between two x_s values varies with y (the proportional-odds assumption is relaxed). The fitted parameter vector gains a γ block exposed as gamma_, and predict() requires X_scale_new. Sign-aligned with R (ADR 0002, Decision 5); see docs/adr/0002-scaling-terms.md.
Examples
>>> from mltpy.tram import Colr
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> y = rng.logistic(loc=2.0, scale=0.5, size=200)
>>> model = Colr(support=(y.min(), y.max()))
>>> model.fit(y)
>>> cdf = model.predict(y, what="distribution")
- property feature_names_scaling_: list[str]
Column names of the scaling-design matrix supplied at fit time.
Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...].
- Raises:
  - NotFittedError – If accessed before fit().
  - ValueError – If the model was constructed without scaling=.
- property gamma_: ndarray[tuple[Any, ...], dtype[float64]]
Fitted scaling-block coefficients γ (length q_s).
Sign-aligned with R tram::Colr(..., scale=~x_s)’s scaling block (ADR 0002, Decision 5).
- Raises:
  - NotFittedError – If accessed before fit().
  - ValueError – If the model was constructed without scaling=.
- class mltpy.tram.Coxph(support, order=6, optimizer_config=None, interacting=None, scaling=None)
Bases: _TramModel
Cox proportional hazards model for right-censored survival data.
Fits a monotone transformation h(t) under right-censoring using the minimum extreme value ("min_extreme_value") base distribution, also known as the reversed Gumbel link. With covariates entering linearly, this parameterisation is equivalent to the classical Cox proportional hazards model. The baseline distribution is estimated non-parametrically via a Bernstein polynomial.
Pass interacting to fit a non-proportional (stratified or fully-interacting) Cox model where the transformation itself depends on the covariate via the tensor product h(t | x) = (a(t) ⊗ b(x))ᵀ vec(Θ). See ADR 0001 and InteractionBasis for the parameter-vector layout and the column-wise monotonicity strategy.
- Parameters:
  - support (tuple[float, float]) – Closed interval (a, b) with a > 0 and b at least as large as the longest observed follow-up time.
  - order (int) – Polynomial degree of the Bernstein basis on the response. Defaults to 6.
  - optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.
  - interacting (BernsteinBasis | OrdinalBasis | None) – Optional x-basis (BernsteinBasis, OrdinalBasis, or InterceptBasis). When provided, the model is fit as MLT(InteractionBasis(BernsteinBasis(...), interacting)) instead of the standard shift model. Only exact (uncensored) event times are currently supported on this path; the censored likelihood is not yet implemented for interacting bases.
  - scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional scaling-design matrix of shape (n, q_s) mirroring R tram::Coxph(Surv(y, event) ~ x_d | x_s). Routes through to the scaled-baseline likelihood (#71) and the scaled-predict path (#72). When supplied, the fit becomes a heteroskedastic / non-proportional-hazards Cox model log[-log S(t | x)] = h_0(t) · exp(x_s · γ) + x_d · β: the hazard ratio between two x_s values varies with t (the proportional-hazards assumption is relaxed). The fitted parameter vector gains a γ block exposed as gamma_, and the survival(), hazard(), and predict() methods require X_scale / X_scale_new. Sign-aligned with R (ADR 0002, Decision 5). Not supported together with interacting= (ADR 0002, Decision 2).
Examples
>>> from mltpy.tram import Coxph
>>> from mltpy.variables import CensoredData
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> y_time = rng.exponential(scale=2.0, size=200)
>>> y_status = rng.binomial(1, 0.7, size=200).astype(bool)
>>> cd = CensoredData.right_censored(y_time, censored=~y_status)
>>> model = Coxph(support=(0.01, y_time.max()))
>>> model.fit(cd)
>>> surv = model.survival(y_time)
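To see why the minimum extreme value base yields proportional hazards, write the survivor function as S(t | x) = exp(-exp(h(t) + x'β)). The cumulative hazard -log S(t | x) then factorises into exp(h(t)) times exp(x'β), so its ratio between two covariate values does not depend on t, and the same holds for its derivative, the hazard. The sketch assumes mltpy's "+" shift convention and an arbitrary increasing h chosen only for illustration.

```python
import math


def h(t: float) -> float:
    # Illustrative increasing baseline transformation for t > 0.
    return math.log(t) + 0.2 * t - 1.0


def survival(t: float, x: float, beta: float) -> float:
    # Minimum extreme value base: S(t | x) = exp(-exp(h(t) + x * beta)).
    return math.exp(-math.exp(h(t) + x * beta))


beta = 0.5
ratios = []
for t in (0.5, 1.0, 3.0, 8.0):
    cum_hazard_x1 = -math.log(survival(t, 1.0, beta))  # Lambda(t | x = 1)
    cum_hazard_x0 = -math.log(survival(t, 0.0, beta))  # Lambda(t | x = 0)
    ratios.append(cum_hazard_x1 / cum_hazard_x0)
# Every entry equals exp(beta): the hazard ratio is constant over time.
```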
- property feature_names_scaling_: list[str]
Column names of the scaling-design matrix supplied at fit time.
Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...].
- Raises:
  - NotFittedError – If accessed before fit().
  - ValueError – If the model was constructed without scaling=.
- property gamma_: ndarray[tuple[Any, ...], dtype[float64]]
Fitted scaling-block coefficients γ (length q_s).
Sign-aligned with R tram::Coxph(..., scale=~x_s)’s scaling block (ADR 0002, Decision 5).
- Raises:
  - NotFittedError – If accessed before fit().
  - ValueError – If the model was constructed without scaling=.
- hazard(y, X=None, offset=None, X_scale=None)
Estimate the hazard rate λ(y | x) = f(y | x) / S(y | x).
- Parameters:
  - y (ndarray[tuple[Any, ...], dtype[double]]) – Time points within basis.support.
  - X (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix of shape (m, q_d).
  - offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation offset added to h.
  - X_scale (ndarray[tuple[Any, ...], dtype[double]] | None) – New-data scaling-design matrix of shape (m, q_s), required when the model was fitted with scaling=. Threaded through to predict() as X_scale_new.
- Raises:
  - NotFittedError – If called before fit().
- survival(y, X=None, offset=None, X_scale=None)
Estimate the survival function S(y | x) = 1 − F(y | x).
- Parameters:
  - y (ndarray[tuple[Any, ...], dtype[double]]) – Time points within basis.support.
  - X (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix of shape (m, q_d).
  - offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation offset added to h.
  - X_scale (ndarray[tuple[Any, ...], dtype[double]] | None) – New-data scaling-design matrix of shape (m, q_s), required when the model was fitted with scaling=. Threaded through to predict() as X_scale_new.
- Raises:
  - NotFittedError – If called before fit().
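The hazard is the ratio f(y | x) / S(y | x). Under the minimum extreme value base it has the closed form h'(t) exp(h(t) + x'β); the sketch below, with an illustrative h and the assumed "+" shift convention, checks that closed form against a finite-difference density. It illustrates the definition only and is not mltpy's hazard() implementation.

```python
import math


def h(t: float) -> float:
    return math.log(t) + 0.2 * t - 1.0  # illustrative baseline


def h_prime(t: float) -> float:
    return 1.0 / t + 0.2  # derivative of h


def survival(t: float, x: float, beta: float) -> float:
    return math.exp(-math.exp(h(t) + x * beta))


def hazard_numeric(t: float, x: float, beta: float, eps: float = 1e-6) -> float:
    # f(t | x) = -dS/dt, approximated by a central difference, divided by S(t | x).
    density = -(survival(t + eps, x, beta) - survival(t - eps, x, beta)) / (2.0 * eps)
    return density / survival(t, x, beta)


def hazard_closed_form(t: float, x: float, beta: float) -> float:
    # lambda(t | x) = h'(t) * exp(h(t) + x * beta)
    return h_prime(t) * math.exp(h(t) + x * beta)


comparison = [(hazard_numeric(t, 1.0, 0.5), hazard_closed_form(t, 1.0, 0.5)) for t in (0.5, 2.0, 5.0)]
```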
- class mltpy.tram.Lehmann(support, order=6, optimizer_config=None)
Bases: _TramModel
Lehmann (proportional reverse-time hazards) model for right-censored data.
Dual of Coxph. Fits a monotone transformation h(t) under right-censoring using the maximum extreme value ("max_extreme_value") base distribution, i.e. the standard Gumbel distribution. With covariates entering linearly this parameterisation satisfies -log(-log F(t | x)) = h(t) + x'β, which is the Lehmann alternative (proportional reverse-time hazards) model.
- Parameters:
  - support (tuple[float, float]) – Closed interval (a, b) with a > 0 and b at least as large as the longest observed follow-up time.
  - order (int) – Polynomial degree of the Bernstein basis. Defaults to 6.
  - optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.
Examples
>>> from mltpy.tram import Lehmann
>>> from mltpy.variables import CensoredData
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> y_time = rng.exponential(scale=2.0, size=200)
>>> y_status = rng.binomial(1, 0.7, size=200).astype(bool)
>>> cd = CensoredData.right_censored(y_time, censored=~y_status)
>>> model = Lehmann(support=(0.01, y_time.max()))
>>> model.fit(cd)
>>> surv = model.survival(y_time)
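Under the maximum extreme value base the conditional CDF is F(t | x) = exp(-exp(-(h(t) + x'β))), so -log F(t | x) = exp(-h(t)) exp(-x'β) factorises, mirroring the cumulative-hazard factorisation of Coxph. The sketch below assumes mltpy's "+" shift convention and an illustrative h; it recovers the linear predictor via -log(-log F) and shows the constant ratio exp(-β) between x = 1 and x = 0.

```python
import math


def h(t: float) -> float:
    return math.log(t) + 0.2 * t - 1.0  # illustrative increasing baseline


def conditional_cdf(t: float, x: float, beta: float) -> float:
    # Maximum extreme value (Gumbel) base: F(t | x) = exp(-exp(-(h(t) + x * beta))).
    return math.exp(-math.exp(-(h(t) + x * beta)))


beta = 0.5
times = (0.5, 1.0, 3.0)
# -log(-log F) recovers h(t) + x * beta up to rounding.
recovered = [-math.log(-math.log(conditional_cdf(t, 1.0, beta))) for t in times]
# The ratio of -log F between x = 1 and x = 0 is exp(-beta) at every t.
ratios = [
    math.log(conditional_cdf(t, 1.0, beta)) / math.log(conditional_cdf(t, 0.0, beta))
    for t in times
]
```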
- hazard(y, X=None, offset=None)
Estimate the hazard rate λ(y | x) = f(y | x) / S(y | x).
- Raises:
  - NotFittedError – If called before fit().
- survival(y, X=None, offset=None)
Estimate the survival function S(y | x) = 1 − F(y | x).
- Raises:
  - NotFittedError – If called before fit().
- class mltpy.tram.Lm(support, optimizer_config=None, scaling=None)
Bases: _TramModel
Normal linear regression expressed as a CTM.
Fixes the Bernstein basis to order=1 and the base distribution to standard normal. With these constraints the transformation \(h(y) = \theta_0 (1-u) + \theta_1 u\), where \(u = (y-a)/(b-a)\), is affine, so the CTM \(h(Y) + \beta_{\mathrm{ctm}}^\top X \sim \mathcal{N}(0,1)\) is exactly equivalent to the classical normal linear model \(Y = \mu + \gamma^\top X + \varepsilon\), \(\varepsilon \sim \mathcal{N}(0, \sigma^2)\).
The mapping between CTM and lm parameters is
\[\begin{split}\hat{\sigma} &= (b - a) / (\theta_1 - \theta_0), \\ \hat{\mu} &= a - \theta_0 \hat{\sigma}, \\ \hat{\gamma} &= -\hat{\sigma} \, \beta_{\mathrm{ctm}}.\end{split}\]
The minus sign on \(\hat{\gamma}\) reflects mltpy’s internal shift convention h(y) + X @ beta = z (the R tram package uses h(y) - X @ beta = z, hence R’s \(\beta\) equals \(-\beta_{\mathrm{ctm}}\)).
Note that \(\hat{\sigma}\) is the maximum-likelihood estimate: it equals the residual standard error reported by R’s lm() multiplied by \(\sqrt{(n-p)/n}\), where p is the number of regression coefficients including the intercept.
These are exposed via sigma_, intercept_, and coef_ (sklearn-style fitted attributes).
- Parameters:
  - support (tuple[float, float]) – Closed interval (a, b) covering all observed response values.
  - optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.
  - scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional scaling-design matrix of shape (n, q_s) mirroring R tram::Lm(y ~ x_d | x_s, ..., scale = ~x_s). When supplied, the fitted model is heteroskedastic: the constant-variance closed-form mapping to sigma_ / intercept_ / coef_ no longer applies, and those properties raise NotImplementedError pointing at gamma_. Use predict() with X_scale_new for inference, and access the scaling-block coefficients via gamma_. Sign-aligned with R (ADR 0002, Decision 5).
Notes
The Bernstein order is fixed at 1 by construction; passing an order keyword raises TypeError.
Examples
>>> import numpy as np
>>> from mltpy.tram import Lm
>>> rng = np.random.default_rng(0)
>>> x = rng.normal(size=200)
>>> y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=200)
>>> model = Lm(support=(y.min() - 0.1, y.max() + 0.1))
>>> model.fit(y, X=x.reshape(-1, 1))
>>> # OLS cross-check
>>> A = np.c_[np.ones_like(x), x]
>>> beta_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
>>> np.allclose([model.intercept_, model.coef_[0]], beta_ols, atol=0.05)
True
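The CTM-to-lm mapping is plain arithmetic. The sketch below uses a hypothetical helper ctm_to_lm (not part of mltpy) and checks that, for arbitrary y and x, the CTM latent value h(y) + β_ctm x equals the standardised lm residual (y - μ - γx) / σ, which is exactly the equivalence claimed above.

```python
from typing import List, Sequence, Tuple


def ctm_to_lm(
    theta0: float, theta1: float, beta_ctm: Sequence[float], support: Tuple[float, float]
) -> Tuple[float, float, List[float]]:
    """Hypothetical helper: map CTM parameters to (sigma, mu, gamma) of the lm."""
    a, b = support
    if theta1 == theta0:
        # Mirrors the documented degenerate case for sigma_.
        raise RuntimeError("theta_1 == theta_0: lm-equivalence mapping undefined")
    sigma = (b - a) / (theta1 - theta0)
    mu = a - theta0 * sigma
    gamma = [-sigma * bj for bj in beta_ctm]  # minus sign from h(y) + X @ beta = z
    return sigma, mu, gamma


def ctm_latent(y: float, x: float, theta0: float, theta1: float, beta: float,
               support: Tuple[float, float]) -> float:
    # h(y) + beta * x with the order-1 Bernstein basis h(y) = theta0 (1 - u) + theta1 u.
    a, b = support
    u = (y - a) / (b - a)
    return theta0 * (1.0 - u) + theta1 * u + beta * x


support = (0.0, 10.0)
sigma, mu, gamma = ctm_to_lm(-2.0, 3.0, [0.4], support)  # sigma = 2, mu = 4, gamma = [-0.8]
```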
- property coef_: ndarray[tuple[Any, ...], dtype[float64]]
Estimated regression coefficients of the equivalent lm.
Computed as -sigma_ * beta_ctm, where beta_ctm is the covariate part of theta_. Has shape (0,) when no covariates were supplied at fit time.
- Raises:
  - NotFittedError – If accessed before fit().
  - NotImplementedError – If the model was fitted with scaling=; see sigma_.
- property feature_names_scaling_: list[str]
Column names of the scaling-design matrix supplied at fit time.
Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...].
- Raises:
  - NotFittedError – If accessed before fit().
  - ValueError – If the model was constructed without scaling=.
- property gamma_: ndarray[tuple[Any, ...], dtype[float64]]
Fitted scaling-block coefficients γ (length q_s).
Sign-aligned with R tram::Lm(..., scale=~x_s)’s scaling block (ADR 0002, Decision 5).
- Raises:
  - NotFittedError – If accessed before fit().
  - ValueError – If the model was constructed without scaling=.
- property intercept_: float
Estimated intercept of the equivalent lm.
Computed as a - theta_0 * sigma_.
- Raises:
  - NotFittedError – If accessed before fit().
  - NotImplementedError – If the model was fitted with scaling=; see sigma_.
- property sigma_: float
Estimated residual standard deviation of the equivalent lm.
Computed as (b - a) / (theta_1 - theta_0).
- Raises:
  - NotFittedError – If accessed before fit().
  - NotImplementedError – If the model was fitted with scaling=: the residual standard deviation is no longer constant in x_s and the scalar closed-form mapping is undefined. Use gamma_ and predict() instead.
  - RuntimeError – If the fit is degenerate with theta_[1] == theta_[0], which is feasible under the non-strict monotonicity constraint but leaves the lm-equivalence mapping undefined.
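The relation between the maximum-likelihood estimate behind sigma_ and the residual standard error printed by R's lm() can be checked on a toy simple regression. The data below are made up for illustration, and simple_ols is a plain closed-form least-squares helper written for this sketch; only the ratio between the two estimates matters.

```python
import math
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Closed-form least squares for y = intercept + slope * x; returns (intercept, slope, RSS)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.9, 5.2, 7.8, 11.1, 14.2, 16.8]  # illustrative values, roughly 2 + 3x
intercept, slope, rss = simple_ols(x, y)
n, p = len(x), 2  # p counts the intercept and the slope
sigma_mle = math.sqrt(rss / n)  # maximum-likelihood estimate (what sigma_ targets)
sigma_lm = math.sqrt(rss / (n - p))  # residual standard error as printed by lm()
```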
- class mltpy.tram.Polr(levels=None, distribution='logistic', optimizer_config=None)
Bases: ConditionalTransformationModel
Proportional-odds ordinal regression (R tram::Polr).
For an ordered response Y ∈ {1, ..., K} and covariates x, models
\[P(Y \leq k \mid x) = F(\theta_k + x^\top \beta), \quad k = 1, \ldots, K-1,\]
where F is the CDF of the chosen base distribution and θ_1 ≤ ... ≤ θ_{K-1} are the cutpoints. Internally implemented as a CTM with a degenerate OrdinalBasis plus the standard interval-censored likelihood path; the integer cut positions select the right θ_k per observation.
Note
Sign convention. mltpy parameterises h(y|x) = h(y) + x'β, so the fitted β has the opposite sign of R’s tram::Polr (which uses h(y) - x'β). Negate coef_ to compare with R output.
- Parameters:
  - levels (Sequence[Any] | None) – Optional explicit ordered tuple of category labels. When None (default), levels are inferred at fit() time from a pandas ordered Categorical (uses cat.categories) or, failing that, from sorted unique values of y.
  - distribution (Literal['logistic', 'normal', 'min_extreme_value']) – Base distribution / link:
    - "logistic" (default): proportional-odds model (R default).
    - "normal": ordered probit.
    - "min_extreme_value": proportional-hazards / cloglog link.
  - optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.
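The level-inference rule for levels=None can be sketched as follows. The helper infer_levels_and_codes is hypothetical and assumes 1-based integer codes (matching Y ∈ {1, ..., K} above); it also shows why an ordered Categorical matters, since sorting string labels gives alphabetical rather than semantic order.

```python
from typing import Any, Hashable, List, Optional, Sequence, Tuple


def infer_levels_and_codes(
    y: Sequence[Hashable], levels: Optional[Sequence[Hashable]] = None
) -> Tuple[List[Any], List[int]]:
    """Hypothetical sketch: resolve the level order, then encode labels as 1..K."""
    if levels is None:
        # A pandas Categorical exposes .categories; a categorical Series exposes .cat.categories.
        categories = getattr(y, "categories", None)
        if categories is None:
            categories = getattr(getattr(y, "cat", None), "categories", None)
        levels = list(categories) if categories is not None else sorted(set(y))
    position = {label: k for k, label in enumerate(levels, start=1)}
    codes = [position[label] for label in y]
    return list(levels), codes


labels = ["mid", "low", "high", "mid"]
print(infer_levels_and_codes(labels))  # (['high', 'low', 'mid'], [3, 2, 1, 3])
print(infer_levels_and_codes(labels, levels=["low", "mid", "high"]))  # (['low', 'mid', 'high'], [2, 1, 3, 2])
```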
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from mltpy import Polr
>>> rng = np.random.default_rng(0)
>>> y = pd.Categorical(
...     rng.choice(["low", "mid", "high"], size=200),
...     categories=["low", "mid", "high"],
...     ordered=True,
... )
>>> X = rng.standard_normal((200, 2))
>>> m = Polr().fit(y, X)
>>> probs = m.predict_proba(X[:5])
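Given cutpoints and a linear predictor η = x'β, class probabilities follow by differencing the cumulative model P(Y ≤ k | x) = F(θ_k + η), padded with 0 below the first level and 1 above the last. This sketch assumes the logistic link and illustrative cutpoints; it mirrors conceptually what predict_proba() and predict_class() return but is not their implementation, and the helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def logistic_cdf(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def class_probabilities(
    cutpoints: Sequence[float], eta: float, cdf: Callable[[float], float] = logistic_cdf
) -> List[float]:
    # Cumulative probabilities P(Y <= k | x) = F(theta_k + eta), padded with 0 and 1.
    cumulative = [0.0] + [cdf(theta + eta) for theta in cutpoints] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


def modal_level(cutpoints: Sequence[float], eta: float, levels: Sequence[str]) -> str:
    # Argmax over the class probabilities, decoded back to the original label.
    probs = class_probabilities(cutpoints, eta)
    return levels[max(range(len(probs)), key=probs.__getitem__)]


cutpoints = [-1.0, 1.0]  # theta_1 <= theta_2 for K = 3 levels
levels = ["low", "mid", "high"]
print(modal_level(cutpoints, 0.0, levels))  # mid
print(modal_level(cutpoints, 3.0, levels))  # low (a larger eta raises P(Y <= k))
```

Under the "+" convention a positive η pushes probability mass toward the lower levels, which is the sign flip relative to R noted above.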
- property coef_: ndarray[tuple[Any, ...], dtype[float64]]
Estimated regression coefficients β.
Length equals X.shape[1] from fit(); empty array when no X was supplied. mltpy’s sign convention (h + Xβ) flips the sign relative to R tram::Polr (h − Xβ); negate to compare.
- property cutpoints_: ndarray[tuple[Any, ...], dtype[float64]]
Estimated cutpoints θ_1, ..., θ_{K-1}.
- fit(y, X=None, weights=None, offset=None)
Fit the Polr model by maximum likelihood.
- Parameters:
  - y (Sequence[Any] | ndarray[tuple[Any, ...], dtype[Any]]) – Ordered categorical response. Accepts a pandas ordered Categorical / Series, or any sequence of hashable labels (whose sorted unique values determine the level order unless levels= was given).
  - X (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix of shape (n, q).
  - weights (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional non-negative per-observation weights of shape (n,).
  - offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional fixed linear predictor offset of shape (n,).
- predict(*args, **kwargs)
Disabled for Polr; use predict_proba() or predict_class() instead.
Continuous CDF/density predictions are not meaningful for an ordinal response.
- predict_class(X=None, offset=None)
Predict the modal level (argmax over predict_proba).
Returns the original level labels (decoded back from internal codes).
- class mltpy.tram.Survreg(support, distribution='weibull', order=6, optimizer_config=None, scaling=None)
Bases: _TramModel
Parametric survival model on the log-time scale (R tram::Survreg).
Fits a monotone transformation h(log t) such that h(log T | X) follows a standard distribution. This is equivalent to fitting a TRAM on Y = log(T), hence the name Survreg. Supported distributions:
  - "weibull": Weibull (minimum extreme value / reversed Gumbel link)
  - "lognormal": Log-normal (standard normal link)
  - "loglogistic": Log-logistic (standard logistic link)
With covariates X, the linear predictor shifts the transformation: the Weibull variant is a proportional-hazards model and the log-logistic variant a proportional-odds model. With order=1 (an affine baseline in log t, as in tram::Survreg), all three families are accelerated failure time models, i.e. covariates act additively on log T.
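For the Weibull family with an affine baseline h(u) = α + k u on u = log t, the minimum extreme value base gives S(t) = exp(-exp(α + k log t)), which reproduces the textbook Weibull survivor function exp(-(t/s)^k) when α = -k log s. The sketch below checks that identity numerically; the shape k and scale s are illustrative values and the function names are hypothetical.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    # Classical Weibull survivor function S(t) = exp(-(t / scale) ** shape).
    return math.exp(-((t / scale) ** shape))


def tram_survival(t: float, alpha: float, slope: float) -> float:
    # Minimum extreme value base on the log-time scale with an affine baseline:
    # S(t) = exp(-exp(h(log t))), h(u) = alpha + slope * u.
    return math.exp(-math.exp(alpha + slope * math.log(t)))


shape, scale = 1.5, 2.0
alpha, slope = -shape * math.log(scale), shape  # matching parameterisation
pairs = [(weibull_survival(t, shape, scale), tram_survival(t, alpha, slope)) for t in (0.3, 1.0, 4.0)]
```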
- Parameters:
  - support (tuple[float, float]) – Closed interval (a, b) with 0 < a < b on the original positive time scale (not log scale). Should bracket all observed survival times.
  - distribution (Literal['weibull', 'lognormal', 'loglogistic']) – Parametric family: "weibull" (default), "lognormal", or "loglogistic".
  - order (int) – Polynomial degree of the Bernstein basis on the log scale. Defaults to 6. Note that tram::Survreg itself always fits a strictly affine (two-parameter) baseline on log(t); order = 1 on the mltpy side reproduces that parameterisation and is required for R parity comparisons.
  - optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.
  - scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional scaling-design matrix of shape (n, q_s) mirroring R tram::Survreg(Surv(y, event) ~ x_d | x_s, ..., scale=~x_s). When supplied, the model becomes heteroskedastic on the log-time scale: h(log t | x) = h_0(log t) · exp(0.5 · x_s · γ) + x_d · β. The fitted parameter vector gains a γ block exposed as gamma_, and the survival(), hazard(), and predict() methods require X_scale / X_scale_new. Sign-aligned with R (ADR 0002, Decision 5).
Examples
>>> from mltpy.tram import Survreg
>>> from mltpy.variables import CensoredData
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> t = rng.lognormal(mean=1.0, sigma=0.5, size=200)
>>> status = rng.binomial(1, 0.7, size=200).astype(bool)
>>> cd = CensoredData.right_censored(t, censored=~status)
>>> model = Survreg(support=(t.min() * 0.9, t.max() * 1.1)).fit(cd)
>>> surv = model.survival(t)
- property feature_names_scaling_: list[str]
Column names of the scaling-design matrix supplied at fit time.
Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...].
- Raises:
  - NotFittedError – If accessed before fit().
  - ValueError – If the model was constructed without scaling=.
- property gamma_: ndarray[tuple[Any, ...], dtype[float64]]
Fitted scaling-block coefficients γ (length q_s).
Sign-aligned with R tram::Survreg(..., scale=~x_s)’s scaling block (ADR 0002, Decision 5).
- Raises:
  - NotFittedError – If accessed before fit().
  - ValueError – If the model was constructed without scaling=.
- hazard(y, X=None, offset=None, X_scale=None)
Estimate the hazard rate λ_T(t | x) = f(t | x) / S(t | x).
- Parameters:
  - y (ndarray[tuple[Any, ...], dtype[double]]) – Time points within support.
  - X (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix of shape (m, q).
  - offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation offset.
  - X_scale (ndarray[tuple[Any, ...], dtype[double]] | None) – New-data scaling-design matrix of shape (m, q_s), required when the model was fitted with scaling=. Threaded through to predict() as X_scale_new.
- survival(y, X=None, offset=None, X_scale=None)
Estimate the survival function S(t | x) = 1 − F(t | x).
- Parameters:
  - y (ndarray[tuple[Any, ...], dtype[double]]) – Time points within support.
  - X (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional covariate matrix of shape (m, q).
  - offset (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional per-observation offset.
  - X_scale (ndarray[tuple[Any, ...], dtype[double]] | None) – New-data scaling-design matrix of shape (m, q_s), required when the model was fitted with scaling=. Threaded through to predict() as X_scale_new.
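On the time scale the hazard picks up a chain-rule factor because h acts on log t: under the minimum extreme value base, λ_T(t) = exp(h(log t)) h'(log t) / t. With an affine Weibull-type baseline h(u) = α + k u and α = -k log s (shape k and scale s are illustrative values), this matches the textbook Weibull hazard (k / s)(t / s)^(k-1). The sketch checks the identity numerically; it is not mltpy's hazard() implementation.

```python
import math


def tram_hazard(t: float, alpha: float, slope: float) -> float:
    # lambda_T(t) = exp(h(log t)) * h'(log t) / t with h(u) = alpha + slope * u;
    # the 1 / t factor is d(log t) / dt.
    return math.exp(alpha + slope * math.log(t)) * slope / t


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    # Textbook Weibull hazard (shape / scale) * (t / scale) ** (shape - 1).
    return (shape / scale) * (t / scale) ** (shape - 1.0)


shape, scale = 1.5, 2.0
alpha = -shape * math.log(scale)
pairs = [(tram_hazard(t, alpha, shape), weibull_hazard(t, shape, scale)) for t in (0.3, 1.0, 4.0)]
```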