tram

Convenience layer for common conditional transformation models.

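This module provides pre-configured wrappers around ConditionalTransformationModel / MLT that mirror the R tram package (Hothorn). With the default settings, users of these classes do not need to import BernsteinBasis, CensoringType, or OptimizerConfig directly; those types are only required for non-default options such as censoring=, interacting=, or optimizer_config=.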

Classes

BoxCox

Box-Cox transformation model for continuous outcomes with exact observations.

Coxph

Cox proportional hazards model for right-censored survival data.

Colr

Continuous outcome logistic regression — uses a logistic base distribution.

Lm

Normal linear regression as a CTM (order=1 Bernstein, normal base).

Lehmann

Lehmann (proportional reverse-time hazards) model for right-censored data.

Polr

Proportional-odds ordinal regression for ordered categorical outcomes.

Survreg

Parametric survival model on the log-time scale.

class mltpy.tram.BoxCox(support, order=6, optimizer_config=None, censoring=CensoringType.NONE, scaling=None)

Bases: _TramModel

Box-Cox transformation model for continuous outcomes.

Fits a flexible, monotone transformation h(y) that maps the response distribution to a standard normal. Useful as a non-parametric generalisation of the classical Box-Cox power transform when the normality assumption for linear regression is violated.

Parameters:
  • support (tuple[float, float]) – Closed interval (a, b) covering all observed values.

  • order (int) – Polynomial degree of the Bernstein basis. Defaults to 6.

  • optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.

  • censoring (CensoringType) – Censoring type of the response data. Defaults to NONE. Pass RIGHT, LEFT, or INTERVAL together with a CensoredData y to fit the censored Box-Cox likelihood.

  • scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional scaling-design matrix of shape (n, q_s) mirroring R tram::BoxCox(..., scale=~x_s). Threads through to the scaled-baseline likelihood (issue #71) and the scaled-predict path (issue #72). When supplied, the fitted parameter vector gains a γ block (length q_s) exposed as gamma_, and predict() requires X_scale_new. Sign-aligned with the R scale= block (ADR 0002, Decision 5); see docs/adr/0002-scaling-terms.md.

Examples

>>> from mltpy.tram import BoxCox
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> y = rng.lognormal(size=200)
>>> model = BoxCox(support=(y.min(), y.max()))
>>> model.fit(y)
>>> cdf   = model.predict(y, what="distribution")
>>> trafo = model.fitted_transformation(y)

property feature_names_scaling_: list[str]

Column names of the scaling-design matrix supplied at fit time.

Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...].

fitted_transformation(y)

Evaluate the raw fitted transformation h(y) = B_k(y) @ theta_b.

This is the monotone function that maps the observed response scale to the latent standard-normal scale. Useful for visualising the shape of the estimated transformation.
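
As a rough illustration of what h(y) = B_k(y) @ theta_b computes, the sketch below builds a Bernstein design matrix by hand with NumPy. It assumes the standard Bernstein polynomials on the rescaled response u = (y - a) / (b - a); the helper name bernstein_design and the coefficient values are made up for the example and are not part of mltpy.

```python
from math import comb

import numpy as np


def bernstein_design(y, support, order):
    """Bernstein basis functions of degree `order` evaluated at y (hypothetical helper)."""
    a, b = support
    u = (np.asarray(y, dtype=float) - a) / (b - a)  # rescale to [0, 1]
    k = np.arange(order + 1)
    binom = np.array([comb(order, int(j)) for j in k], dtype=float)
    # Column j holds C(order, j) * u**j * (1 - u)**(order - j).
    return binom * u[:, None] ** k * (1.0 - u[:, None]) ** (order - k)


support = (0.0, 10.0)
theta_b = np.array([-3.0, -1.5, -0.5, 0.0, 0.8, 1.7, 3.0])  # made-up, non-decreasing
y_grid = np.linspace(*support, 50)

B = bernstein_design(y_grid, support, order=6)
h = B @ theta_b  # transformation values on the latent scale
```

Non-decreasing coefficients are sufficient for h to be monotone, and the endpoint values h(a) and h(b) equal the first and last coefficient.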

Parameters:

y (ndarray[tuple[Any, ...], dtype[double]]) – Response values within basis.support.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().

property gamma_: ndarray[tuple[Any, ...], dtype[float64]]

Fitted scaling-block coefficients γ (length q_s).

Sign-aligned with R tram::BoxCox(..., scale=~x_s)’s scaling block (ADR 0002, Decision 5).

class mltpy.tram.Colr(support, order=6, optimizer_config=None, scaling=None)

Bases: _TramModel

Continuous outcome logistic regression.

Fits a monotone transformation h(y) such that h(Y|X) follows a standard logistic distribution. Analogous to ordinal logistic regression but for continuous, fully observed outcomes. Produces a proportional-odds model when covariates are included.
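
The proportional-odds property can be checked numerically without fitting anything. The sketch below assumes the additive convention h(y) + x'β with a standard logistic base distribution; the transformation and the coefficient are invented for illustration.

```python
import numpy as np


def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    """Log-odds of a probability."""
    return np.log(p) - np.log1p(-p)


y = np.linspace(1.0, 3.0, 15)
h_y = 2.0 * (y - 2.0)     # illustrative monotone transformation
beta = 0.8                # illustrative coefficient

F0 = logistic_cdf(h_y)            # conditional CDF at x = 0
F1 = logistic_cdf(h_y + beta)     # conditional CDF at x = 1

# The log-odds gap between the two covariate values is the same at every y.
log_odds_gap = logit(F1) - logit(F0)
```

Every entry of log_odds_gap equals beta, which is what "proportional odds" means here.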

Parameters:
  • support (tuple[float, float]) – Closed interval (a, b) covering all observed values.

  • order (int) – Polynomial degree of the Bernstein basis. Defaults to 6.

  • optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.

  • scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional scaling-design matrix of shape (n, q_s) mirroring R tram::Colr(y ~ x_d | x_s). Threads through to the scaled-baseline likelihood (#71) and the scaled-predict path (#72). When supplied, the fit becomes a heteroskedastic continuous-outcome logistic regression with non-proportional log-odds — the log-odds gap between two x_s values varies with y (the proportional-odds assumption is relaxed). The fitted parameter vector gains a γ block exposed as gamma_, and predict() requires X_scale_new. Sign-aligned with R (ADR 0002, Decision 5); see docs/adr/0002-scaling-terms.md.

Examples

>>> from mltpy.tram import Colr
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> y = rng.logistic(loc=2.0, scale=0.5, size=200)
>>> model = Colr(support=(y.min(), y.max()))
>>> model.fit(y)
>>> cdf = model.predict(y, what="distribution")

property feature_names_scaling_: list[str]

Column names of the scaling-design matrix supplied at fit time.

Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...].

property gamma_: ndarray[tuple[Any, ...], dtype[float64]]

Fitted scaling-block coefficients γ (length q_s).

Sign-aligned with R tram::Colr(..., scale=~x_s)’s scaling block (ADR 0002, Decision 5).

class mltpy.tram.Coxph(support, order=6, optimizer_config=None, interacting=None, scaling=None)

Bases: _TramModel

Cox proportional hazards model for right-censored survival data.

Fits a monotone transformation h(t) under right-censoring using the minimum extreme value ("min_extreme_value") base distribution, also known as the reversed Gumbel link. With covariates entering linearly, this parameterisation is equivalent to the classical Cox proportional hazards model. The baseline distribution is estimated non-parametrically via a Bernstein polynomial.
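
To see why the minimum extreme value base distribution yields proportional hazards, the following sketch evaluates the model by hand. It assumes the additive convention h(t) + x'β; the baseline h(t) = log t and the coefficient are illustrative choices, not fitted values.

```python
import numpy as np


def mev_cdf(z):
    """CDF of the standard minimum extreme value distribution."""
    return 1.0 - np.exp(-np.exp(z))


t = np.linspace(0.5, 5.0, 20)
h_t = np.log(t)            # illustrative monotone baseline transformation
beta = 0.7                 # illustrative coefficient
x = 1.0

S0 = 1.0 - mev_cdf(h_t)               # baseline survival, x = 0
S1 = 1.0 - mev_cdf(h_t + x * beta)    # survival at x = 1

# The cumulative hazards -log S differ by the constant factor exp(x * beta).
ratio = np.log(S1) / np.log(S0)
```

Because the ratio of cumulative hazards is constant in t, so is the hazard ratio, which is the defining property of the Cox model.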

Pass interacting to fit a non-proportional (stratified or fully-interacting) Cox model where the transformation itself depends on the covariate via the tensor product h(t | x) = (a(t) ⊗ b(x))ᵀ vec(Θ). See ADR 0001 and InteractionBasis for the parameter-vector layout and the column-wise monotonicity strategy.
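
The tensor-product form can be illustrated with a few made-up numbers. The sketch assumes a row-major vec(Θ) paired with np.kron(a, b); this ordering is chosen for the example only, and the layout mltpy actually uses is the one documented in ADR 0001.

```python
import numpy as np

a_t = np.array([0.2, 0.5, 0.3])   # illustrative response-basis values a(t)
b_x = np.array([1.0, 0.0])        # illustrative covariate-basis values b(x)
Theta = np.array([[-1.0, -0.5],
                  [0.0, 0.4],
                  [1.2, 1.5]])    # one row per a(t) term, one column per b(x) term

# Kronecker form: every product a_i(t) * b_j(x) gets its own coefficient.
h_kron = np.kron(a_t, b_x) @ Theta.ravel()
# Equivalent bilinear form a(t)' Theta b(x).
h_bilinear = a_t @ Theta @ b_x
```

With b(x) selecting a single column of Θ, as here, each covariate level gets its own baseline transformation, which is the stratified case.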

Parameters:
  • support (tuple[float, float]) – Closed interval (a, b) with a > 0 and b at least as large as the longest observed follow-up time.

  • order (int) – Polynomial degree of the Bernstein basis on the response. Defaults to 6.

  • optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.

  • interacting (BernsteinBasis | OrdinalBasis | None) – Optional x-basis (BernsteinBasis, OrdinalBasis, or InterceptBasis). When provided, the model is fit as MLT(InteractionBasis(BernsteinBasis(...), interacting)) instead of the standard shift model. Only exact (non-censored) time data is currently supported on this path; censoring with an interacting basis is not yet implemented in the likelihood path.

  • scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional scaling-design matrix of shape (n, q_s) mirroring R tram::Coxph(Surv(y, event) ~ x_d | x_s). Routes through to the scaled-baseline likelihood (#71) and the scaled-predict path (#72). When supplied, the fit becomes a heteroskedastic / non-proportional-hazards Cox model log[-log S(t | x)] = h_0(t) · exp(x_s · γ) + x_d · β: the hazard ratio between two x_s values varies with t (the proportional-hazards assumption is relaxed). The fitted parameter vector gains a γ block exposed as gamma_, and the survival(), hazard(), and predict() methods require X_scale / X_scale_new. Sign-aligned with R (ADR 0002, Decision 5). Not supported together with interacting= (ADR 0002, Decision 2).
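
The effect of the scaling block can be made concrete by evaluating the scaled formula quoted in the scaling parameter description directly. This is a sketch with invented values for h_0, γ, and β; it does not call mltpy.

```python
import numpy as np

t = np.linspace(0.5, 5.0, 10)
h0 = np.log(t)          # illustrative baseline transformation
gamma = 0.5             # illustrative scaling coefficient
beta = 0.3              # illustrative shift coefficient


def cum_hazard(x_s, x_d):
    """Cumulative hazard -log S(t | x) under the scaled formula."""
    return np.exp(h0 * np.exp(x_s * gamma) + x_d * beta)


# Changing x_s at fixed x_d: the ratio depends on t.
ratio_scaled = cum_hazard(1.0, 0.0) / cum_hazard(0.0, 0.0)
# Changing x_d at fixed x_s: the ratio is constant in t.
ratio_shift = cum_hazard(0.0, 1.0) / cum_hazard(0.0, 0.0)
```

A cumulative-hazard ratio that changes with t implies a non-constant hazard ratio, so covariates in the scaling block relax proportional hazards while shift covariates keep it.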

Examples

>>> from mltpy.tram import Coxph
>>> from mltpy.variables import CensoredData
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> y_time   = rng.exponential(scale=2.0, size=200)
>>> y_status = rng.binomial(1, 0.7, size=200).astype(bool)
>>> cd = CensoredData.right_censored(y_time, censored=~y_status)
>>> model = Coxph(support=(0.01, y_time.max()))
>>> model.fit(cd)
>>> surv = model.survival(y_time)

property feature_names_scaling_: list[str]

Column names of the scaling-design matrix supplied at fit time.

Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...].

property gamma_: ndarray[tuple[Any, ...], dtype[float64]]

Fitted scaling-block coefficients γ (length q_s).

Sign-aligned with R tram::Coxph(..., scale=~x_s)’s scaling block (ADR 0002, Decision 5).

hazard(y, X=None, offset=None, X_scale=None)

Estimate the hazard rate λ(y | x) = f(y | x) / S(y | x), where f is the conditional density and S the conditional survival function.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().

survival(y, X=None, offset=None, X_scale=None)

Estimate the survival function S(y | x) = 1 − F(y | x).

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().

class mltpy.tram.Lehmann(support, order=6, optimizer_config=None)

Bases: _TramModel

Lehmann (proportional reverse-time hazards) model for right-censored data.

Dual of Coxph. Fits a monotone transformation h(t) under right-censoring using the maximum extreme value ("max_extreme_value") base distribution, i.e. the standard Gumbel distribution with CDF exp(-exp(-z)). With covariates entering linearly this parameterisation satisfies -log[-log F(t | x)] = h(t) + x'β, so that F(t | x) = F_0(t)^exp(-x'β) for the baseline CDF F_0; this is the Lehmann alternative (proportional reverse-time hazards) model.
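
A quick numerical check of the Lehmann-alternative structure, assuming the standard Gumbel CDF exp(-exp(-z)) and the additive convention h(t) + x'β. The baseline h(t) = log t and the coefficient are made up for illustration.

```python
import numpy as np


def gumbel_max_cdf(z):
    """CDF of the standard (maximum extreme value) Gumbel distribution."""
    return np.exp(-np.exp(-z))


t = np.linspace(0.5, 5.0, 20)
h_t = np.log(t)      # illustrative monotone baseline transformation
beta = 0.7           # illustrative coefficient

F0 = gumbel_max_cdf(h_t)           # baseline CDF, x = 0
F1 = gumbel_max_cdf(h_t + beta)    # CDF at x = 1

# F1 is F0 raised to a power that does not depend on t.
power = np.log(F1) / np.log(F0)
```

The constant power exp(-beta) is the mirror image of the Cox model, where the survival function rather than the CDF is raised to a constant power.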

Parameters:
  • support (tuple[float, float]) – Closed interval (a, b) with a > 0 and b at least as large as the longest observed follow-up time.

  • order (int) – Polynomial degree of the Bernstein basis. Defaults to 6.

  • optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.

Examples

>>> from mltpy.tram import Lehmann
>>> from mltpy.variables import CensoredData
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> y_time   = rng.exponential(scale=2.0, size=200)
>>> y_status = rng.binomial(1, 0.7, size=200).astype(bool)
>>> cd = CensoredData.right_censored(y_time, censored=~y_status)
>>> model = Lehmann(support=(0.01, y_time.max()))
>>> model.fit(cd)
>>> surv = model.survival(y_time)

hazard(y, X=None, offset=None)

Estimate the hazard rate λ(y | x) = f(y | x) / S(y | x), where f is the conditional density and S the conditional survival function.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().

survival(y, X=None, offset=None)

Estimate the survival function S(y | x) = 1 − F(y | x).

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().

class mltpy.tram.Lm(support, optimizer_config=None, scaling=None)

Bases: _TramModel

Normal linear regression expressed as a CTM.

Fixes the Bernstein basis to order=1 and the base distribution to standard normal. With these constraints the transformation \(h(y) = \theta_0 (1-u) + \theta_1 u\), where \(u = (y-a)/(b-a)\), is affine, so the CTM \(h(Y) + \beta_{\mathrm{ctm}}^\top X \sim \mathcal{N}(0,1)\) is exactly equivalent to the classical normal linear model \(Y = \mu + \gamma^\top X + \varepsilon\), \(\varepsilon \sim \mathcal{N}(0, \sigma^2)\).

The mapping between CTM and lm parameters is

\[\begin{split}\hat{\sigma} &= (b - a) / (\theta_1 - \theta_0), \\ \hat{\mu} &= a - \theta_0 \hat{\sigma}, \\ \hat{\gamma} &= -\hat{\sigma} \, \beta_{\mathrm{ctm}}.\end{split}\]

The minus sign on \(\hat{\gamma}\) reflects mltpy’s internal shift convention h(y) + X @ beta = z (the R tram package uses h(y) - X @ beta = z, hence R’s \(\beta\) equals \(-\beta_{\mathrm{ctm}}\)).
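
The mapping can be verified with plain NumPy. In this sketch the support, the two Bernstein coefficients, and the CTM shift coefficient are invented; the code draws data from the equivalent linear model and checks that h(y) + x · beta_ctm recovers the standard-normal noise.

```python
import numpy as np

a, b = -5.0, 15.0                    # support (illustrative)
theta0, theta1 = -3.0, 5.0           # illustrative order-1 Bernstein coefficients
beta_ctm = np.array([-0.75])         # illustrative CTM shift coefficient

sigma = (b - a) / (theta1 - theta0)  # residual standard deviation
mu = a - theta0 * sigma              # intercept of the equivalent lm
gamma_lm = -sigma * beta_ctm         # slope of the equivalent lm


def h(y):
    """Affine order-1 Bernstein transformation."""
    u = (y - a) / (b - a)
    return theta0 * (1.0 - u) + theta1 * u


# Simulate from the lm form, then map back to the latent scale.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
z = rng.normal(size=5)
y = mu + gamma_lm[0] * x + sigma * z
z_back = h(y) + x * beta_ctm[0]
```

Here z_back matches z up to floating-point error, confirming that the three formulas are mutually consistent under the h(y) + X @ beta convention.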

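Note that \(\hat{\sigma}\) is the maximum-likelihood estimate, which equals the residual standard error reported by R's lm() multiplied by \(\sqrt{(n-p)/n}\), where n is the number of observations and p the number of estimated mean parameters (intercept included).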
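
The size of this factor is easy to reproduce with ordinary least squares. The sketch below uses simulated data and NumPy only; it does not involve mltpy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.c_[np.ones(n), rng.normal(size=(n, 2))]   # intercept + 2 covariates, p = 3
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ coef) ** 2)
p = X.shape[1]

sigma_mle = np.sqrt(rss / n)         # maximum-likelihood estimate
sigma_lm = np.sqrt(rss / (n - p))    # degrees-of-freedom corrected estimate
factor = sigma_mle / sigma_lm        # equals sqrt((n - p) / n)
```

For n = 50 and p = 3 the factor is about 0.97, so the two estimates differ by roughly 3 percent; the gap shrinks as n grows.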

These are exposed via sigma_, intercept_, and coef_ (sklearn-style fitted attributes).

Parameters:
  • support (tuple[float, float]) – Closed interval (a, b) covering all observed response values.

  • optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.

  • scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional scaling-design matrix of shape (n, q_s) mirroring R tram::Lm(y ~ x_d | x_s, ..., scale = ~x_s). When supplied, the fitted model is heteroskedastic — the constant-variance closed-form mapping to sigma_ / intercept_ / coef_ no longer applies, and those properties raise NotImplementedError pointing at gamma_. Use predict() with X_scale_new for inference, and access the scaling-block coefficients via gamma_. Sign-aligned with R (ADR 0002, Decision 5).

Notes

The Bernstein order is fixed at 1 by construction; passing an order keyword raises TypeError.

Examples

>>> import numpy as np
>>> from mltpy.tram import Lm
>>> rng = np.random.default_rng(0)
>>> x = rng.normal(size=200)
>>> y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=200)
>>> model = Lm(support=(y.min() - 0.1, y.max() + 0.1))
>>> model.fit(y, X=x.reshape(-1, 1))
>>> # OLS cross-check
>>> A = np.c_[np.ones_like(x), x]
>>> beta_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
>>> np.allclose([model.intercept_, model.coef_[0]], beta_ols, atol=0.05)
True

property coef_: ndarray[tuple[Any, ...], dtype[float64]]

Estimated regression coefficients of the equivalent lm.

Computed as -sigma_ * beta_ctm, where beta_ctm is the covariate part of theta_. Has shape (0,) when no covariates were supplied at fit time.

property feature_names_scaling_: list[str]

Column names of the scaling-design matrix supplied at fit time.

Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...].

fitted_transformation(y)

Evaluate the fitted affine transformation h(y) = B(y) @ theta_b.

Parameters:

y (ndarray[tuple[Any, ...], dtype[double]]) – Response values within basis.support.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Raises:

NotFittedError – If called before fit().

property gamma_: ndarray[tuple[Any, ...], dtype[float64]]

Fitted scaling-block coefficients γ (length q_s).

Sign-aligned with R tram::Lm(..., scale=~x_s)’s scaling block (ADR 0002, Decision 5).

property intercept_: float

Estimated intercept of the equivalent lm.

Computed as a - theta_0 * sigma_.

property sigma_: float

Estimated residual standard deviation of the equivalent lm.

Computed as (b - a) / (theta_1 - theta_0).

Raises:
  • NotFittedError – If accessed before fit().

  • NotImplementedError – If the model was fitted with scaling= — the residual standard deviation is no longer constant in x_s and the scalar closed-form mapping is undefined. Use gamma_ and predict() instead.

  • RuntimeError – If the fit is degenerate with theta_[1] == theta_[0], which is feasible under the non-strict monotonicity constraint but leaves the lm-equivalence mapping undefined.

class mltpy.tram.Polr(levels=None, distribution='logistic', optimizer_config=None)

Bases: ConditionalTransformationModel

Proportional-odds ordinal regression (R tram::Polr).

For an ordered response Y ∈ {1, ..., K} and covariates x, models

\[P(Y \leq k \mid x) = F(\theta_k + x^\top \beta), \quad k = 1, \ldots, K-1,\]

where F is the CDF of the chosen base distribution and θ_1 ... θ_{K-1} are the cutpoints. Internally implemented as a CTM with a degenerate OrdinalBasis plus the standard interval-censored likelihood path — the integer cut positions select the right θ_k per observation.
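
Per-level probabilities follow from differencing the cumulative probabilities. A minimal sketch with the logistic link and made-up cutpoints and coefficients, using the same additive convention as the formula above:

```python
import numpy as np


def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + np.exp(-z))


cutpoints = np.array([-1.0, 0.5])        # theta_1 < theta_2, so K = 3 (illustrative)
beta = np.array([0.4, -0.2])             # illustrative coefficients
X = np.array([[0.0, 0.0],
              [1.0, -1.0]])

eta = X @ beta                                           # linear predictor, shape (n,)
cum = logistic_cdf(cutpoints[None, :] + eta[:, None])    # P(Y <= k | x), k = 1..K-1
# Pad with P(Y <= 0) = 0 and P(Y <= K) = 1, then difference across levels.
cum = np.hstack([np.zeros((len(X), 1)), cum, np.ones((len(X), 1))])
probs = np.diff(cum, axis=1)                             # P(Y = k | x), shape (n, K)
```

Each row of probs sums to one, and increasing cutpoints guarantee that every entry is positive.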

Note

Sign convention. mltpy parameterises h(y|x) = h(y) + x'β, so the fitted β has the opposite sign of R’s tram::Polr (which uses h(y) - x'β). Negate coef_ to compare with R output.

Parameters:
  • levels (Sequence[Any] | None) – Optional explicit ordered tuple of category labels. When None (default), levels are inferred at fit() time from a pandas ordered Categorical (uses cat.categories) or, failing that, from sorted unique values of y.

  • distribution (Literal['logistic', 'normal', 'min_extreme_value']) –

    Base distribution / link.

    • "logistic" (default) — proportional-odds model (R default).

    • "normal" — ordered probit.

    • "min_extreme_value" — proportional-hazards / cloglog link.

  • optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from mltpy.tram import Polr
>>> rng = np.random.default_rng(0)
>>> y = pd.Categorical(
...     rng.choice(["low", "mid", "high"], size=200),
...     categories=["low", "mid", "high"],
...     ordered=True,
... )
>>> X = rng.standard_normal((200, 2))
>>> m = Polr().fit(y, X)
>>> probs = m.predict_proba(X[:5])

property K_: int

Number of ordered levels.

property coef_: ndarray[tuple[Any, ...], dtype[float64]]

Estimated regression coefficients β.

Length equals X.shape[1] from fit(); empty array when no X was supplied. mltpy’s sign convention (h(y) + x'β) flips the sign relative to R tram::Polr (h(y) - x'β); negate coef_ to compare.

property cutpoints_: ndarray[tuple[Any, ...], dtype[float64]]

Estimated cutpoints θ_1, ..., θ_{K-1}.

fit(y, X=None, weights=None, offset=None)

Fit the Polr model by maximum likelihood.

Return type:

Polr

property levels_: tuple[Any, ...]

Ordered category labels resolved at fit time.

predict(*args, **kwargs)

Disabled for Polr; use predict_proba() or predict_class() instead.

Continuous CDF/density predictions are not meaningful for an ordinal response.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

predict_class(X=None, offset=None)

Predict the modal level (argmax over predict_proba).

Returns the original level labels (decoded back from internal codes).
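
The argmax-and-decode step amounts to the following, shown here on a hand-written probability matrix rather than real predict_proba output:

```python
import numpy as np

levels = np.array(["low", "mid", "high"])       # ordered labels (illustrative)
probs = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.3, 0.5],
                  [0.1, 0.7, 0.2]])             # made-up per-row level probabilities

# Index of the most probable level per row, mapped back to its label.
modal = levels[np.argmax(probs, axis=1)]
```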

Return type:

ndarray[tuple[Any, ...], dtype[Any]]

predict_proba(X=None, offset=None)

Compute per-row level probabilities P(Y = level_k | x).

Return type:

ndarray[tuple[Any, ...], dtype[double]]

summary()

Multi-line summary with cutpoints and a Wald table for β.

Return type:

str

class mltpy.tram.Survreg(support, distribution='weibull', order=6, optimizer_config=None, scaling=None)

Bases: _TramModel

Parametric survival model on the log-time scale (R tram::Survreg).

Fits a monotone transformation h(log t) such that h(log T | X) follows a standard distribution. This is equivalent to fitting a TRAM on Y = log(T) — hence the name Survreg. Supported distributions:

  • "weibull" — Weibull (minimum extreme value / reversed Gumbel link)

  • "lognormal" — Log-normal (standard normal link)

  • "loglogistic" — Log-logistic (standard logistic link)

With covariates X the model is proportional (Weibull / log-logistic) or additive (log-normal) on the log-time scale.
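
For the Weibull case, an affine transformation on the log-time scale reproduces the classical Weibull survival function. The sketch assumes h(log t) = θ_0 + θ_1 · log t with invented coefficients and no covariates:

```python
import numpy as np

theta0, theta1 = -2.0, 1.5         # illustrative affine baseline on log t
t = np.linspace(0.5, 8.0, 25)

# Transformation-model form with the minimum extreme value base distribution.
S_ctm = np.exp(-np.exp(theta0 + theta1 * np.log(t)))

# Classical Weibull form S(t) = exp(-(t / scale) ** shape).
shape = theta1
scale = np.exp(-theta0 / theta1)
S_weibull = np.exp(-(t / scale) ** shape)
```

The two survival curves coincide, with the slope θ_1 playing the role of the Weibull shape and exp(-θ_0 / θ_1) the scale.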

Parameters:
  • support (tuple[float, float]) – Closed interval (a, b) with 0 < a < b on the original positive time scale (not log-scale). Should bracket all observed survival times.

  • distribution (Literal['weibull', 'lognormal', 'loglogistic']) – Parametric family: "weibull" (default), "lognormal", or "loglogistic".

  • order (int) – Polynomial degree of the Bernstein basis on the log scale. Defaults to 6. Note that tram::Survreg itself fits a strictly affine (two-parameter) baseline on log(t) regardless of order; order = 1 on the mltpy side reproduces that parameterisation and is required for R parity comparisons.

  • optimizer_config (OptimizerConfig | None) – Optimisation settings. If None, library defaults are used.

  • scaling (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional scaling-design matrix of shape (n, q_s) mirroring R tram::Survreg(Surv(y, event) ~ x_d | x_s, ..., scale=~x_s). When supplied, the model becomes heteroskedastic on the log-time scale: h(log t | x) = h_0(log t) · exp(0.5 · x_s · γ) + x_d · β. The fitted parameter vector gains a γ block exposed as gamma_, and the survival(), hazard(), and predict() methods require X_scale / X_scale_new. Sign-aligned with R (ADR 0002, Decision 5).

Examples

>>> from mltpy.tram import Survreg
>>> from mltpy.variables import CensoredData
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> t = rng.lognormal(mean=1.0, sigma=0.5, size=200)
>>> status = rng.binomial(1, 0.7, size=200).astype(bool)
>>> cd = CensoredData.right_censored(t, censored=~status)
>>> model = Survreg(support=(t.min() * 0.9, t.max() * 1.1)).fit(cd)
>>> surv = model.survival(t)

property feature_names_scaling_: list[str]

Column names of the scaling-design matrix supplied at fit time.

Populated from a pandas.DataFrame column index when available, otherwise ["X1", "X2", ...].

property gamma_: ndarray[tuple[Any, ...], dtype[float64]]

Fitted scaling-block coefficients γ (length q_s).

Sign-aligned with R tram::Survreg(..., scale=~x_s)’s scaling block (ADR 0002, Decision 5).

hazard(y, X=None, offset=None, X_scale=None)

Estimate the hazard rate λ(t | x) = f(t | x) / S(t | x), where f is the conditional density and S the conditional survival function.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

survival(y, X=None, offset=None, X_scale=None)

Estimate the survival function S(t | x) = 1 − F(t | x).

Return type:

ndarray[tuple[Any, ...], dtype[double]]