variables#

Variable types and censoring classes for conditional transformation models.

class mltpy.variables.CensoredData(exact, lower, upper, trunc_lower=None, trunc_upper=None)[source]#

Bases: object

Encodes n observations with optional censoring and truncation.

For each observation i, exactly one of the following censoring patterns applies:

Exact: exact[i] is finite
Right-censored: exact[i] is NaN, lower[i] finite, upper[i] = +inf
Left-censored: exact[i] is NaN, lower[i] = -inf, upper[i] finite
Interval-censored: exact[i] is NaN, both bounds finite

Truncation bounds constrain the observable range: only observations inside [trunc_lower[i], trunc_upper[i]] can appear in the sample.
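The four censoring patterns above can be checked one observation at a time. The following is a minimal standalone sketch in plain Python, not mltpy's implementation; the helper name censoring_pattern is hypothetical, and the sketch ignores the bounds of exact rows because the text above does not say what is stored there.

```python
import math


def censoring_pattern(exact, lower, upper):
    """Classify a single observation by the documented rules (hypothetical helper)."""
    if math.isfinite(exact):
        # Exact rows: the bounds are not inspected in this sketch.
        return "exact"
    if not math.isnan(exact):
        raise ValueError("exact must be finite or NaN")
    lower_finite = math.isfinite(lower)
    upper_finite = math.isfinite(upper)
    if lower_finite and upper == math.inf:
        return "right"
    if lower == -math.inf and upper_finite:
        return "left"
    if lower_finite and upper_finite:
        return "interval"
    raise ValueError("bounds match none of the four censoring patterns")


patterns = [
    censoring_pattern(2.5, 2.5, 2.5),             # exactly observed
    censoring_pattern(math.nan, 3.0, math.inf),   # only a lower bound known
    censoring_pattern(math.nan, -math.inf, 1.0),  # only an upper bound known
    censoring_pattern(math.nan, 1.0, 4.0),        # both bounds known
]
print(patterns)
```

A row such as (NaN, -inf, +inf) matches none of the patterns, so the sketch rejects it rather than guessing.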

Parameters:

exact (ndarray[tuple[Any, ...], dtype[double]]) – Length-n array. Use np.nan for censored observations.
lower (ndarray[tuple[Any, ...], dtype[double]]) – Length-n array of lower bounds. Use -np.inf for left-censored.
upper (ndarray[tuple[Any, ...], dtype[double]]) – Length-n array of upper bounds. Use +np.inf for right-censored.
trunc_lower (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional length-n array of left truncation points.
trunc_upper (ndarray[tuple[Any, ...], dtype[double]] | None) – Optional length-n array of right truncation points.

exact: ndarray[tuple[Any, ...], dtype[float64]]#

classmethod from_exact(y)[source]#

All observations exact (no censoring).

Parameters:

y (ndarray[tuple[Any, ...], dtype[double]])

Return type:

CensoredData

classmethod interval_censored(lower, upper)[source]#

All observations interval-censored with known bounds [lower, upper].

Parameters:

lower (ndarray[tuple[Any, ...], dtype[double]])
upper (ndarray[tuple[Any, ...], dtype[double]])

Return type:

CensoredData

property is_exact_mask: ndarray[tuple[Any, ...], dtype[bool]]#

Boolean mask: True where the observation is exact.

property is_interval_censored_mask: ndarray[tuple[Any, ...], dtype[bool]]#

Boolean mask: True where the observation is interval-censored.

property is_left_censored_mask: ndarray[tuple[Any, ...], dtype[bool]]#

Boolean mask: True where the observation is left-censored.

property is_right_censored_mask: ndarray[tuple[Any, ...], dtype[bool]]#

Boolean mask: True where the observation is right-censored.

classmethod left_censored(y, censored)[source]#

Left-censored data.

Parameters:

y (ndarray[tuple[Any, ...], dtype[double]]) – Observed value (exact value or censoring threshold).
censored (ndarray[tuple[Any, ...], dtype[bool]]) – Boolean array. True means the actual value is below y (only an upper bound is known).

Return type:

CensoredData
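To make the boolean convention concrete, here is a minimal standalone sketch of how (y, censored) could translate into the (exact, lower, upper) triple described for CensoredData. It uses plain Python lists rather than NumPy arrays and does not call mltpy. The helper name left_censored_triple is hypothetical, and setting both bounds equal to y for exact rows is an assumption made for illustration.

```python
import math


def left_censored_triple(y, censored):
    """Build (exact, lower, upper) lists for left-censored data (hypothetical helper)."""
    if len(y) != len(censored):
        raise ValueError("y and censored must have the same length")
    exact, lower, upper = [], [], []
    for value, is_censored in zip(y, censored):
        if is_censored:
            # The true value lies below `value`: only an upper bound is known.
            exact.append(math.nan)
            lower.append(-math.inf)
            upper.append(value)
        else:
            # Exactly observed. Bounds equal to the value are an assumption here.
            exact.append(value)
            lower.append(value)
            upper.append(value)
    return exact, lower, upper


# Assay readings with a detection limit of 0.5: readings at the limit are censored.
y = [0.5, 1.2, 0.5, 3.4]
censored = [True, False, True, False]
exact, lower, upper = left_censored_triple(y, censored)
print(lower)
print(upper)
```

Censored rows end up with exact = NaN, lower = -inf and upper = y, which is the left-censored pattern listed for CensoredData.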

classmethod left_truncated(y, trunc_lower, censored=None)[source]#

Left-truncated (delayed-entry) data, optionally with right censoring.

Mirrors the Surv(start, stop, event) counting-process encoding from R’s survival package: observation i is at risk only from trunc_lower[i] onward. When censored is given, it follows the same boolean convention as right_censored(): True means the actual event time is above y[i].

Parameters:

y (ndarray[tuple[Any, ...], dtype[double]]) – Observed value (exact event time, or right-censoring threshold).
trunc_lower (ndarray[tuple[Any, ...], dtype[double]]) – Length-n array of left-truncation points (delayed-entry times).
censored (ndarray[tuple[Any, ...], dtype[bool]] | None) – Optional boolean array of right-censoring indicators. None (default) treats all observations as exactly observed.

Return type:

CensoredData
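When porting data from R, note that the event flag in Surv(start, stop, event) marks an observed event, whereas censored marks the opposite case, so the flag has to be inverted. The following standalone sketch shows that translation with plain Python lists. The helper name surv_to_left_truncated_args is hypothetical, and the check that entry precedes stop is a sanity check added here, not documented mltpy behaviour.

```python
def surv_to_left_truncated_args(start, stop, event):
    """Translate counting-process columns into (y, trunc_lower, censored).

    Hypothetical helper: `event` is truthy where the event was observed,
    so `censored` is its logical negation.
    """
    if not (len(start) == len(stop) == len(event)):
        raise ValueError("start, stop and event must have the same length")
    for entry, exit_time in zip(start, stop):
        if not entry < exit_time:
            raise ValueError("each entry time must precede its stop time")
    y = list(stop)                              # event time or censoring threshold
    trunc_lower = list(start)                   # delayed-entry times
    censored = [not bool(flag) for flag in event]
    return y, trunc_lower, censored


y, trunc_lower, censored = surv_to_left_truncated_args(
    start=[0.0, 2.0, 1.0],
    stop=[5.0, 6.0, 4.0],
    event=[1, 0, 1],
)
print(y, trunc_lower, censored)
```

The three returned sequences line up with the y, trunc_lower and censored parameters of left_truncated(). Converting them to float and boolean arrays is left out of the sketch.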

lower: ndarray[tuple[Any, ...], dtype[float64]]#

property n: int#

Number of observations.

property n_censored: int#

property n_exact: int#

classmethod right_censored(y, censored)[source]#

Right-censored data.

Parameters:

y (ndarray[tuple[Any, ...], dtype[double]]) – Observed value (exact value or censoring threshold).
censored (ndarray[tuple[Any, ...], dtype[bool]]) – Boolean array. True means the actual event time is above y (only a lower bound is known).

Return type:

CensoredData

trunc_lower: ndarray[tuple[Any, ...], dtype[float64]] | None = None#

trunc_upper: ndarray[tuple[Any, ...], dtype[float64]] | None = None#

upper: ndarray[tuple[Any, ...], dtype[float64]]#

class mltpy.variables.CensoringType(*values)[source]#

Bases: Enum

Censoring regime for a dataset passed to the log-likelihood.

INTERVAL = 4#

LEFT = 2#

NONE = 1#

RIGHT = 3#

class mltpy.variables.OrderedVariable(levels)[source]#

Bases: object

Ordered categorical response with K levels and K-1 transformation cutpoints.

Used by mltpy.tram.Polr (proportional-odds ordinal regression). A level-k observation (1 <= k <= K) is mapped to interval-censored bounds on a synthetic integer cut scale:

level 1   → (-∞, 1]
level k   → (k-1, k]      for 1 < k < K
level K   → (K-1, +∞)
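The mapping above can be written out as a small function. This is a standalone sketch of the stated rule rather than mltpy code, and the helper name level_bounds is hypothetical.

```python
import math


def level_bounds(k, K):
    """Return the (lower, upper) bounds for level k of K on the integer cut scale."""
    if K < 2:
        raise ValueError("at least two levels are required")
    if not 1 <= k <= K:
        raise ValueError("k must satisfy 1 <= k <= K")
    lower = -math.inf if k == 1 else float(k - 1)  # first level is unbounded below
    upper = math.inf if k == K else float(k)       # last level is unbounded above
    return lower, upper


bounds = [level_bounds(k, 4) for k in range(1, 5)]
print(bounds)
# → [(-inf, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, inf)]
```

With K = 4 the finite bounds are the three cut positions 1, 2 and 3, matching the K-1 cutpoints mentioned above.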

Combined with mltpy.basis.OrdinalBasis, the cut position k selects one of K-1 Bernstein-like coefficients θ_k so that h(y_k) = θ_k exactly.

Parameters:

levels (tuple[Any, ...]) – Tuple of ordered category labels (any hashable values). Must contain at least two distinct levels.

property K: int#

Number of levels.

decode(codes)[source]#

Inverse of encode(): maps codes 1..K back to labels.

Parameters:

codes (ndarray[tuple[Any, ...], dtype[int_]]) – Integer codes of shape (n,) with values in {1, ..., K}.

Returns:

Labels in their original dtype (an object array for non-numeric labels).

Return type:

ndarray[tuple[Any, ...], dtype[Any]]

Raises:

ValueError – If any code is outside the valid range, or if any floating-point code is not integer-valued (e.g. 1.7).
TypeError – If codes has a non-numeric dtype (object, complex, …).

encode(y)[source]#

Map labels to 1-based integer codes 1..K.

Parameters:

y (Sequence[Any] | ndarray[tuple[Any, ...], dtype[Any]]) – Sequence of category labels.

Returns:

Integer codes of shape (n,).

Return type:

ndarray[tuple[Any, ...], dtype[int_]]

Raises:

ValueError – If any label is not in levels.
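The round trip between labels and 1-based codes, including the error cases listed for decode() and encode(), can be sketched with plain Python lists. The free functions below are hypothetical stand-ins that take levels as an explicit argument. They illustrate the documented contract and are not the library's implementation.

```python
def encode(levels, y):
    """Map labels to 1-based codes; unknown labels raise ValueError."""
    index = {label: code for code, label in enumerate(levels, start=1)}
    codes = []
    for label in y:
        if label not in index:
            raise ValueError(f"label {label!r} is not in levels")
        codes.append(index[label])
    return codes


def decode(levels, codes):
    """Map 1-based codes back to labels, validating type and range."""
    K = len(levels)
    labels = []
    for code in codes:
        if isinstance(code, bool) or not isinstance(code, (int, float)):
            raise TypeError(f"code {code!r} is not numeric")
        if isinstance(code, float) and not code.is_integer():
            raise ValueError(f"code {code!r} is not integer-valued")
        k = int(code)
        if not 1 <= k <= K:
            raise ValueError(f"code {k} is outside 1..{K}")
        labels.append(levels[k - 1])
    return labels


levels = ("low", "mid", "high")
codes = encode(levels, ["mid", "low", "high", "mid"])
print(codes)                  # → [2, 1, 3, 2]
print(decode(levels, codes))  # → ['mid', 'low', 'high', 'mid']
```

Integer-valued floats such as 2.0 pass the check in this sketch, while 1.7 is rejected with ValueError, in line with the Raises entries for decode().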

classmethod from_labels(y, levels=None)[source]#

Coerce raw observations into (OrderedVariable, CensoredData).

Level inference order:

If levels is given explicitly, use it.
Otherwise, if y is a pandas ordered Categorical, use y.cat.categories.
Otherwise, use the sorted unique values of y (deterministic ordering).

Validates that every observation lies in the resolved level set.
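The inference order and the final validation step can be sketched as follows. This is a standalone illustration under stated assumptions: the helper name resolve_levels is hypothetical, the pandas branch is duck-typed through the .cat accessor so that the sketch runs without pandas installed, and the real method additionally builds the CensoredData object.

```python
def resolve_levels(y, levels=None):
    """Resolve the ordered level tuple for observations y (hypothetical helper)."""
    if levels is not None:
        resolved = tuple(levels)                 # 1. explicit levels win
    else:
        cat = getattr(y, "cat", None)            # pandas-style accessor, if any
        if cat is not None and getattr(cat, "ordered", False):
            resolved = tuple(cat.categories)     # 2. ordered categorical
        else:
            resolved = tuple(sorted(set(y)))     # 3. sorted unique values
    if not resolved:
        raise ValueError("levels is empty")
    if len(set(resolved)) != len(resolved):
        raise ValueError("levels are not unique")
    known = set(resolved)
    missing = [value for value in y if value not in known]
    if missing:
        raise ValueError(f"observations not in the level set: {missing!r}")
    return resolved


inferred = resolve_levels(["b", "a", "c", "a"])
explicit = resolve_levels(["b", "a", "c", "a"], levels=("c", "b", "a"))
print(inferred)  # → ('a', 'b', 'c')
print(explicit)  # → ('c', 'b', 'a')
```

Passing levels explicitly is the only way in this sketch to get an ordering that differs from the sorted one when y is a plain list, for example ("low", "mid", "high"), which would otherwise sort as ("high", "low", "mid").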

Parameters:

y (Sequence[Any] | ndarray[tuple[Any, ...], dtype[Any]]) – Length-n sequence of category labels (any hashable type).
levels (Sequence[Any] | None) – Optional ordered tuple of all valid labels. Overrides automatic inference.

Returns:

A pair (variable, censored_data): variable carries the level vocabulary, and censored_data has one row per observation with synthetic integer-cut bounds suitable for OrdinalBasis and the interval-censored likelihood path.

Return type:

tuple[OrderedVariable, CensoredData]

Raises:

ValueError – If levels is empty or not unique, or if any observation in y is missing from the resolved level set.

levels: tuple[Any, ...]#