Note

This page was generated from the notebook examples/02_survival_analysis.ipynb.

Survival analysis with Coxph#

mltpy.Coxph fits a proportional-hazards model in which the baseline log-cumulative-hazard is modelled as a flexible, monotone Bernstein polynomial and the link function is the minimum-extreme-value distribution. With covariates entering linearly, this is equivalent to the classical Cox proportional-hazards model, but the baseline distribution is estimated smoothly rather than left fully non-parametric.

This vignette reproduces the Cox-PH illustration from Hothorn (2020) on synthetic right-censored data and compares the estimated survival curve to the Kaplan-Meier estimate from lifelines.

lifelines is an optional dependency: install with pip install "mltpy[examples]".

[1]:
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

import mltpy

rng = np.random.default_rng(0)
plt.rcParams["figure.dpi"] = 110

Data-generating process#

250 event times from \(T\sim\text{Exp}(\text{scale}=2)\), observed under independent uniform right-censoring. The censoring window is tuned to a ~30 % censoring rate, a realistic setting for observational follow-up studies.

mltpy expects a CensoredData container with the convention censored=True when the event is not observed.

[2]:
n = 250
t_event = rng.exponential(scale=2.0, size=n)
c = rng.uniform(0.8, 5.0, size=n)
y_obs = np.minimum(t_event, c)
event = t_event <= c         # True -> exact observation
censored = ~event            # mltpy: True -> right-censored
print(f"censoring rate = {censored.mean():.0%}")
print(f"observed time range: [{y_obs.min():.3f}, {y_obs.max():.3f}]")
censoring rate = 33%
observed time range: [0.003, 4.891]
[3]:
cd = mltpy.CensoredData.right_censored(y_obs, censored)

support = (float(y_obs.min()) * 0.5, float(y_obs.max()) + 0.5)
model = mltpy.Coxph(support=support, order=6)
model.fit(cd)
print(model.summary())
Model:        Coxph
Support:      (0.001287750334822838, 5.390734500379891)
Basis order:  6
Fitted:       Yes
Log-lik:      -330.5756
AIC:          675.1511
BIC:          699.8014
Converged:    Yes
n_restarts:   0

Kaplan-Meier reference#

The non-parametric Kaplan-Meier estimator is the standard baseline for right-censored survival data. It serves as the reference against which the smooth Coxph estimate is compared. lifelines expects the event indicator — the complement of censored.

[4]:
kmf = KaplanMeierFitter()
kmf.fit(durations=y_obs, event_observed=event.astype(int), label="Kaplan-Meier")
km_median = kmf.median_survival_time_
ctm_median = model.predict(np.array([0.5]), what="quantile")[0]
true_median = 2.0 * np.log(2)
print(f"true median     : {true_median:.3f}")
print(f"Kaplan-Meier    : {km_median:.3f}")
print(f"mltpy Coxph     : {ctm_median:.3f}")
true median     : 1.386
Kaplan-Meier    : 1.596
mltpy Coxph     : 1.555

Survival and hazard curves#

[5]:
t_grid = np.linspace(support[0] + 1e-3, support[1] - 1e-3, 400)
S_ctm = model.survival(t_grid)
h_ctm = model.hazard(t_grid)
# Truth: Exp(scale=2) has S(t) = exp(-t/2), hazard 0.5.
S_true = np.exp(-t_grid / 2.0)
h_true = np.full_like(t_grid, 0.5)

fig, (ax_s, ax_h) = plt.subplots(1, 2, figsize=(11, 4))

ax_s.plot(t_grid, S_true, "--", color="0.3", label="true S(t) = exp(-t/2)")
kmf.plot_survival_function(ax=ax_s, color="C3", ci_show=True)
ax_s.plot(t_grid, S_ctm, label="mltpy Coxph", lw=2, color="C0")
ax_s.set_xlabel("t")
ax_s.set_ylabel("S(t)")
ax_s.set_title("Survival function")
ax_s.legend()

ax_h.plot(t_grid, h_true, "--", color="0.3", label="true hazard = 0.5")
ax_h.plot(t_grid, h_ctm, label="mltpy Coxph", lw=2, color="C0")
ax_h.set_xlabel("t")
ax_h.set_ylabel("h(t)")
ax_h.set_title("Hazard rate")
ax_h.set_ylim(0, 1.5)
ax_h.legend()
fig.tight_layout();
../_images/examples_02_survival_analysis_8_0.png

Takeaways#

  • The Coxph estimate tracks the Kaplan-Meier survival curve closely, but is smooth — it does not have the staircase pattern of KM and returns finite values everywhere inside the support.

  • Because the fit is smooth, the hazard rate is directly available via model.hazard(t). KM provides no hazard estimate without further kernel smoothing.

  • The median survival time recovered by mltpy sits between the truth \(2\ln 2 \approx 1.386\) and the KM estimate — both are consistent at \(n=250\) with ~30 % censoring.

  • Coxph accepts covariates through fit(cd, X=X) exactly like MLT; the hazard ratios appear in the trailing entries of model.theta_.