{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Survival analysis with Coxph\n",
    "\n",
    "`mltpy.Coxph` fits a proportional-hazards model in which the baseline\n",
    "log-cumulative-hazard is modelled as a flexible, monotone Bernstein\n",
    "polynomial and the link function is the minimum-extreme-value distribution.\n",
    "With covariates entering linearly, this is equivalent to the classical\n",
    "Cox proportional-hazards model, but the baseline distribution is estimated\n",
    "smoothly rather than left fully non-parametric.\n",
    "\n",
    "This vignette reproduces the Cox-PH illustration from Hothorn (2020) on\n",
    "synthetic right-censored data and compares the estimated survival curve\n",
    "to the Kaplan-Meier estimate from [lifelines](https://lifelines.readthedocs.io/).\n",
    "\n",
    "> *lifelines is an optional dependency:* install with\n",
    "> `pip install \"mltpy[examples]\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from lifelines import KaplanMeierFitter\n",
    "\n",
    "import mltpy\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "plt.rcParams[\"figure.dpi\"] = 110"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data-generating process\n",
    "\n",
    "250 event times from $T\\sim\\text{Exp}(\\text{scale}=2)$, observed under\n",
    "independent uniform right-censoring. The censoring window is tuned to a\n",
    "~30 % censoring rate, a realistic setting for observational follow-up\n",
    "studies.\n",
    "\n",
    "mltpy expects a `CensoredData` container with the convention\n",
    "`censored=True` when the event is *not* observed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = 250\n",
    "t_event = rng.exponential(scale=2.0, size=n)\n",
    "c = rng.uniform(0.8, 5.0, size=n)\n",
    "y_obs = np.minimum(t_event, c)\n",
    "event = t_event <= c         # True -> exact observation\n",
    "censored = ~event            # mltpy: True -> right-censored\n",
    "print(f\"censoring rate = {censored.mean():.0%}\")\n",
    "print(f\"observed time range: [{y_obs.min():.3f}, {y_obs.max():.3f}]\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cd = mltpy.CensoredData.right_censored(y_obs, censored)\n",
    "\n",
    "support = (float(y_obs.min()) * 0.5, float(y_obs.max()) + 0.5)\n",
    "model = mltpy.Coxph(support=support, order=6)\n",
    "model.fit(cd)\n",
    "print(model.summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Kaplan-Meier reference\n",
    "\n",
    "The non-parametric Kaplan-Meier estimator is the standard baseline for\n",
    "right-censored survival data. It serves as the reference against which\n",
    "the smooth Coxph estimate is compared. lifelines expects the *event*\n",
    "indicator — the complement of `censored`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kmf = KaplanMeierFitter()\n",
    "kmf.fit(durations=y_obs, event_observed=event.astype(int), label=\"Kaplan-Meier\")\n",
    "km_median = kmf.median_survival_time_\n",
    "ctm_median = model.predict(np.array([0.5]), what=\"quantile\")[0]\n",
    "true_median = 2.0 * np.log(2)\n",
    "print(f\"true median     : {true_median:.3f}\")\n",
    "print(f\"Kaplan-Meier    : {km_median:.3f}\")\n",
    "print(f\"mltpy Coxph     : {ctm_median:.3f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Survival and hazard curves"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t_grid = np.linspace(support[0] + 1e-3, support[1] - 1e-3, 400)\n",
    "S_ctm = model.survival(t_grid)\n",
    "h_ctm = model.hazard(t_grid)\n",
    "# Truth: Exp(scale=2) has S(t) = exp(-t/2), hazard 0.5.\n",
    "S_true = np.exp(-t_grid / 2.0)\n",
    "h_true = np.full_like(t_grid, 0.5)\n",
    "\n",
    "fig, (ax_s, ax_h) = plt.subplots(1, 2, figsize=(11, 4))\n",
    "\n",
    "ax_s.plot(t_grid, S_true, \"--\", color=\"0.3\", label=\"true S(t) = exp(-t/2)\")\n",
    "kmf.plot_survival_function(ax=ax_s, color=\"C3\", ci_show=True)\n",
    "ax_s.plot(t_grid, S_ctm, label=\"mltpy Coxph\", lw=2, color=\"C0\")\n",
    "ax_s.set_xlabel(\"t\")\n",
    "ax_s.set_ylabel(\"S(t)\")\n",
    "ax_s.set_title(\"Survival function\")\n",
    "ax_s.legend()\n",
    "\n",
    "ax_h.plot(t_grid, h_true, \"--\", color=\"0.3\", label=\"true hazard = 0.5\")\n",
    "ax_h.plot(t_grid, h_ctm, label=\"mltpy Coxph\", lw=2, color=\"C0\")\n",
    "ax_h.set_xlabel(\"t\")\n",
    "ax_h.set_ylabel(\"h(t)\")\n",
    "ax_h.set_title(\"Hazard rate\")\n",
    "ax_h.set_ylim(0, 1.5)\n",
    "ax_h.legend()\n",
    "fig.tight_layout();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Takeaways\n",
    "\n",
    "- The Coxph estimate tracks the Kaplan-Meier survival curve closely, but\n",
    "  is smooth — it does not have the staircase pattern of KM and returns\n",
    "  finite values everywhere inside the support.\n",
    "- Because the fit is smooth, the *hazard rate* is directly available via\n",
    "  `model.hazard(t)`. KM provides no hazard estimate without further\n",
    "  kernel smoothing.\n",
    "- The median survival time recovered by mltpy sits between the truth\n",
    "  $2\\ln 2 \\approx 1.386$ and the KM estimate — both are consistent at\n",
    "  $n=250$ with ~30 % censoring.\n",
    "- `Coxph` accepts covariates through `fit(cd, X=X)` exactly like\n",
    "  `MLT`; the hazard ratios appear in the trailing entries of `model.theta_`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}