# ADR 0001 — Tensor-Product Interaction Basis

**Date:** 2026-05-17  
**Status:** Accepted  
**Deciders:** RMKruse  
**Issue:** [#62 Interacting Terms: ADR for tensor-product API and monotonicity strategy](https://github.com/RMKruse/mltpy/issues/62)  
**Parent:** [#27 Interacting Terms (Tensor Products)](https://github.com/RMKruse/mltpy/issues/27)

---

## Context

Conditional Transformation Models allow the transformation `h(y|x)` to depend on covariates
either through a linear shift (`h(y|x) = a(y)·θ + x·β`, the current implementation) or
through a fully-interacting tensor-product structure
(`h(y|x) = (a(y) ⊗ b(x))ᵀ vec(Θ)`).  The tensor-product form is the natural
generalisation that covers interaction effects between the response and the covariate.

This ADR resolves five open questions required before any tensor-product code is written.

---

## Decision 1 — Public Surface

**Chosen:** Single new class `InteractionBasis(y_basis, x_basis)` in `mltpy.basis`,
re-exported from `mltpy.__init__`.  The existing `MLT` / `ConditionalTransformationModel`
classes accept this as the `basis=` argument; no new model class is introduced, and no
new model parameter (`interacting=...`) is added.

**Rationale:**
- All existing bases duck-type a common interface (`evaluate`, `derivative`,
  `evaluate_with_derivative`, `integrate`, `order`, `support`).  `InteractionBasis` adds
  `.x_basis` and `.y_basis` attributes and overrides `.evaluate` to return the
  Kronecker-product design row, fitting cleanly into the same slot.
- The model layer detects `isinstance(basis, InteractionBasis)` once at `fit()` time and
  routes to the tensor-product likelihood path.  This is one branch point rather than a
  parallel class hierarchy.
- A separate `Design`/`DesignSpec` object would require a new model parameter and duplicate
  the validation logic already in `ConditionalTransformationModel.__init__`.

**Consequence:** `InteractionBasis` requires *two* arrays at evaluation time (y and X),
so its `evaluate(y, X)` signature differs from the scalar-basis interface.  The model
layer owns this call convention; the basis itself documents it.

---

## Decision 2 — Parameter-Vector Layout

**Chosen:** Row-major (C-order) vectorisation.  `theta_` stores `vec_C(Θ)` of length
`p · q` where `Θ` is the `(p, q)` coefficient matrix (`p = y_basis.order + 1`,
`q = x_basis.order + 1`).  The shaped matrix is exposed as `model.Theta_`.

Specifically, `theta_[i * q + j] = Θ[i, j]`, matching NumPy's default `ravel`/`reshape`
convention and `np.kron` row ordering.

**What breaks for interaction models:**
- `theta_b` (the Bernstein sub-vector) and `beta` (the regression sub-vector) do not exist
  separately; `theta_` is the entire `vec(Θ)`.
- `coef_` returns `Theta_` as a 2-D array `(p, q)` rather than a 1-D vector.
- `n_free_params_` = `p * q` (no separate beta count).
- Existing code that reads `theta_[:p]` or `theta_[p:]` must not be called on an interaction
  model; these paths are guarded by the `isinstance(basis, InteractionBasis)` branch.

**Non-interaction models are unchanged:** the split `[theta_b | beta]` is untouched.

**Gradient computation:**
`∂h(y_i|x_i)/∂theta = a(y_i) ⊗ b(x_i)` — this is the `(1, p·q)` Jacobian row for
observation `i`, which is exactly the Kronecker-product evaluation row from
`InteractionBasis.evaluate(y_i, x_i)`.  The derivative w.r.t. `y` uses the
corresponding row from `InteractionBasis.derivative(y_i, x_i)`.

---

## Decision 3 — Monotonicity Strategy

**Chosen:** Closed-form column-wise constraints for the initial release.  Grid-based
evaluation is deferred.

**Rule:** `h(y|x) = (a(y) ⊗ b(x))ᵀ vec(Θ)` must be non-decreasing in `y` for every
fixed `x`.  When `b(x)` is **non-negative and a partition of unity** (every column of
the evaluated x-basis matrix is ≥ 0 and the row sums to 1), the condition reduces to

    D @ Θ[:, j] ≥ 0   for every column j = 0, …, q-1

where `D` is the standard `(p-1, p)` forward-difference matrix.  In flat form this
is the Kronecker inequality `(D ⊗ I_q) @ vec(Θ) ≥ 0`, giving `(p-1) · q` linear
constraints — cheap, exact, and handled by the existing `ConstraintMatrices` path.

**Supported x-bases (initial release):**
- `BernsteinBasis` — non-negative, partition-of-unity by construction.
- `OrdinalBasis` — non-negative, one-hot rows (partition of unity).
- `InterceptBasis` — single column of ones, trivially non-negative and a PoU.

**Unsupported x-bases (raise `ValueError` from `build_constraints`):**
- `PolynomialBasis` — can take negative values.
- `LegendreBasis` — oscillates; not non-negative.
- `LogBasis` — single column of potentially negative values.
- `LogBernsteinBasis` — not defined on a compact domain in the x direction.

`build_constraints` raises with a descriptive message naming the unsupported x-basis
so the user can substitute a supported one.

Grid-based evaluation (covering general x-bases) can be added in a later slice without
changing the public API; the only change would be lifting the `ValueError` for those types.

---

## Decision 4 — Likelihood-Path Generalisation

**Chosen:** Option (a) — generalise `_ll_*` and `_grad_*` to accept a precomputed `h`
vector and a Jacobian matrix `dh_dtheta`, rather than adding a parallel interaction path.

**Current shift path (unchanged for non-interaction models):**

```
h      = B @ theta_b  +  X @ beta            # shape (n,)
h_prime = dB @ theta_b                        # shape (n,)
J      = [B | X]                              # shape (n, p+q)  — the full Jacobian
```

**Tensor-product path:**

```
design  = InteractionBasis.evaluate(y, X)     # shape (n, p*q)
h       = design @ theta                      # shape (n,)
d_design = InteractionBasis.derivative(y, X)  # shape (n, p*q)
h_prime  = d_design @ theta                   # shape (n,)
J        = design                             # shape (n, p*q)
```

The private functions `_ll_none`, `_ll_right`, `_ll_left`, `_ll_interval` and their
gradient counterparts receive `(h, log_h_prime, J, ...)` — identical calling convention
for both paths.  A thin dispatcher in `negative_log_likelihood` branches on
`isinstance(basis, InteractionBasis)` to build `(h, log_h_prime, J)` before calling
the censoring-specific functions.

**Why option (a) over a parallel path:**
- The censoring dispatch is the stateful part; duplicating it doubles the surface area
  to maintain.
- `h` and `J` are the only quantities that change between the shift and interaction
  paths; everything downstream is identical.
- Analytical gradients remain exact: `∂ℓ/∂theta = Jᵀ s` where `s` is the score
  vector, same formula for both paths.

---

## Decision 5 — Backward Compatibility

**All non-interaction models are unaffected:**
- `theta_`, `coef_`, `predict`, `simulate`, `log_likelihood` semantics do not change.
- The parameter layout `[theta_b | beta]` and `n_free_params_` are unchanged.
- `build_constraints`, `build_constraint_matrices` accept the same arguments.
- No new required parameters are added to any existing class or function.

The `isinstance(basis, InteractionBasis)` branch added to `fit()` and
`negative_log_likelihood` is the only code path that diverges; it is dead code for
all models constructed without `InteractionBasis`.

---

## Stub files

Alongside this ADR, a `class InteractionBasis` skeleton is added to `mltpy/basis.py`
and re-exported from `mltpy/__init__.py`, raising `NotImplementedError` on all calls.
This lets downstream slices import the name without a follow-up rename.

---

## Consequences

- **Slice 2** implements `InteractionBasis.evaluate` and `derivative`.
- **Slice 3** extends `build_constraint_matrices` with the `(D ⊗ I_q)` rows.
- **Slice 4** wires the interaction path into `negative_log_likelihood`.
- **Slice 5** updates `ConditionalTransformationModel.fit` and exposes `Theta_` / `coef_`.
- Grid-based monotonicity (general x-bases) is explicitly out of scope for this release.