ADR 0001 — Tensor-Product Interaction Basis#
Date: 2026-05-17
Status: Accepted
Deciders: RMKruse
Issue: #62 Interacting Terms: ADR for tensor-product API and monotonicity strategy
Parent: #27 Interacting Terms (Tensor Products)
Context#
Conditional Transformation Models allow the transformation h(y|x) to depend on covariates
either through a linear shift (h(y|x) = a(y)·θ + x·β, the current implementation) or
through a fully-interacting tensor-product structure
(h(y|x) = (a(y) ⊗ b(x))ᵀ vec(Θ)). The tensor-product form is the natural
generalisation that covers interaction effects between the response and the covariate.
This ADR resolves five open questions required before any tensor-product code is written.
Decision 1 — Public Surface#
Chosen: Single new class InteractionBasis(y_basis, x_basis) in mltpy.basis,
re-exported from mltpy.__init__. The existing MLT / ConditionalTransformationModel
classes accept this as the basis= argument; no new model class is introduced, and no
new model parameter (interacting=...) is added.
Rationale:
All existing bases duck-type a common interface (
evaluate,derivative,evaluate_with_derivative,integrate,order,support).InteractionBasisadds.x_basisand.y_basisattributes and overrides.evaluateto return the Kronecker-product design row, fitting cleanly into the same slot.The model layer detects
isinstance(basis, InteractionBasis)once atfit()time and routes to the tensor-product likelihood path. This is one branch point rather than a parallel class hierarchy.A separate
Design/DesignSpecobject would require a new model parameter and duplicate the validation logic already inConditionalTransformationModel.__init__.
Consequence: InteractionBasis requires two arrays at evaluation time (y and X),
so its evaluate(y, X) signature differs from the scalar-basis interface. The model
layer owns this call convention; the basis itself documents it.
Decision 2 — Parameter-Vector Layout#
Chosen: Row-major (C-order) vectorisation. theta_ stores vec_C(Θ) of length
p · q where Θ is the (p, q) coefficient matrix (p = y_basis.order + 1,
q = x_basis.order + 1). The shaped matrix is exposed as model.Theta_.
Specifically, theta_[i * q + j] = Θ[i, j], matching NumPy’s default ravel/reshape
convention and np.kron row ordering.
What breaks for interaction models:
theta_b(the Bernstein sub-vector) andbeta(the regression sub-vector) do not exist separately;theta_is the entirevec(Θ).coef_returnsTheta_as a 2-D array(p, q)rather than a 1-D vector.n_free_params_=p * q(no separate beta count).Existing code that reads
theta_[:p]ortheta_[p:]must not be called on an interaction model; these paths are guarded by theisinstance(basis, InteractionBasis)branch.
Non-interaction models are unchanged: the split [theta_b | beta] is untouched.
Gradient computation:
∂h(y_i|x_i)/∂theta = a(y_i) ⊗ b(x_i) — this is the (1, p·q) Jacobian row for
observation i, which is exactly the Kronecker-product evaluation row from
InteractionBasis.evaluate(y_i, x_i). The derivative w.r.t. y uses the
corresponding row from InteractionBasis.derivative(y_i, x_i).
Decision 3 — Monotonicity Strategy#
Chosen: Closed-form column-wise constraints for the initial release. Grid-based evaluation is deferred.
Rule: h(y|x) = (a(y) ⊗ b(x))ᵀ vec(Θ) must be non-decreasing in y for every
fixed x. When b(x) is non-negative and a partition of unity (every column of
the evaluated x-basis matrix is ≥ 0 and the row sums to 1), the condition reduces to
D @ Θ[:, j] ≥ 0 for every column j = 0, …, q-1
where D is the standard (p-1, p) forward-difference matrix. In flat form this
is the Kronecker inequality (D ⊗ I_q) @ vec(Θ) ≥ 0, giving (p-1) · q linear
constraints — cheap, exact, and handled by the existing ConstraintMatrices path.
Supported x-bases (initial release):
BernsteinBasis— non-negative, partition-of-unity by construction.OrdinalBasis— non-negative, one-hot rows (partition of unity).InterceptBasis— single column of ones, trivially non-negative and a PoU.
Unsupported x-bases (raise ValueError from build_constraints):
PolynomialBasis— can take negative values.LegendreBasis— oscillates; not non-negative.LogBasis— single column of potentially negative values.LogBernsteinBasis— not defined on a compact domain in the x direction.
build_constraints raises with a descriptive message naming the unsupported x-basis
so the user can substitute a supported one.
Grid-based evaluation (covering general x-bases) can be added in a later slice without
changing the public API; the only change would be lifting the ValueError for those types.
Decision 4 — Likelihood-Path Generalisation#
Chosen: Option (a) — generalise _ll_* and _grad_* to accept a precomputed h
vector and a Jacobian matrix dh_dtheta, rather than adding a parallel interaction path.
Current shift path (unchanged for non-interaction models):
h = B @ theta_b + X @ beta # shape (n,)
h_prime = dB @ theta_b # shape (n,)
J = [B | X] # shape (n, p+q) — the full Jacobian
Tensor-product path:
design = InteractionBasis.evaluate(y, X) # shape (n, p*q)
h = design @ theta # shape (n,)
d_design = InteractionBasis.derivative(y, X) # shape (n, p*q)
h_prime = d_design @ theta # shape (n,)
J = design # shape (n, p*q)
The private functions _ll_none, _ll_right, _ll_left, _ll_interval and their
gradient counterparts receive (h, log_h_prime, J, ...) — identical calling convention
for both paths. A thin dispatcher in negative_log_likelihood branches on
isinstance(basis, InteractionBasis) to build (h, log_h_prime, J) before calling
the censoring-specific functions.
Why option (a) over a parallel path:
The censoring dispatch is the stateful part; duplicating it doubles the surface area to maintain.
handJare the only quantities that change between the shift and interaction paths; everything downstream is identical.Analytical gradients remain exact:
∂ℓ/∂theta = Jᵀ swheresis the score vector, same formula for both paths.
Decision 5 — Backward Compatibility#
All non-interaction models are unaffected:
theta_,coef_,predict,simulate,log_likelihoodsemantics do not change.The parameter layout
[theta_b | beta]andn_free_params_are unchanged.build_constraints,build_constraint_matricesaccept the same arguments.No new required parameters are added to any existing class or function.
The isinstance(basis, InteractionBasis) branch added to fit() and
negative_log_likelihood is the only code path that diverges; it is dead code for
all models constructed without InteractionBasis.
Stub files#
Alongside this ADR, a class InteractionBasis skeleton is added to mltpy/basis.py
and re-exported from mltpy/__init__.py, raising NotImplementedError on all calls.
This lets downstream slices import the name without a follow-up rename.
Consequences#
Slice 2 implements
InteractionBasis.evaluateandderivative.Slice 3 extends
build_constraint_matriceswith the(D ⊗ I_q)rows.Slice 4 wires the interaction path into
negative_log_likelihood.Slice 5 updates
ConditionalTransformationModel.fitand exposesTheta_/coef_.Grid-based monotonicity (general x-bases) is explicitly out of scope for this release.