generalized_additive_models.Tensor
- class generalized_additive_models.Tensor(splines, *, by=None)
A tensor product term, built from Spline (or Categorical) terms.
Examples
A Tensor is constructed from a list of Splines or from a TermList of Splines:
>>> tensor = Tensor(splines=[Spline(0), Spline(1)])
>>> tensor
Tensor(TermList([Spline(feature=0), Spline(feature=1)]))
>>> Tensor(Spline("age") + Spline("bmi"))
Tensor(TermList([Spline(feature='age'), Spline(feature='bmi')]))
The number of coefficients equals the product of the numbers of coefficients of the individual Splines:
>>> tensor = Tensor(Spline(0, num_splines=3) + Spline(1, num_splines=4))
>>> tensor.num_coefficients
12
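Fitting and transforming creates the tensor product basis. fit_transform returns the basis with means_ subtracted, so adding means_ back recovers the raw basis, which here is the identity: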
>>> X = np.array([[1, 1],
...               [1, 2],
...               [1, 3],
...               [2, 1],
...               [2, 2],
...               [2, 3],
...               [3, 1],
...               [3, 2],
...               [3, 3]])
>>> tensor = Tensor(Spline(0, num_splines=3, degree=0) + Spline(1, num_splines=3, degree=0))
>>> tensor.fit_transform(X) + tensor.means_
array([[1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1.]])
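The basis above is consistent with a row-wise tensor (Kronecker) product of the two marginal bases. Below is a minimal NumPy sketch of that idea. It assumes that each degree-0 Spline with three basis functions maps the values 1, 2, 3 to one-hot rows, which is inferred from the printed output and not taken from the library's source. `row_tensor` is a hypothetical helper.

```python
import numpy as np


def row_tensor(A, B):
    """Row-wise Kronecker product of two basis matrices (hypothetical helper).

    Row n of the result equals np.kron(A[n], B[n]), so the result has
    A.shape[1] * B.shape[1] columns.
    """
    # (n, p, 1) * (n, 1, q) -> (n, p, q), then flatten each row in row-major order
    return (A[:, :, None] * B[:, None, :]).reshape(A.shape[0], -1)


# Assumed degree-0 marginal bases for the X above: values 1, 2, 3 map to one-hot rows
x0 = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
x1 = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])
B0 = np.eye(3)[x0 - 1]
B1 = np.eye(3)[x1 - 1]

T = row_tensor(B0, B1)
print(T.shape)                       # (9, 9)
print(np.array_equal(T, np.eye(9)))  # True
```

The column count 3 × 3 = 9 follows the product rule for num_coefficients. The identity matrix matches the doctest output once means_ is added back.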
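Penalties are applied to second-order differences of neighboring coefficients, separately along each dimension. The matrix below has one block of nine rows per Spline. Within a block, row k is the difference centered on coefficient k along that Spline's dimension. The row is zero when coefficient k lies on the boundary of that dimension: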
>>> tensor.penalty_matrix()
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0., -2., -0., -0.,  1.,  0.,  0.],
       [ 0.,  1.,  0., -0., -2., -0.,  0.,  1.,  0.],
       [ 0.,  0.,  1., -0., -0., -2.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1., -2.,  1.,  0., -0.,  0.,  0., -0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -0.,  0.,  1., -2.,  1.,  0., -0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -0.,  0.,  0., -0.,  0.,  1., -2.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
Imagine the coefficients arranged in a matrix, where beta_ij belongs to basis function i of the first Spline and basis function j of the second:

    [beta_11, beta_12, beta_13]
    [beta_21, beta_22, beta_23]
    [beta_31, beta_32, beta_33]

The matrix is flattened row by row into a vector. Beta_11, the first coefficient in the vector, is a corner: no difference is centered on it, so its row in each block is zero. Beta_12, the second coefficient, is the center of a difference relating it to beta_11 and beta_13. The pattern continues, and e.g. beta_22 is related to four other coefficients, namely beta_12, beta_21, beta_23 and beta_32.
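The row structure described above can be reproduced with a short NumPy sketch. The sketch makes two assumptions, based on the printed output rather than the library's source. First, each Spline contributes a square second-difference matrix with zero boundary rows. Second, the blocks are formed with Kronecker products. `second_diff` is a hypothetical helper.

```python
import numpy as np


def second_diff(n):
    """n x n second-difference matrix; rows for the two boundary coefficients are zero."""
    D = np.zeros((n, n))
    for k in range(1, n - 1):
        D[k, k - 1 : k + 2] = [1.0, -2.0, 1.0]
    return D


n0, n1 = 3, 3
# Block 1: differences along the first Spline (down each column of the grid)
# Block 2: differences along the second Spline (across each row of the grid)
P = np.vstack([np.kron(second_diff(n0), np.eye(n1)),
               np.kron(np.eye(n0), second_diff(n1))])

# Row k of each block is the difference centered on coefficient k (or all zeros)
num_centered = P.reshape(2, n0 * n1, n0 * n1).any(axis=2).sum(axis=0)
print(num_centered.reshape(n0, n1))
# [[0 1 0]
#  [1 2 1]
#  [0 1 0]]

k = 4  # beta_22 in the flattened, row-major vector
rows = P[[k, n0 * n1 + k]]
neighbors = [int(c) for c in np.flatnonzero(rows.any(axis=0)) if c != k]
print(neighbors)  # [1, 3, 5, 7], i.e. beta_12, beta_21, beta_23, beta_32
```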
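The strength of the penalty is set by the penalty parameter of each Spline. Each dimension's block of rows is scaled by the square root of that Spline's penalty, so the squared penalty weights each dimension by its own penalty value. Penalties can therefore vary in each dimension: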
>>> spline1 = Spline(0, num_splines=3, degree=0, penalty=9)
>>> spline2 = Spline(1, num_splines=3, degree=0, penalty=1)
>>> tensor = Tensor(spline1 + spline2)
>>> tensor.penalty_matrix()
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 3.,  0.,  0., -6., -0., -0.,  3.,  0.,  0.],
       [ 0.,  3.,  0., -0., -6., -0.,  0.,  3.,  0.],
       [ 0.,  0.,  3., -0., -0., -6.,  0.,  0.,  3.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1., -2.,  1.,  0., -0.,  0.,  0., -0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -0.,  0.,  1., -2.,  1.,  0., -0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -0.,  0.,  0., -0.,  0.,  1., -2.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
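With square-root scaling, the squared norm of P @ beta is a weighted sum of squared second differences along each dimension, and the Spline penalties act as the weights. The NumPy sketch below checks this for penalties 9 and 1. The construction is inferred from the printed output, and `second_diff` is a hypothetical helper.

```python
import numpy as np


def second_diff(n):
    """n x n second-difference matrix; rows for the two boundary coefficients are zero."""
    D = np.zeros((n, n))
    for k in range(1, n - 1):
        D[k, k - 1 : k + 2] = [1.0, -2.0, 1.0]
    return D


penalty0, penalty1 = 9.0, 1.0  # the penalties of spline1 and spline2 above
n0, n1 = 3, 3

# Assumed construction: each dimension's block is scaled by sqrt(penalty)
P = np.vstack([np.sqrt(penalty0) * np.kron(second_diff(n0), np.eye(n1)),
               np.sqrt(penalty1) * np.kron(np.eye(n0), second_diff(n1))])

rng = np.random.default_rng(0)
beta = rng.normal(size=n0 * n1)
grid = beta.reshape(n0, n1)  # grid[i, j] holds beta_(i+1)(j+1)

quadratic = np.sum((P @ beta) ** 2)
# Same quantity written as per-dimension sums of squared second differences
by_dimension = (penalty0 * np.sum(np.diff(grid, n=2, axis=0) ** 2)
                + penalty1 * np.sum(np.diff(grid, n=2, axis=1) ** 2))
print(np.isclose(quadratic, by_dimension))  # True
```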
Linear functions of the two variables lie in the null space of the penalty. For example, the coefficients 1, 2, ..., 9 below satisfy beta_ij = 3(i - 1) + j:
>>> coefs = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> P = tensor.penalty_matrix()
>>> float(np.linalg.norm(P @ coefs)**2)
0.0
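The tensor difference penalty also leaves the bilinear term i·j unpenalized, not only linear functions. Under the same assumed construction (second-difference matrices combined with Kronecker products, inferred from the printed output), a rank computation shows that the null space is four-dimensional. It is spanned by a constant, i, j and i·j. `second_diff` is a hypothetical helper.

```python
import numpy as np


def second_diff(n):
    """n x n second-difference matrix; rows for the two boundary coefficients are zero."""
    D = np.zeros((n, n))
    for k in range(1, n - 1):
        D[k, k - 1 : k + 2] = [1.0, -2.0, 1.0]
    return D


# Assumed construction of the 3 x 3 tensor penalty (both penalties equal to 1)
P = np.vstack([np.kron(second_diff(3), np.eye(3)),
               np.kron(np.eye(3), second_diff(3))])

# Grid indices i (first Spline) and j (second Spline), flattened row-major
i, j = np.meshgrid(np.arange(3), np.arange(3), indexing="ij")
basis = np.column_stack([np.ones(9), i.ravel(), j.ravel(), (i * j).ravel()])

print(np.allclose(P @ basis, 0))     # True: none of these is penalized
print(9 - np.linalg.matrix_rank(P))  # 4: the null space is exactly this span
```

The coefficients 1, 2, ..., 9 equal 1 + 3i + j with 0-based i and j, so they are one vector in this span.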
Categorical terms are also allowed.
>>> import pandas as pd
>>> df = pd.DataFrame({'color': list('rgbgbrr'), 'grade':list('AAABBCC')})
>>> Categorical('color').fit_transform(df)
array([[0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 0., 1.]])
>>> te = Tensor(Categorical('color') + Categorical('grade', penalty=4))
>>> te.fit_transform(df).astype(int)
array([[0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 1]])
>>> te.penalty_matrix().astype(int)
array([[1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1],
       [2, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 2, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 2, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 2, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 2, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 2, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 2, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 2]])
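The column order of the Categorical tensor basis above is consistent with two steps: sorting each term's levels, then flattening the (color, grade) grid row by row. Below is a NumPy-only sketch under that assumption. It does not use the library's Categorical.

```python
import numpy as np

color = np.array(list("rgbgbrr"))
grade = np.array(list("AAABBCC"))

# np.unique sorts the levels: colors ['b', 'g', 'r'], grades ['A', 'B', 'C']
color_levels, color_idx = np.unique(color, return_inverse=True)
grade_levels, grade_idx = np.unique(grade, return_inverse=True)

# Row-major flattening of the (color, grade) grid: column = color_idx * n_grades + grade_idx
combined = color_idx * len(grade_levels) + grade_idx
one_hot = np.eye(len(color_levels) * len(grade_levels), dtype=int)[combined]
print(combined)  # [6 3 0 4 1 8 8]
```

In the penalty above, each Categorical contributes an identity block scaled by the square root of its penalty: 1 for color and 2 = √4 for grade.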
Categoricals can be combined with Splines.
>>> df = pd.DataFrame({'cat': [0]*4 + [1]*4, 'x': [1, 2, 3, 4] * 2})
>>> te = Tensor([Categorical('cat'), Spline('x', num_splines=4, degree=0)])
>>> te.fit_transform(df)
array([[ 0.875, -0.125, -0.125, -0.125, -0.125, -0.125, -0.125, -0.125],
       [-0.125,  0.875, -0.125, -0.125, -0.125, -0.125, -0.125, -0.125],
       [-0.125, -0.125,  0.875, -0.125, -0.125, -0.125, -0.125, -0.125],
       [-0.125, -0.125, -0.125,  0.875, -0.125, -0.125, -0.125, -0.125],
       [-0.125, -0.125, -0.125, -0.125,  0.875, -0.125, -0.125, -0.125],
       [-0.125, -0.125, -0.125, -0.125, -0.125,  0.875, -0.125, -0.125],
       [-0.125, -0.125, -0.125, -0.125, -0.125, -0.125,  0.875, -0.125],
       [-0.125, -0.125, -0.125, -0.125, -0.125, -0.125, -0.125,  0.875]])
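The output above is consistent with a one-hot tensor basis whose columns have been centered by subtracting their means (the means_ attribute seen earlier). Below is a NumPy sketch under that assumption. The one-hot form of the degree-0 Spline basis is inferred from the printed output, not from the library's source.

```python
import numpy as np

cat = np.array([0] * 4 + [1] * 4)
x = np.array([1, 2, 3, 4] * 2)

# Assumed raw basis: one-hot Categorical times a degree-0 Spline whose four basis
# functions each cover one value of x, flattened row-major -> column cat * 4 + (x - 1)
raw = np.eye(8)[cat * 4 + (x - 1)]

means = raw.mean(axis=0)  # each column is active in 1 of 8 rows
centered = raw - means
print(means[0], centered[0, 0], centered[0, 1])  # 0.125 0.875 -0.125
```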
- __init__(splines, *, by=None)
- Parameters:
splines (list or TermList) – The Spline (or Categorical) terms whose tensor product forms this term.
- Return type:
None
Examples
>>> tensor = Tensor([Spline(0), Spline(1)])
>>> for spline in tensor:
...     print(spline)
Spline(feature=0)
Spline(feature=1)
Methods
__init__(splines, *[, by])
fit(X) – Fit to data.
fit_transform(X[, y]) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
is_redundant_with_respect_to(other) – Check if a Term is redundant with respect to another.
penalty_matrix() – Build the penalty matrix.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform the input.
Attributes
name – Name of the term.
num_coefficients – Number of coefficients for the term.
- feature = None
- fit(X)
Fit to data.
- Parameters:
X (np.ndarray or pd.DataFrame) – A dataset of shape (num_samples, num_features).
- get_params(deep=True)
Get parameters for this estimator.
- name = 'tensor'
Name of the term.
- property num_coefficients
Number of coefficients for the term.
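- penalty_matrix()
Build the penalty matrix.
The matrix is assembled from the penalty matrices of the individual terms. For each term, a block of rows is formed as the Kronecker product of that term's penalty matrix with identity matrices for the other dimensions, scaled by the square root of the term's penalty. The blocks are stacked vertically, so for m terms the result has m times as many rows as there are coefficients. The quadratic penalty on a coefficient vector beta is the squared norm of P @ beta.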
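The construction described above can be sketched for any number of terms with functools.reduce and np.kron. This is a minimal sketch inferred from the printed examples, not the library's implementation. `tensor_penalty` and `second_diff` are hypothetical helpers.

```python
from functools import reduce

import numpy as np


def second_diff(n):
    """n x n second-difference matrix; rows for the two boundary coefficients are zero."""
    D = np.zeros((n, n))
    for k in range(1, n - 1):
        D[k, k - 1 : k + 2] = [1.0, -2.0, 1.0]
    return D


def tensor_penalty(marginals, penalties):
    """Stack one Kronecker-product block per dimension (hypothetical helper).

    marginals: square penalty matrices, one per marginal term, in term order.
    penalties: penalty values; block d is scaled by sqrt(penalties[d]).
    """
    eyes = [np.eye(M.shape[1]) for M in marginals]
    blocks = []
    for d, (M, lam) in enumerate(zip(marginals, penalties)):
        # Identity in every dimension except d, which gets the scaled marginal matrix
        factors = eyes[:d] + [np.sqrt(lam) * M] + eyes[d + 1:]
        blocks.append(reduce(np.kron, factors))
    return np.vstack(blocks)


P = tensor_penalty([second_diff(3), second_diff(4)], [1.0, 1.0])
print(P.shape)  # (24, 12): 3 * 4 coefficients, one 12-row block per term
```

Passing identity matrices with penalties 1 and 4 gives the stacked blocks I and 2I, matching the Categorical example in the class docstring.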
- Returns:
Penalty matrix.
- Return type:
np.ndarray
Examples
The coefficients are imagined to be structured as

    [[b_11, b_12, b_13, b_14],
     [b_21, b_22, b_23, b_24],
     [b_31, b_32, b_33, b_34]]

and .ravel()’ed into the vector [b_11, b_12, b_13, b_14, b_21, b_22, …]. The example below shows the resulting penalty matrix:
>>> spline1 = Spline(0, num_splines=3, penalty=1)
>>> spline2 = Spline(1, num_splines=4, penalty=1)
>>> Tensor([spline1, spline2]).penalty_matrix().astype(int)
array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0, -2,  0,  0,  0,  1,  0,  0,  0],
       [ 0,  1,  0,  0,  0, -2,  0,  0,  0,  1,  0,  0],
       [ 0,  0,  1,  0,  0,  0, -2,  0,  0,  0,  1,  0],
       [ 0,  0,  0,  1,  0,  0,  0, -2,  0,  0,  0,  1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1, -2,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  1, -2,  1,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  1, -2,  1,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  1, -2,  1,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  1, -2,  1,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  1, -2,  1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]])
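To map a row of the matrix above back to the coefficient grid, it helps to make the flattening explicit. Below is a NumPy sketch of the row-major .ravel() order for the 3 × 4 grid. The b_ij labels are only for illustration.

```python
import numpy as np

n0, n1 = 3, 4  # num_splines of spline1 and spline2
labels = np.array([[f"b_{i + 1}{j + 1}" for j in range(n1)] for i in range(n0)])
print(labels.ravel()[:6])  # ['b_11' 'b_12' 'b_13' 'b_14' 'b_21' 'b_22']

# 0-based position of b_ij in the flattened vector is (i - 1) * n1 + (j - 1)
print(np.ravel_multi_index((1, 2), (n0, n1)))  # 6, the position of b_23
```

Rows 5 to 8 (1-based) of the first 12-row block are therefore the differences centered on b_21 to b_24 along the first dimension. Rows 2 and 3 of the second block are centered on b_12 and b_13.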
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
Examples
Setting parameters returned by get_params(deep=False). Note that set_params modifies the Tensor in place and returns it:
>>> tensor = Tensor([Spline(0), Spline(1)])
>>> params_shallow = tensor.get_params(deep=False)
>>> params_shallow
{'splines': TermList([Spline(feature=0), Spline(feature=1)])}
>>> params_changed = {'splines': [Spline(feature=0), Spline(feature=99)]}
>>> new_tensor = tensor.set_params(**params_changed)
>>> new_tensor
Tensor(TermList([Spline(feature=0), Spline(feature=99)]))
>>> tensor
Tensor(TermList([Spline(feature=0), Spline(feature=99)]))
Setting nested parameters returned by get_params(deep=True):
>>> terms = TermList([Linear(0), Intercept()])
>>> params_deep = terms.get_params(deep=True)
>>> params_deep
{'0__by': None, '0__constraint': None, '0__feature': 0, '0__penalty': 1, '0': Linear(feature=0), '1': Intercept()}
>>> params_new = {'0__by': None, '0__feature': 2, '0__penalty': 2, '0': Linear(feature=0), '1': Intercept()}
>>> new_terms = terms.set_params(**params_new)
>>> new_terms
TermList([Linear(feature=2, penalty=2), Intercept()])
>>> terms
TermList([Linear(feature=2, penalty=2), Intercept()])
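The <component>__<parameter> keys used above follow the scikit-learn convention. Below is a minimal sketch of how such keys can be split into per-component updates. `split_params` is a hypothetical helper, not the library's implementation.

```python
def split_params(params):
    """Group '<component>__<parameter>' keys by component (hypothetical helper).

    Keys without '__' are returned separately as parameters of the outer object.
    """
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_params({"0__feature": 2, "0__penalty": 2, "1": "Intercept()"})
print(nested)  # {'0': {'feature': 2, 'penalty': 2}}
print(own)     # {'1': 'Intercept()'}
```

Splitting on the first `__` only leaves deeper keys such as `0__a__b` for the component itself to route further.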
Setting a parameter on all terms in the TermList:
>>> tensor = Tensor([Spline(0), Spline(1)])
>>> tensor.set_params(penalty=7)
Tensor(TermList([Spline(feature=0, penalty=7), Spline(feature=1, penalty=7)]))
- transform(X)
Transform the input.