generalized_additive_models.Spline
- class generalized_additive_models.Spline(feature=None, *, penalty=1, l2_penalty=0, by=None, num_splines=20, constraint=None, edges=None, degree=3, knots='uniform', extrapolation='linear')
A Spline term.
- Parameters:
feature (int or str, optional) – The column index of the feature, or the name of the feature if the data set is a pandas DataFrame. The default is None.
penalty (float, optional) – A penalty term that penalizes the second derivative of the spline. If set high, the spline becomes linear (no second derivative). If set low, the spline becomes very wiggly and tends to overfit. The default is 1.
l2_penalty (float, optional) – A penalty term that penalizes the size of each spline coefficient. If set high, the spline becomes the zero function since all coefficients are regularized toward zero. If set low, the spline coefficients can be large. The default is 0.
by (int or str, optional) – An interaction effect with a numerical feature. The spline
> Spline("age", by="income")
models the multiplicative interaction :math:`\text{income} \times f(\text{age})`, meaning that the target is modeled as a smooth function of age, times a linear function of income. The default is None.
num_splines (int, optional) – The number of spline basis functions. The default is 20.
constraint (str, optional) – A constraint for the spline. Must be one of {'increasing-concave', 'convex', 'decreasing-concave', 'increasing', 'concave', 'decreasing', 'decreasing-convex', 'increasing-convex'} or None. The constraints are not guaranteed to hold under every extrapolation setting. The default is None.
edges (tuple, optional) –
A tuple with edges (low, high). For instance, to model a 24-hour periodic phenomenon, we could use
> Spline("time", edges=(0, 24), extrapolation="periodic")
The default is None, meaning that edges are inferred from the data.
degree (int, optional) – The spline degree. Degree 0 gives box functions, degree 1 gives hat functions (also called tent functions), degree 2 gives quadratic splines, degree 3 gives cubic splines, and so forth. The default is 3.
knots (str, optional) – Where to place the knots. Must be one of {'quantile', 'uniform'}. The default is 'uniform'.
extrapolation (str, optional) – How the spline is extended outside the edges. Must be one of {'continue', 'linear', 'error', 'constant', 'periodic'}. The default is 'linear'.
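The multiplicative interaction described for the by parameter can be pictured with plain NumPy. This is only a sketch of the idea, not the library's implementation: toy_basis is a hypothetical stand-in for a spline basis, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(20, 70, size=5)
income = rng.uniform(1, 10, size=5)


def toy_basis(x, num_columns=3):
    """Hypothetical stand-in for a spline basis: powers of a rescaled x."""
    z = (x - x.min()) / (x.max() - x.min())
    return np.column_stack([z**k for k in range(num_columns)])


basis = toy_basis(age)                  # shape (5, 3), plays the role of f(age)
interaction = basis * income[:, None]   # each row scaled by that row's income

coef = np.array([0.5, -1.0, 2.0])       # arbitrary coefficients
f_age = basis @ coef

# Scaling the basis rows by income is the same as multiplying f(age) by income.
assert np.allclose(interaction @ coef, income * f_age)
```

Under this reading, the fitted coefficients still describe a smooth function of age, and income only rescales its contribution for each sample.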
- Return type:
None.
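The 24-hour example given for edges relies on inputs one period apart being treated alike. The sketch below shows that wrapping on the inputs alone. The helper wrap_to_edges is hypothetical, and the library may build its periodic basis differently.

```python
import numpy as np


def wrap_to_edges(x, low, high):
    """Hypothetical helper: map x into [low, high) by wrapping around the period."""
    period = high - low
    return low + np.mod(np.asarray(x, dtype=float) - low, period)


hours = np.array([1.0, 23.5, 24.0, 25.0, 49.0, -1.0])
wrapped = wrap_to_edges(hours, 0, 24)

# 25h and 49h land on 1h, and -1h lands on 23h, so a periodic term
# would evaluate them at the same position within the day.
assert np.allclose(wrapped, [1.0, 23.5, 0.0, 1.0, 1.0, 23.0])
```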
Examples
>>> spline = Spline(feature=0, num_splines=8)
>>> spline.num_coefficients
8
Fitting and transforming creates a spline basis. The basis is given a sum-to-zero constraint over the data it is fitted on.
>>> import numpy as np
>>> X = np.arange(27).reshape(9, 3)
>>> Spline(0, num_splines=3, degree=0).fit_transform(X).round(2)
array([[ 0.67, -0.33, -0.33],
       [ 0.67, -0.33, -0.33],
       [ 0.67, -0.33, -0.33],
       [-0.33,  0.67, -0.33],
       [-0.33,  0.67, -0.33],
       [-0.33,  0.67, -0.33],
       [-0.33, -0.33,  0.67],
       [-0.33, -0.33,  0.67],
       [-0.33, -0.33,  0.67]])
To recover the un-centered splines, we can add back the means learned during fitting:
>>> spline = Spline(0, num_splines=3, degree=0)
>>> spline = spline.fit(X)
>>> spline.transform(X) + spline.means_
array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.]])
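The sum-to-zero centering above can be reproduced with NumPy alone. The sketch assumes that centering means subtracting each column's mean over the fitting data, which is consistent with the printed values but is not taken from the library's source. The indicator matrix stands in for the degree-0 basis.

```python
import numpy as np

# Uncentered degree-0 basis for 9 samples in 3 equal bins, shape (9, 3).
indicator = np.kron(np.eye(3), np.ones((3, 1)))

means = indicator.mean(axis=0)      # column means, each equal to 1/3
centered = indicator - means        # entries are 2/3 and -1/3

# Each column of the centered basis sums to zero over the fitting data.
assert np.allclose(centered.sum(axis=0), 0.0)

# Adding the stored means back recovers the uncentered basis.
assert np.allclose(centered + means, indicator)
```

Rounded to two decimals, centered holds the same 0.67 and -0.33 entries as the fit_transform output shown earlier.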
Splines are penalized for a lack of smoothness, as measured by the second derivative. The second derivative is given by the stencil [1, -2, 1]:
>>> spline.penalty_matrix()
array([[ 0.,  0.,  0.],
       [ 1., -2.,  1.],
       [ 0.,  0.,  0.]])
The structure is easier to see on a Spline with a higher num_splines:
>>> Spline(0, num_splines=6).penalty_matrix()
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 1., -2.,  1.,  0.,  0.,  0.],
       [ 0.,  1., -2.,  1.,  0.,  0.],
       [ 0.,  0.,  1., -2.,  1.,  0.],
       [ 0.,  0.,  0.,  1., -2.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])
The level of penalization is given by the penalty parameter. With penalty=9, each entry is scaled by 3, the square root of 9:
>>> Spline(0, num_splines=6, penalty=9).penalty_matrix()
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 3., -6.,  3.,  0.,  0.,  0.],
       [ 0.,  3., -6.,  3.,  0.,  0.],
       [ 0.,  0.,  3., -6.,  3.,  0.],
       [ 0.,  0.,  0.,  3., -6.,  3.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])
Linear functions are in the null space of the penalty:
>>> P = Spline(0, num_splines=6).penalty_matrix()
>>> float(np.linalg.norm(P @ np.arange(6))**2)
0.0
>>> float(np.linalg.norm(P @ (np.arange(6) + 3))**2)
0.0
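The penalty matrices printed above can be rebuilt by hand, which makes the null-space property easy to check. The helper second_difference_matrix is hypothetical. Its square-root scaling is inferred from the penalty=9 output above rather than from the library's source.

```python
import numpy as np


def second_difference_matrix(n, penalty=1.0):
    """Hypothetical helper: n x n matrix whose interior rows hold [1, -2, 1]."""
    D = np.zeros((n, n))
    for i in range(1, n - 1):
        D[i, i - 1 : i + 2] = [1.0, -2.0, 1.0]
    # Scaling by sqrt(penalty) makes the squared norm scale by penalty.
    return np.sqrt(penalty) * D


P = second_difference_matrix(6)

line = 2.0 * np.arange(6) + 3.0        # linear coefficients: no curvature
quad = np.arange(6, dtype=float) ** 2  # quadratic coefficients: constant curvature

linear_cost = float(np.linalg.norm(P @ line) ** 2)   # 0.0
quad_cost = float(np.linalg.norm(P @ quad) ** 2)     # 4 interior rows, each 2**2

P9 = second_difference_matrix(6, penalty=9)
quad_cost_9 = float(np.linalg.norm(P9 @ quad) ** 2)  # 9 times quad_cost

assert np.isclose(linear_cost, 0.0)
assert np.isclose(quad_cost, 16.0)
assert np.isclose(quad_cost_9, 9 * quad_cost)
```

Any coefficient vector that is linear in its index incurs zero penalty, which is why a large penalty pushes the fit toward a straight line rather than toward zero.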
- __init__(feature=None, *, penalty=1, l2_penalty=0, by=None, num_splines=20, constraint=None, edges=None, degree=3, knots='uniform', extrapolation='linear')
Examples
>>> Spline(0)
Spline(feature=0)
>>> Spline(1, penalty=0.1, by=2)
Spline(by=2, feature=1, penalty=0.1)
>>> spline = Spline(0, num_splines=3, degree=1, extrapolation="linear")
>>> X = np.linspace(0, 1, num=9).reshape(-1, 1)
>>> spline.fit(X[:6, :])
Spline(degree=1, feature=0, num_splines=3)
>>> spline.transform(X[:6, :]) + spline.means_
array([[1. , 0. , 0. ],
       [0.6, 0.4, 0. ],
       [0.2, 0.8, 0. ],
       [0. , 0.8, 0.2],
       [0. , 0.4, 0.6],
       [0. , 0. , 1. ]])
>>> spline.transform(X) + spline.means_
array([[ 1. ,  0. ,  0. ],
       [ 0.6,  0.4,  0. ],
       [ 0.2,  0.8,  0. ],
       [ 0. ,  0.8,  0.2],
       [ 0. ,  0.4,  0.6],
       [ 0. ,  0. ,  1. ],
       [ 0. , -0.4,  1.4],
       [ 0. , -0.8,  1.8],
       [ 0. , -1.2,  2.2]])
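The linear extrapolation in the last three rows above can be reproduced with a small hand-written hat basis. The helper hat_basis is hypothetical. The knot positions are an assumption, chosen as the minimum, midpoint and maximum of the six fitting points so that the output matches the rows above. The library's own construction may differ.

```python
import numpy as np


def hat_basis(x, knots):
    """Hypothetical helper: degree-1 (hat) basis, continued linearly outside the knots."""
    x = np.asarray(x, dtype=float)
    # Knot interval used for each x. Clipping reuses the first or last
    # interval for points that fall outside the knot range.
    i = np.clip(np.searchsorted(knots, x, side="right") - 1, 0, len(knots) - 2)
    t = (x - knots[i]) / (knots[i + 1] - knots[i])  # t > 1 beyond the last knot
    basis = np.zeros((x.size, len(knots)))
    rows = np.arange(x.size)
    basis[rows, i] = 1.0 - t
    basis[rows, i + 1] = t
    return basis


x = np.linspace(0, 1, num=9)
knots = np.array([0.0, 0.3125, 0.625])  # assumed: spans the fitting points x[:6]
B = hat_basis(x, knots)

# Inside the knot range the rows are ordinary hat-function values.
assert np.allclose(B[1], [0.6, 0.4, 0.0])
# Beyond the last knot the final segment is continued with the same slope.
assert np.allclose(B[6:], [[0.0, -0.4, 1.4], [0.0, -0.8, 1.8], [0.0, -1.2, 2.2]])
```

Every row still sums to one, so the extrapolated basis keeps the partition-of-unity property even though individual entries leave the interval [0, 1].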
Methods
__init__([feature, penalty, l2_penalty, by, ...])
fit(X) – Fit to data.
fit_transform(X[, y]) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
is_redundant_with_respect_to(other) – Check if a Term is redundant with respect to another.
penalty_matrix() – Return the penalty matrix for the term.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform the input.
Attributes
name – Name of the term.
num_coefficients – Number of coefficients for the term.
- fit(X)
Fit to data.
- Parameters:
X (np.ndarray or pd.DataFrame) – A dataset of shape (num_samples, num_features).
- Returns:
self – The fitted Spline, as shown by the fit calls in the examples.
- Return type:
Spline
Examples
>>> spline = Spline(0, num_splines=3, degree=0)
>>> X = np.linspace(0, 1, num=9).reshape(-1, 1)
>>> spline = spline.fit(X)
>>> spline.transform(X) + spline.means_
array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.]])
>>> X = np.vstack((np.linspace(0, 1, num=12), np.arange(12))).T
>>> spline = Spline(0, num_splines=3, degree=1).fit(X)
>>> (spline.transform(X) + spline.means_).round(1)
array([[1. , 0. , 0. ],
       [0.8, 0.2, 0. ],
       [0.6, 0.4, 0. ],
       [0.5, 0.5, 0. ],
       [0.3, 0.7, 0. ],
       [0.1, 0.9, 0. ],
       [0. , 0.9, 0.1],
       [0. , 0.7, 0.3],
       [0. , 0.5, 0.5],
       [0. , 0.4, 0.6],
       [0. , 0.2, 0.8],
       [0. , 0. , 1. ]])
- name = 'spline'
Name of the term.
- property num_coefficients
Number of coefficients for the term.
- penalty_matrix()
Return the penalty matrix for the term.
- transform(X)
Transform the input.
- Parameters:
X (np.ndarray) – An ndarray with 2 dimensions of shape (n_samples, n_features).
- Returns:
X – An ndarray for the term representing the spline basis.
- Return type:
np.ndarray
Examples
>>> spline = Spline(0, num_splines=3, degree=0)
>>> X = np.linspace(0, 1, num=9).reshape(-1, 1)
>>> spline.fit_transform(X) * 3 + 1
array([[3., 0., 0.],
       [3., 0., 0.],
       [3., 0., 0.],
       [0., 3., 0.],
       [0., 3., 0.],
       [0., 3., 0.],
       [0., 0., 3.],
       [0., 0., 3.],
       [0., 0., 3.]])