generalized_additive_models.GAM

class generalized_additive_models.GAM(terms=None, *, distribution='normal', link='identity', fit_intercept=True, solver='pirls', max_iter=100, tol=0.0001, verbose=0)

Initialize a Generalized Additive Model (GAM).

Parameters:
  • terms (Term, TermList or list, optional) – The term(s) of the model. The argument can be a single term or a collection of terms. The features that the terms refer to must be present in the data set at fit and predict time. The default is None.

  • distribution (str or Distribution, optional) – The assumed distribution of the target variable. See the dict GAM.DISTRIBUTIONS for the available options. The default is “normal”.

  • link (str or Link, optional) – The link function relating the expected value of the target variable to the linear predictor. See the dict GAM.LINKS for the available options. The default is “identity”.

  • fit_intercept (bool, optional) – Whether or not to automatically add an intercept term to the terms. If an intercept is already present, then this setting has no effect. The default is True.

  • solver (str, optional) – Either ‘pirls’ or ‘lbfgsb’. ‘pirls’ stands for Penalized Iteratively Reweighted Least Squares, a Newton routine that uses a step-halving line search. ‘lbfgsb’ is the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm with bound constraints (L-BFGS-B), called through scipy.optimize.minimize. The default is “pirls”.

  • max_iter (int, optional) – Maximum number of iterations in the solver. The default is 100.

  • tol (float, optional) – Tolerance in the solver. The default is 0.0001.

  • verbose (int, optional) – Verbosity level. The higher the number, the more info is printed. The default is 0.

Return type:

None.
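
To make the ‘pirls’ description more concrete, the following is a minimal sketch of a penalized Newton iteration with step-halving for a one-coefficient Poisson model with log link (for this canonical link the Newton step coincides with the IRLS update). It is plain Python for illustration only, not the library's implementation: the helper names `penalized_objective` and `pirls_1d`, the ridge penalty `lam`, and the stopping rule are assumptions made for this example.

```python
import math

def penalized_objective(beta, x, y, lam):
    """Negative Poisson log-likelihood (up to a constant) plus a ridge penalty."""
    nll = sum(math.exp(beta * xi) - yi * beta * xi for xi, yi in zip(x, y))
    return nll + 0.5 * lam * beta**2

def pirls_1d(x, y, lam=1.0, max_iter=100, tol=1e-4):
    """Hypothetical helper: Newton steps with a step-halving line search."""
    beta = 0.0
    for _ in range(max_iter):
        mu = [math.exp(beta * xi) for xi in x]
        # Gradient and Hessian of the penalized objective at the current beta
        grad = sum(xi * (mi - yi) for xi, mi, yi in zip(x, mu, y)) + lam * beta
        hess = sum(xi * xi * mi for xi, mi in zip(x, mu)) + lam
        step = -grad / hess
        # Step-halving: shrink the step while it would increase the objective
        current = penalized_objective(beta, x, y, lam)
        while penalized_objective(beta + step, x, y, lam) > current and abs(step) > 1e-12:
            step *= 0.5
        beta += step
        if abs(step) < tol:
            break
    return beta

x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1, 1, 3, 4, 8]
beta_hat = pirls_1d(x, y, lam=0.1)
```

In this toy problem the first full Newton step overshoots and is halved once before being accepted; later iterations take full steps. The `max_iter` and `tol` arguments play the same role as the constructor parameters of the same name, although the library's exact convergence criterion may differ.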

Examples

>>> from generalized_additive_models import GAM, Spline, Categorical
>>> from sklearn.datasets import load_diabetes
>>> data = load_diabetes(as_frame=True)
>>> df, y = data.data, data.target
>>> gam = GAM(Spline("age") + Spline("bmi") + Spline("bp") + Categorical("sex"))
>>> gam = gam.fit(df, y)
>>> predictions = gam.predict(df)
>>> for term in gam.terms:
...     print(term, term.coef_) 
>>> float(gam.score(df, y))
0.4412081401019129
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true=y, y_pred=predictions)
0.4412081401019129
>>> gam.terms["age"]
Spline(feature='age')
>>> gam.terms["age"].coef_[:3]
array([  0.        , -11.86887791, -23.59686477])
__init__(terms=None, *, distribution='normal', link='identity', fit_intercept=True, solver='pirls', max_iter=100, tol=0.0001, verbose=0)

Methods

__init__([terms, distribution, link, ...])

fit(X, y[, sample_weight])

Fit model to data.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict the expected value \(\mu\) with the model.

residuals(X, y, *[, residuals, standardized])

Compute a vector of residuals.

sample(mu[, size, random_state])

Sample from the posterior predictive distribution.

score(X, y[, sample_weight])

Proportion deviance explained (pseudo \(r^2\)).

set_fit_request(*[, sample_weight])

Configure whether metadata should be requested to be passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Configure whether metadata should be requested to be passed to the score method.

summary([file])

Print a model summary.

Attributes

DISTRIBUTIONS = {
    'bernoulli': <class 'generalized_additive_models.distributions.Bernoulli'>,
    'binomial': <class 'generalized_additive_models.distributions.Binomial'>,
    'exponential': <class 'generalized_additive_models.distributions.Exponential'>,
    'gamma': <class 'generalized_additive_models.distributions.Gamma'>,
    'inv_gauss': <class 'generalized_additive_models.distributions.InvGauss'>,
    'normal': <class 'generalized_additive_models.distributions.Normal'>,
    'poisson': <class 'generalized_additive_models.distributions.Poisson'>,
}
fit(X, y, sample_weight=None)

Fit model to data.

Parameters:
  • X (np.ndarray or pd.DataFrame) – A dataset to fit to. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.

  • y (np.ndarray or Series) – An array of target values.

  • sample_weight (np.ndarray, optional) – An array of sample weights. Sample weights of [1, 3] are equivalent to repeating the second data point three times. The default is None.

Returns:

Returns the instance.

Return type:

GAM
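
The claim that a sample weight of 3 is equivalent to repeating a data point three times can be checked on a much simpler model. The sketch below uses closed-form weighted least squares for a straight line rather than a GAM; the helper `weighted_line_fit` and the toy data are assumptions made for this illustration.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b * x (closed form)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope

x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 2.0]

# Weights [1, 3, 1]: the second point counts three times
weighted = weighted_line_fit(x, y, w=[1, 3, 1])

# Same data with the second point physically repeated three times
x_rep = [0.0, 1.0, 1.0, 1.0, 2.0]
y_rep = [1.0, 3.0, 3.0, 3.0, 2.0]
repeated = weighted_line_fit(x_rep, y_rep, w=[1, 1, 1, 1, 1])
```

Both calls return the same intercept and slope, because each weighted sum in the estimator is identical to the corresponding unweighted sum over the repeated data.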

Examples

>>> import numpy as np
>>> from generalized_additive_models import GAM, Spline
>>> rng = np.random.default_rng(32)
>>> X = rng.normal(size=(100, 1))
>>> y = np.sin(X).ravel()
>>> gam = GAM(Spline(0))
>>> gam.fit(X, y)
GAM(terms=TermList(data=[Spline(feature=0), Intercept()]))
predict(X)

Predict the expected value \(\mu\) with the model.

Parameters:

X (np.ndarray or pd.DataFrame) – A dataset to predict on. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.

Returns:

An array with predictions. Predictions are inverse_link(X @ coef).

Return type:

np.ndarray
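
The expression inverse_link(X @ coef) can be unpacked with a small plain-Python sketch for the logit link. Here X stands for the model's design matrix rather than the raw features; the matrix rows, the coefficient values, and the helper `inverse_logit` are assumptions made for this illustration and are not taken from the library.

```python
import math

def inverse_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical design-matrix rows [x, 1] and coefficients [slope, intercept]
X = [[-1.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
coef = [1.5, -0.2]

# Linear predictor: eta = X @ coef, computed row by row
eta = [sum(xij * cj for xij, cj in zip(row, coef)) for row in X]

# Expected value: mu = inverse_link(eta), which always lies in (0, 1)
mu = [inverse_logit(e) for e in eta]
```

With the identity link the last step is a no-op, so the prediction is the linear predictor itself.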

Examples

Create a data set where probability of y being 1 increases with X:

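>>> import numpy as np
>>> from generalized_additive_models import GAM, Linear  # assumes Linear is exported at the top level, like Spline
>>> rng = np.random.default_rng(32)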
>>> X = rng.normal(size=(99, 1))
>>> probs = (1 / (1 +  np.exp(-X))).ravel()
>>> y = rng.binomial(1, p=probs)

Fit a model and predict:

>>> gam = GAM(Linear(0), distribution="binomial", link="logit").fit(X, y)
>>> X_new = np.array([-3, -2, -1, 0, 1, 2, 3]).reshape(-1, 1)
>>> gam.predict(X_new).round(3)
array([0.011, 0.045, 0.168, 0.465, 0.789, 0.942, 0.986])
residuals(X, y, *, residuals='deviance', standardized=True)

Compute a vector of residuals.

Parameters:
  • X (np.ndarray or pd.DataFrame) – A dataset to predict on. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.

  • y (np.ndarray or Series) – An array of target values.

  • residuals (str, optional) – The type of residuals to compute: one of “response”, “pearson” or “deviance”. The default is “deviance”.

  • standardized (bool, optional) – Whether or not to standardize the residuals. The default is True.

Returns:

residuals – An array of residuals.

Return type:

np.ndarray
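
As a rough guide to what the three residual types mean, the sketch below computes the textbook (unstandardized) response, Pearson and deviance residuals for a Poisson model in plain Python. The helper `poisson_residuals` is hypothetical, and the library's own formulas, in particular the standardization step, may differ in detail.

```python
import math

def poisson_residuals(y, mu, kind="deviance"):
    """Hypothetical helper: unstandardized residuals for a Poisson model."""
    out = []
    for yi, mi in zip(y, mu):
        if kind == "response":
            r = yi - mi
        elif kind == "pearson":
            r = (yi - mi) / math.sqrt(mi)  # variance function V(mu) = mu
        elif kind == "deviance":
            # Unit deviance; the y * log(y / mu) term is 0 when y == 0
            log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
            d = 2.0 * (log_term - (yi - mi))
            r = math.copysign(math.sqrt(max(d, 0.0)), yi - mi)
        else:
            raise ValueError(f"unknown residual type: {kind}")
        out.append(r)
    return out

y = [0, 2, 5]
mu = [1.0, 2.0, 3.0]
response = poisson_residuals(y, mu, "response")
pearson = poisson_residuals(y, mu, "pearson")
deviance = poisson_residuals(y, mu, "deviance")
```

All three types share the sign of y - mu; they differ in how the raw difference is scaled. The squared deviance residuals sum to the model deviance.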

sample(mu, size=None, random_state=None)

Sample from the posterior predictive distribution.

Parameters:
  • mu (np.ndarray) – The expected value of the distribution.

  • size (int or tuple of ints, optional) – Number of samples, or the shape of the output array when a tuple is given. The default is None, which means one sample.

  • random_state (int, optional) – Random state used to sample from the scipy distribution with reproducible results.

Returns:

A numpy array of samples.

Return type:

np.ndarray

Examples

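>>> import numpy as np
>>> from generalized_additive_models import GAM, Intercept  # assumes Intercept is exported at the top level, like Spline
>>> rng = np.random.default_rng(32)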
>>> X = np.ones((9999, 1))
>>> y = rng.normal(size=9999) * 10
>>> gam = GAM(terms=Intercept(), fit_intercept=False).fit(X, y)
>>> float(gam._distribution.scale)
99.389...
>>> gam.sample(mu=np.zeros(5), random_state=1).round(1)
array([ 16.2,  -6.1,  -5.3, -10.7,   8.6])
>>> gam.sample(mu=np.zeros(5), random_state=3, size=(3, 5)).round(1)
array([[ 17.8,   4.4,   1. , -18.6,  -2.8],
       [ -3.5,  -0.8,  -6.3,  -0.4,  -4.8],
       [-13.1,   8.8,   8.8,  17. ,   0.5]])
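
For intuition about what sampling does in the normal case, here is a plain-Python sketch that draws one value per entry of mu from a normal distribution. It assumes that the fitted scale is a variance, as the value near 100 in the example above suggests for data with standard deviation 10. The helper `sample_normal` is hypothetical, uses Python's `random` module instead of scipy, and supports only an integer `size`, so its draws will not match the numbers shown above.

```python
import math
import random

def sample_normal(mu, scale, size=None, random_state=None):
    """Hypothetical helper: draw from Normal(mu_i, scale) for each mu_i."""
    rng = random.Random(random_state)
    sd = math.sqrt(scale)  # scale is treated as a variance here
    n_draws = 1 if size is None else size
    draws = [[rng.gauss(m, sd) for m in mu] for _ in range(n_draws)]
    # size=None returns a single draw per entry of mu
    return draws[0] if size is None else draws

one = sample_normal([0.0] * 5, scale=100.0, random_state=1)
many = sample_normal([0.0] * 5, scale=100.0, size=3, random_state=1)
```

Fixing `random_state` makes the draws reproducible, which is the purpose of the parameter of the same name in `sample`.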
score(X, y, sample_weight=None)

Proportion deviance explained (pseudo \(r^2\)).

Parameters:
  • X (np.ndarray or pd.DataFrame) – A dataset to predict on. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.

  • y (np.ndarray or Series) – An array of target values.

  • sample_weight (np.ndarray, optional) – An array of sample weights. Sample weights of [1, 3] are equivalent to repeating the second data point three times. The default is None.

Returns:

The pseudo \(r^2\) score, i.e. the proportion of deviance explained.

Return type:

float
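
For the normal distribution the deviance is the residual sum of squares, so the proportion of deviance explained reduces to the ordinary \(r^2\). This is consistent with score and r2_score returning the same value in the class-level example. The sketch below shows the unweighted computation in plain Python; the helper `deviance_explained_normal` is hypothetical and assumes an intercept-only null model.

```python
def deviance_explained_normal(y, mu):
    """Pseudo r^2 for a normal model: 1 - D(y, mu) / D(y, mean(y))."""
    y_bar = sum(y) / len(y)
    # Deviance of the fitted model: residual sum of squares
    deviance_model = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    # Deviance of the null model that predicts the mean everywhere
    deviance_null = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - deviance_model / deviance_null

y = [1.0, 2.0, 3.0, 4.0]
mu = [1.5, 1.5, 3.5, 3.5]
score = deviance_explained_normal(y, mu)
```

For other distributions the same ratio is formed with that distribution's deviance in place of the sums of squares.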

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GAM

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GAM

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

summary(file=None)

Print a model summary.

Parameters:

file (filehandle, optional) – A file handle to write to. The default is None, which maps to sys.stdout.

Return type:

None.