generalized_additive_models.GAM

class generalized_additive_models.GAM(terms=None, *, distribution='normal', link='identity', fit_intercept=True, solver='pirls', max_iter=100, tol=0.0001, verbose=0)

Initialize a Generalized Additive Model (GAM).

Parameters:

terms (Term, TermList or list, optional) – The term(s) of the model. The argument can be a single term or a collection of terms. The features that the terms refer to must be present in the data set at fit and predict time. The default is None.
distribution (str or Distribution, optional) – The assumed distribution of the target variable. See the dict GAM.DISTRIBUTIONS for the available options. The default is “normal”.
link (str or Link, optional) – The link function that relates the expected value of the target variable to the linear predictor. See the dict GAM.LINKS for the available options. The default is “identity”.
fit_intercept (bool, optional) – Whether or not to automatically add an intercept term to the terms. If an intercept is already present, then this setting has no effect. The default is True.
solver (str, optional) – Either ‘pirls’ or ‘lbfgsb’. ‘pirls’ stands for Penalized Iteratively Reweighted Least Squares, a Newton routine that uses a step-halving line search. ‘lbfgsb’ is the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm with bound constraints (L-BFGS-B), called through scipy.optimize.minimize. The default is “pirls”.
max_iter (int, optional) – Maximum number of iterations in the solver. The default is 100.
tol (float, optional) – Tolerance in the solver. The default is 0.0001.
verbose (int, optional) – Verbosity level. The higher the number, the more info is printed. The default is 0.

Return type:

None.

Examples

>>> from generalized_additive_models import GAM, Spline, Categorical
>>> from sklearn.datasets import load_diabetes
>>> data = load_diabetes(as_frame=True)
>>> df, y = data.data, data.target
>>> gam = GAM(Spline("age") + Spline("bmi") + Spline("bp") + Categorical("sex"))
>>> gam = gam.fit(df, y)
>>> predictions = gam.predict(df)
>>> for term in gam.terms:
...     print(term, term.coef_) 
>>> float(gam.score(df, y))
0.4412081401019129
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true=y, y_pred=predictions)
0.4412081401019129
>>> gam.terms["age"]
Spline(feature='age')
>>> gam.terms["age"].coef_[:3]
array([  0.        , -11.86887791, -23.59686477])
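
The ‘pirls’ solver is described above as a Newton routine with step-halving line search. The following is a minimal sketch of that idea for a Poisson model with a log link and a plain ridge penalty. It is not the library's implementation: the function name `pirls_poisson`, the `penalty` argument and the convergence check are assumptions made for illustration only.

```python
import numpy as np


def pirls_poisson(X, y, penalty=1e-3, max_iter=100, tol=1e-4):
    """Penalized IRLS for a Poisson model with log link (illustrative only)."""
    n_features = X.shape[1]
    P = penalty * np.eye(n_features)  # hypothetical ridge penalty matrix
    beta = np.zeros(n_features)

    def objective(b):
        eta = X @ b
        # Negative Poisson log-likelihood (constants dropped) plus penalty
        return np.sum(np.exp(eta) - y * eta) + 0.5 * b @ P @ b

    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)         # inverse of the log link
        w = mu                   # working weights
        z = eta + (y - mu) / mu  # working response
        # Penalized weighted least squares solve, equal to a full Newton step
        target = np.linalg.solve(X.T @ (w[:, None] * X) + P, X.T @ (w * z))

        # Step-halving: shrink the step while the objective would increase
        step = 1.0
        while (
            objective(beta + step * (target - beta)) > objective(beta)
            and step > 1e-8
        ):
            step /= 2.0
        beta_next = beta + step * (target - beta)

        converged = np.max(np.abs(beta_next - beta)) < tol
        beta = beta_next
        if converged:
            break
    return beta


# Synthetic data with known coefficients (intercept 0.5, slope 0.3)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
true_beta = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ true_beta))
beta_hat = pirls_poisson(X, y)
```

In this sketch `max_iter` and `tol` play the same role as the constructor arguments of the same name, although the exact stopping rule used by the library may differ.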

__init__(terms=None, *, distribution='normal', link='identity', fit_intercept=True, solver='pirls', max_iter=100, tol=0.0001, verbose=0)

Methods

`__init__`([terms, distribution, link, ...])
`fit`(X, y[, sample_weight])	Fit model to data.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict the expected value $\mu$ with the model.
`residuals`(X, y, *[, residuals, standardized])	Compute a vector of residuals.
`sample`(mu[, size, random_state])	Sample from the posterior predictive distribution.
`score`(X, y[, sample_weight])	Proportion of deviance explained (pseudo $r^2$).
`set_fit_request`(*[, sample_weight])	Configure whether metadata should be requested to be passed to the `fit` method.
`set_params`(**params)	Set the parameters of this estimator.
`set_score_request`(*[, sample_weight])	Configure whether metadata should be requested to be passed to the `score` method.
`summary`([file])	Print a model summary.

Attributes

`DISTRIBUTIONS`
`LINKS`

DISTRIBUTIONS = {'bernoulli': <class 'generalized_additive_models.distributions.Bernoulli'>, 'binomial': <class 'generalized_additive_models.distributions.Binomial'>, 'exponential': <class 'generalized_additive_models.distributions.Exponential'>, 'gamma': <class 'generalized_additive_models.distributions.Gamma'>, 'inv_gauss': <class 'generalized_additive_models.distributions.InvGauss'>, 'normal': <class 'generalized_additive_models.distributions.Normal'>, 'poisson': <class 'generalized_additive_models.distributions.Poisson'>}

LINKS = {'cloglog': <class 'generalized_additive_models.links.CLogLogLink'>, 'identity': <class 'generalized_additive_models.links.Identity'>, 'inv_squared': <class 'generalized_additive_models.links.InvSquared'>, 'inverse': <class 'generalized_additive_models.links.Inverse'>, 'log': <class 'generalized_additive_models.links.Log'>, 'logit': <class 'generalized_additive_models.links.Logit'>, 'smoothlog': <class 'generalized_additive_models.links.SmoothLog'>, 'softplus': <class 'generalized_additive_models.links.Softplus'>}

fit(X, y, sample_weight=None)

Fit model to data.

Parameters:

X (np.ndarray or pd.DataFrame) – A dataset to fit to. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.
y (np.ndarray or pd.Series) – An array of target values.
sample_weight (np.ndarray, optional) – An array of sample weights. Sample weights [1, 3] are equivalent to repeating the second data point three times. The default is None.

Returns:

Returns the instance.

Return type:

GAM

Examples

>>> import numpy as np
>>> from generalized_additive_models import GAM, Spline
>>> rng = np.random.default_rng(32)
>>> X = rng.normal(size=(100, 1))
>>> y = np.sin(X).ravel()
>>> gam = GAM(Spline(0))
>>> gam.fit(X, y)
GAM(terms=TermList(data=[Spline(feature=0), Intercept()]))
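
The statement that sample weights act like repeated data points can be checked outside the library. The sketch below uses ordinary weighted least squares with NumPy rather than a GAM, so it only illustrates the weighting semantics; the helper `weighted_least_squares` is a hypothetical name introduced here.

```python
import numpy as np


def weighted_least_squares(X, y, w):
    """Solve min_b sum_i w_i * (y_i - x_i @ b)^2 by rescaling rows."""
    sqrt_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef


# Three observations with an intercept column and one feature
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 2.0])

# Weight 3 on the second observation ...
coef_weighted = weighted_least_squares(X, y, np.array([1.0, 3.0, 1.0]))

# ... versus repeating the second observation three times with unit weights
repeats = [1, 3, 1]
X_rep = np.repeat(X, repeats, axis=0)
y_rep = np.repeat(y, repeats)
coef_repeated = weighted_least_squares(X_rep, y_rep, np.ones(len(y_rep)))
```

Both fits give the same coefficients, which is the behaviour the `sample_weight` description refers to.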

predict(X)

Predict the expected value $\mu$ with the model.

Parameters:

X (np.ndarray or pd.DataFrame) – A dataset to predict on. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.

Returns:

An array with predictions. Predictions are inverse_link(X @ coef).

Return type:

np.ndarray

Examples

Create a data set where probability of y being 1 increases with X:

>>> import numpy as np
>>> from generalized_additive_models import GAM, Linear
>>> rng = np.random.default_rng(32)
>>> X = rng.normal(size=(99, 1))
>>> probs = (1 / (1 +  np.exp(-X))).ravel()
>>> y = rng.binomial(1, p=probs)

Fit a model and predict:

>>> gam = GAM(Linear(0), distribution="binomial", link="logit").fit(X, y)
>>> X_new = np.array([-3, -2, -1, 0, 1, 2, 3]).reshape(-1, 1)
>>> gam.predict(X_new).round(3)
array([0.011, 0.045, 0.168, 0.465, 0.789, 0.942, 0.986])
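
The relation "predictions are inverse_link(X @ coef)" can be sketched with plain NumPy for the logit link. The design matrix and coefficients below are made-up values, not output from a fitted GAM, and in a real model the matrix would hold the basis expansion of the terms rather than the raw features.

```python
import numpy as np


def inverse_logit(eta):
    """Inverse of the logit link: maps the linear predictor to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))


# Hypothetical design matrix: intercept column plus one linear feature
X_design = np.array([[1.0, -2.0], [1.0, 0.0], [1.0, 2.0]])
coef = np.array([0.0, 1.0])  # hypothetical coefficients

eta = X_design @ coef    # linear predictor
mu = inverse_logit(eta)  # expected value on the response scale
```

With these values the middle row has a linear predictor of zero, so its predicted probability is one half.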

residuals(X, y, *, residuals='deviance', standardized=True)

Compute a vector of residuals.

Parameters:

X (np.ndarray or pd.DataFrame) – A dataset to predict on. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.
y (np.ndarray or pd.Series) – An array of target values.
residuals (str, optional) – One of “response”, “pearson” or “deviance”. The default is “deviance”.
standardized (bool, optional) – Whether or not to standardize the residuals. The default is True.

Returns:

residuals – An array of residuals.

Return type:

np.ndarray
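
As a rough illustration of the three residual types, the sketch below computes unstandardized response, Pearson and deviance residuals for a Poisson distribution using the textbook definitions. The function `poisson_residuals` is hypothetical, and the standardization that the library applies when standardized=True is not reproduced here.

```python
import numpy as np


def poisson_residuals(y, mu, kind="deviance"):
    """Unstandardized residuals for a Poisson model (textbook definitions)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if kind == "response":
        return y - mu
    if kind == "pearson":
        # Poisson variance function is V(mu) = mu
        return (y - mu) / np.sqrt(mu)
    if kind == "deviance":
        # y * log(y / mu) is taken to be 0 when y == 0
        ratio = np.where(y > 0, y / mu, 1.0)
        unit_deviance = 2.0 * (y * np.log(ratio) - (y - mu))
        return np.sign(y - mu) * np.sqrt(np.maximum(unit_deviance, 0.0))
    raise ValueError(f"Unknown residual type: {kind!r}")


y_obs = [0, 2, 5]
mu_hat = [1.0, 2.0, 3.0]
res_response = poisson_residuals(y_obs, mu_hat, kind="response")
res_pearson = poisson_residuals(y_obs, mu_hat, kind="pearson")
res_deviance = poisson_residuals(y_obs, mu_hat, kind="deviance")
```

All three types share the sign of y - mu and differ only in how the size of the discrepancy is scaled.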

sample(mu, size=None, random_state=None)

Sample from the posterior predictive distribution.

Parameters:

mu (np.ndarray) – The expected value of the distribution.
size (int or tuple of ints, optional) – Number of samples, or the output shape when a tuple is given. The default is None, which means one sample.
random_state (int, optional) – Random state used to sample from the scipy distribution with reproducible results. The default is None.

Returns:

A numpy array of samples.

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from generalized_additive_models import GAM, Intercept
>>> rng = np.random.default_rng(32)
>>> X = np.ones((9999, 1))
>>> y = rng.normal(size=9999) * 10
>>> gam = GAM(terms=Intercept(), fit_intercept=False).fit(X, y)
>>> float(gam._distribution.scale)
99.389...
>>> gam.sample(mu=np.zeros(5), random_state=1).round(1)
array([ 16.2,  -6.1,  -5.3, -10.7,   8.6])
>>> gam.sample(mu=np.zeros(5), random_state=3, size=(3, 5)).round(1)
array([[ 17.8,   4.4,   1. , -18.6,  -2.8],
       [ -3.5,  -0.8,  -6.3,  -0.4,  -4.8],
       [-13.1,   8.8,   8.8,  17. ,   0.5]])
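
For a normal distribution, sampling given mu amounts to drawing from a normal centred at mu. The sketch below assumes the scale is a variance, which matches the example above (targets drawn with standard deviation 10, fitted scale near 100). The function `sample_normal` is hypothetical and uses NumPy's generator rather than the scipy distribution the library samples from, so its draws will not match the numbers shown above.

```python
import numpy as np


def sample_normal(mu, variance, size=None, random_state=None):
    """Draw from Normal(mu, variance); mu broadcasts against `size`."""
    rng = np.random.default_rng(random_state)
    mu = np.asarray(mu, dtype=float)
    return rng.normal(loc=mu, scale=np.sqrt(variance), size=size)


# One draw per entry of mu
single = sample_normal(np.zeros(5), variance=100.0, random_state=1)

# Three draws per entry of mu, arranged as a (3, 5) array
many = sample_normal(np.zeros(5), variance=100.0, size=(3, 5), random_state=3)
```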

score(X, y, sample_weight=None)

Proportion of deviance explained (pseudo $r^2$).

Parameters:

X (np.ndarray or pd.DataFrame) – A dataset to predict on. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.
y (np.ndarray or pd.Series) – An array of target values.
sample_weight (np.ndarray, optional) – An array of sample weights. Sample weights [1, 3] are equivalent to repeating the second data point three times. The default is None.

Returns:

The pseudo $r^2$ score.

Return type:

float
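
The proportion of deviance explained is commonly defined as one minus the ratio of the model deviance to the deviance of a null model that predicts the mean of y. The sketch below uses that common definition with an unweighted intercept-only null model, which may differ in detail from what the library computes. The names `deviance_explained` and `normal_unit_deviance` are hypothetical.

```python
import numpy as np


def normal_unit_deviance(y, mu):
    """Unit deviance of the normal distribution: squared error."""
    return (y - mu) ** 2


def deviance_explained(y, mu, unit_deviance):
    """1 - D(y, mu) / D(y, mean(y)), with an intercept-only null model."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    null_mu = np.full_like(y, y.mean())
    model_dev = unit_deviance(y, mu).sum()
    null_dev = unit_deviance(y, null_mu).sum()
    return 1.0 - model_dev / null_dev


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]
pseudo_r2 = deviance_explained(y_true, y_pred, normal_unit_deviance)
```

For the normal distribution the unit deviance is the squared error, so this quantity reduces to the ordinary coefficient of determination. That is consistent with the class example, where gam.score and sklearn's r2_score return the same value.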

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GAM

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GAM

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

summary(file=None)

Print a model summary.

Parameters:

file (filehandle, optional) – A file handle to write to. The default is None, which maps to sys.stdout.

Return type:

None.