generalized_additive_models.GAM
- class generalized_additive_models.GAM(terms=None, *, distribution='normal', link='identity', fit_intercept=True, solver='pirls', max_iter=100, tol=0.0001, verbose=0)
Initialize a Generalized Additive Model (GAM).
- Parameters:
terms (Term, TermList or list, optional) – The term(s) of the model. The argument can be a single term or a collection of terms. The features that the terms refer to must be present in the data set at fit and predict time. The default is None.
distribution (str or Distribution, optional) – The assumed distribution of the target variable. Look at the dict GAM.DISTRIBUTIONS for a list of available options. The default is “normal”.
link (str or Link, optional) – The assumed link function of the target variable. Look at the dict GAM.LINKS for a list of available options. The default is “identity”.
fit_intercept (bool, optional) – Whether or not to automatically add an intercept term to the terms. If an intercept is already present, then this setting has no effect. The default is True.
solver (str, optional) – Either ‘pirls’ or ‘lbfgsb’. ‘pirls’ stands for Penalized Iteratively Reweighted Least Squares, a Newton routine that uses a step-halving line search. ‘lbfgsb’ is the Limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm with box constraints (L-BFGS-B) from scipy.optimize.minimize. The default is “pirls”.
max_iter (int, optional) – Maximum number of iterations in the solver. The default is 100.
tol (float, optional) – Tolerance in the solver. The default is 0.0001.
verbose (int, optional) – Verbosity level. The higher the number, the more info is printed. The default is 0.
- Return type:
None.
Examples
>>> from generalized_additive_models import GAM, Spline, Categorical
>>> from sklearn.datasets import load_diabetes
>>> data = load_diabetes(as_frame=True)
>>> df, y = data.data, data.target
>>> gam = GAM(Spline("age") + Spline("bmi") + Spline("bp") + Categorical("sex"))
>>> gam = gam.fit(df, y)
>>> predictions = gam.predict(df)
>>> for term in gam.terms:
...     print(term, term.coef_)
>>> float(gam.score(df, y))
0.4412081401019129
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true=y, y_pred=predictions)
0.4412081401019129
>>> gam.terms["age"]
Spline(feature='age')
>>> gam.terms["age"].coef_[:3]
array([  0.        , -11.86887791, -23.59686477])
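The solver parameter describes ‘pirls’ as a Newton routine with a step-halving line search. As an illustration of that idea only, here is a plain-Python sketch on a deliberately tiny problem: a Bernoulli model with logit link, an unpenalized intercept, and one slope carrying a ridge penalty that stands in for a spline smoothness penalty. The function name pirls_logistic and the penalty form are hypothetical and are not the library's internals. Because the logit link is canonical for the Bernoulli distribution, each Newton step here coincides with an IRLS step.

```python
import math


def pirls_logistic(x, y, lam=0.0, max_iter=100, tol=1e-4):
    """Hypothetical sketch: fit logit(mu) = b0 + b1 * x by penalized Newton/IRLS.

    A ridge penalty lam * b1**2 / 2 on the slope stands in for a smoothness
    penalty; the intercept b0 is left unpenalized.
    """

    def objective(b0, b1):
        # Penalized negative Bernoulli log-likelihood.
        total = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            # log(1 + exp(eta)), computed without overflow
            total += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
        return total + 0.5 * lam * b1 * b1

    b0, b1 = 0.0, 0.0
    obj = objective(b0, b1)
    for _ in range(max_iter):
        # Gradient g and Hessian H = X^T W X + penalty, with weights W = mu(1 - mu).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = mu * (1.0 - mu)
            g0 += mu - yi
            g1 += (mu - yi) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        g1 += lam * b1
        h11 += lam
        # Newton direction d = -H^{-1} g, solved explicitly for the 2x2 system.
        det = h00 * h11 - h01 * h01
        d0 = -(h11 * g0 - h01 * g1) / det
        d1 = -(h00 * g1 - h01 * g0) / det
        # Step halving: shrink the step until the objective does not increase.
        step = 1.0
        new_obj = objective(b0 + d0, b1 + d1)
        while new_obj > obj and step > 1e-10:
            step *= 0.5
            new_obj = objective(b0 + step * d0, b1 + step * d1)
        if new_obj > obj:
            break  # no descent step found; keep the current coefficients
        b0, b1 = b0 + step * d0, b1 + step * d1
        converged = obj - new_obj <= tol * (abs(obj) + tol)
        obj = new_obj
        if converged:
            break
    return b0, b1


x = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = pirls_logistic(x, y, tol=1e-10)            # unpenalized fit
p0, p1 = pirls_logistic(x, y, lam=10.0, tol=1e-10)  # penalized fit
print(b1, p1)
```

Raising lam shrinks the slope toward zero, which mirrors the role a smoothing penalty plays for spline coefficients.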
- __init__(terms=None, *, distribution='normal', link='identity', fit_intercept=True, solver='pirls', max_iter=100, tol=0.0001, verbose=0)
Methods
__init__([terms, distribution, link, ...]) – Initialize a Generalized Additive Model (GAM).
fit(X, y[, sample_weight]) – Fit model to data.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
predict(X) – Predict the expected value \(\mu\) with the model.
residuals(X, y, *[, residuals, standardized]) – Compute a vector of residuals.
sample(mu[, size, random_state]) – Sample from the posterior predictive distribution.
score(X, y[, sample_weight]) – Proportion deviance explained (pseudo \(r^2\)).
set_fit_request(*[, sample_weight]) – Configure whether metadata should be requested to be passed to the fit method.
set_params(**params) – Set the parameters of this estimator.
set_score_request(*[, sample_weight]) – Configure whether metadata should be requested to be passed to the score method.
summary([file]) – Print a model summary.
Attributes
- DISTRIBUTIONS = {'bernoulli': <class 'generalized_additive_models.distributions.Bernoulli'>, 'binomial': <class 'generalized_additive_models.distributions.Binomial'>, 'exponential': <class 'generalized_additive_models.distributions.Exponential'>, 'gamma': <class 'generalized_additive_models.distributions.Gamma'>, 'inv_gauss': <class 'generalized_additive_models.distributions.InvGauss'>, 'normal': <class 'generalized_additive_models.distributions.Normal'>, 'poisson': <class 'generalized_additive_models.distributions.Poisson'>}
- LINKS = {'cloglog': <class 'generalized_additive_models.links.CLogLogLink'>, 'identity': <class 'generalized_additive_models.links.Identity'>, 'inv_squared': <class 'generalized_additive_models.links.InvSquared'>, 'inverse': <class 'generalized_additive_models.links.Inverse'>, 'log': <class 'generalized_additive_models.links.Log'>, 'logit': <class 'generalized_additive_models.links.Logit'>, 'smoothlog': <class 'generalized_additive_models.links.SmoothLog'>, 'softplus': <class 'generalized_additive_models.links.Softplus'>}
- fit(X, y, sample_weight=None)
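The link names above map the mean \(\mu\) to the linear predictor \(\eta = g(\mu)\). Below is a minimal sketch of the standard GLM definitions for several of these names, assuming they match the textbook forms (the library's own classes may differ in numerical details). ‘smoothlog’ and ‘softplus’ are left out because their exact parameterization is not documented here.

```python
import math

# Textbook link functions g(mu) paired with their inverses g^{-1}(eta).
# Keys mirror some names in GAM.LINKS, but the formulas are standard GLM
# definitions assumed here, not copied from the library's classes.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "cloglog": (
        lambda mu: math.log(-math.log(1.0 - mu)),
        lambda eta: 1.0 - math.exp(-math.exp(eta)),
    ),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "inv_squared": (lambda mu: 1.0 / mu**2, lambda eta: 1.0 / math.sqrt(eta)),
}

# Round trip: g^{-1}(g(mu)) recovers mu for a value inside every link's domain.
for name, (link, inverse_link) in LINKS.items():
    print(name, round(inverse_link(link(0.3)), 12))  # each line ends in 0.3
```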
Fit model to data.
- Parameters:
X (np.ndarray or pd.DataFrame) – A dataset to fit to. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.
y (np.ndarray or Series) – An array of target values.
sample_weight (np.ndarray, optional) – An array of sample weights. Sample weights [1, 3] are equivalent to repeating the second data point three times. The default is None.
- Returns:
Returns the instance.
- Return type:
GAM
Examples
>>> import numpy as np
>>> from generalized_additive_models import GAM, Spline
>>> rng = np.random.default_rng(32)
>>> X = rng.normal(size=(100, 1))
>>> y = np.sin(X).ravel()
>>> gam = GAM(Spline(0))
>>> gam.fit(X, y)
GAM(terms=TermList(data=[Spline(feature=0), Intercept()]))
- predict(X)
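The sample_weight description says that weights [1, 3] are equivalent to repeating the second data point three times. That equivalence holds for any weighted least-squares objective; the sketch below checks it with a closed-form weighted straight-line fit, used here as a hypothetical stand-in for the GAM solver (weighted_line_fit is not part of the library).

```python
import math


def weighted_line_fit(x, y, w):
    """Closed-form weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Weight 3 on the second point ...
weighted = weighted_line_fit([0.0, 1.0, 2.0], [1.0, 3.0, 2.0], [1, 3, 1])
# ... gives the same fit as an unweighted data set where that point appears three times.
repeated = weighted_line_fit(
    [0.0, 1.0, 1.0, 1.0, 2.0], [1.0, 3.0, 3.0, 3.0, 2.0], [1, 1, 1, 1, 1]
)
print(all(math.isclose(p, q) for p, q in zip(weighted, repeated)))  # True
```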
Predict the expected value \(\mu\) with the model.
- Parameters:
X (np.ndarray or pd.DataFrame) – A dataset to predict on. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.
- Returns:
An array with predictions. Predictions are inverse_link(X @ coef).
- Return type:
np.ndarray
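A plain-Python sketch of the formula inverse_link(X @ coef) with a logit link. Here X is assumed to be the model matrix built from the terms (for example an intercept column plus basis columns), not the raw feature array; the matrix and coefficients below are hypothetical values chosen for illustration.

```python
import math


def predict(X, coef, inverse_link):
    """Return inverse_link(X @ coef) for a design matrix given as a list of rows."""
    return [inverse_link(sum(xij * bj for xij, bj in zip(row, coef))) for row in X]


def expit(eta):
    # Inverse of the logit link.
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical model matrix: an intercept column of ones plus one basis column.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]]
coef = [0.0, 2.0]  # hypothetical fitted coefficients
print([round(mu, 3) for mu in predict(X, coef, expit)])  # [0.119, 0.5, 0.881]
```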
Examples
Create a data set where the probability of y being 1 increases with X:
>>> import numpy as np
>>> from generalized_additive_models import GAM, Linear
>>> rng = np.random.default_rng(32)
>>> X = rng.normal(size=(99, 1))
>>> probs = (1 / (1 + np.exp(-X))).ravel()
>>> y = rng.binomial(1, p=probs)
Fit a model and predict:
>>> gam = GAM(Linear(0), distribution="binomial", link="logit").fit(X, y)
>>> X_new = np.array([-3, -2, -1, 0, 1, 2, 3]).reshape(-1, 1)
>>> gam.predict(X_new).round(3)
array([0.011, 0.045, 0.168, 0.465, 0.789, 0.942, 0.986])
- residuals(X, y, *, residuals='deviance', standardized=True)
Compute a vector of residuals.
- Parameters:
X (np.ndarray or pd.DataFrame) – A dataset to predict on. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.
y (np.ndarray or Series) – An array of target values.
residuals (string, optional) – One of “response”, “pearson” or “deviance”. The default is “deviance”.
standardized (bool, optional) – Whether or not to standardize the residuals. The default is True.
- Returns:
residuals – An array of residuals.
- Return type:
np.ndarray
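The three residual types correspond to the usual GLM definitions: the raw difference \(y - \mu\), that difference scaled by \(\sqrt{V(\mu)}\), and the signed square root of each observation's deviance contribution. Below is a plain-Python sketch for a Poisson model (variance function \(V(\mu) = \mu\)), assuming those textbook formulas. The standardized=True option is not reproduced, since the scaling the library applies is not documented here, and poisson_residuals is a hypothetical name.

```python
import math


def poisson_residuals(y, mu, kind="deviance"):
    """Unstandardized GLM residuals for a Poisson model, where V(mu) = mu."""
    out = []
    for yi, mi in zip(y, mu):
        if kind == "response":
            r = yi - mi
        elif kind == "pearson":
            # Raw residual scaled by the standard deviation sqrt(V(mu)).
            r = (yi - mi) / math.sqrt(mi)
        elif kind == "deviance":
            # Unit deviance d_i = 2 * (y log(y / mu) - (y - mu)), with 0 * log 0 = 0.
            ylogy = yi * math.log(yi / mi) if yi > 0 else 0.0
            d = 2.0 * (ylogy - (yi - mi))
            # Signed square root; max() guards against tiny negative rounding.
            r = math.copysign(math.sqrt(max(d, 0.0)), yi - mi)
        else:
            raise ValueError(f"unknown residual type: {kind!r}")
        out.append(r)
    return out


y = [0, 2, 5]
mu = [1.0, 2.0, 3.0]
for kind in ("response", "pearson", "deviance"):
    print(kind, [round(r, 3) for r in poisson_residuals(y, mu, kind)])
```

Summing the squared deviance residuals gives the model's total deviance, which is the quantity that score compares against a null model.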
- sample(mu, size=None, random_state=None)
Sample from the posterior predictive distribution.
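- Parameters:
mu (np.ndarray) – Expected values \(\mu\), for instance the output of predict().
size (int or tuple of ints, optional) – Shape of the returned array of samples. The default is None, which returns one sample per entry in mu.
random_state (int, optional) – Seed for the random number generator. The default is None.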
- Returns:
A numpy array of samples.
- Return type:
np.ndarray
Examples
>>> import numpy as np
>>> from generalized_additive_models import GAM, Intercept
>>> rng = np.random.default_rng(32)
>>> X = np.ones((9999, 1))
>>> y = rng.normal(size=9999) * 10
>>> gam = GAM(terms=Intercept(), fit_intercept=False).fit(X, y)
>>> float(gam._distribution.scale)
99.389...
>>> gam.sample(mu=np.zeros(5), random_state=1).round(1)
array([ 16.2,  -6.1,  -5.3, -10.7,   8.6])
>>> gam.sample(mu=np.zeros(5), random_state=3, size=(3, 5)).round(1)
array([[ 17.8,   4.4,   1. , -18.6,  -2.8],
       [ -3.5,  -0.8,  -6.3,  -0.4,  -4.8],
       [-13.1,   8.8,   8.8,  17. ,   0.5]])
- score(X, y, sample_weight=None)
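For the normal distribution, sampling can be pictured as drawing \(y_i \sim \mathcal{N}(\mu_i, \sigma^2)\) with the fitted scale as \(\sigma^2\). The example above is consistent with scale being the variance, since y has standard deviation 10 and the fitted scale is about 99.4. The plain-Python sketch below assumes that reading. It uses Python's random module, so its draws will not match the arrays above, and its size argument is a simplified integer row count rather than a NumPy-style shape; sample_normal is a hypothetical name.

```python
import math
import random


def sample_normal(mu, scale, size=None, random_state=None):
    """Hypothetical sketch: draw y_i ~ Normal(mu_i, scale) for each mu_i.

    scale is treated as the variance. size=None returns one draw per entry
    of mu; an integer size returns that many such rows.
    """
    rng = random.Random(random_state)
    sd = math.sqrt(scale)

    def one_row():
        return [rng.gauss(m, sd) for m in mu]

    if size is None:
        return one_row()
    return [one_row() for _ in range(size)]


draws = sample_normal([0.0] * 5, scale=99.389, random_state=1)
print([round(d, 1) for d in draws])

# Sanity check: many draws around mu = 0 with variance 100 should average near 0.
big = [row[0] for row in sample_normal([0.0], 100.0, size=20000, random_state=0)]
mean = sum(big) / len(big)
print(round(mean, 2))
```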
Proportion deviance explained (pseudo \(r^2\)).
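Deviance explained is \(1 - D(\text{model}) / D(\text{null})\). The sketch below assumes the null model is an intercept-only fit that predicts the mean of y everywhere. For the normal distribution the deviance is the residual sum of squares, so the ratio reduces exactly to \(R^2\), which is consistent with score and r2_score returning the same value in the class example. With a Poisson deviance the same ratio gives a different pseudo \(r^2\). The function names and data are hypothetical.

```python
import math


def normal_deviance(y, mu):
    # Normal deviance: residual sum of squares.
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))


def poisson_deviance(y, mu):
    # Poisson deviance 2 * sum(y log(y / mu) - (y - mu)), using 0 * log 0 = 0.
    return 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu)
    )


def deviance_explained(y, mu, deviance):
    """1 - D(model) / D(null), with the null model predicting mean(y) everywhere."""
    y_bar = sum(y) / len(y)
    return 1.0 - deviance(y, mu) / deviance(y, [y_bar] * len(y))


y = [1.0, 2.0, 3.0, 4.0]
mu = [1.1, 1.9, 3.2, 3.8]  # hypothetical fitted values
y_bar = sum(y) / len(y)
r2 = 1.0 - sum((a - b) ** 2 for a, b in zip(y, mu)) / sum((a - y_bar) ** 2 for a in y)
print(deviance_explained(y, mu, normal_deviance), r2)  # equal for the normal case
print(deviance_explained(y, mu, poisson_deviance))  # a different pseudo r^2
```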
- Parameters:
X (np.ndarray or pd.DataFrame) – A dataset to predict on. Must be a np.ndarray of dimension 2 with shape (num_samples, num_features) or a pandas DataFrame. If the terms in the GAM refer to integer features, a np.ndarray must be passed. If the terms refer to string column names, a pandas DataFrame must be passed.
y (np.ndarray or Series) – An array of target values.
sample_weight (np.ndarray, optional) – An array of sample weights. Sample weights [1, 3] are equivalent to repeating the second data point three times. The default is None.
- Returns:
The pseudo \(r^2\) score.
- Return type:
float
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GAM
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GAM
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- summary(file=None)
Print a model summary.
- Parameters:
file (filehandle, optional) – A file handle to write to. The default is None, which maps to sys.stdout.
- Return type:
None.