generalized_additive_models.Categorical

class generalized_additive_models.Categorical(feature=None, penalty=1, by=None, handle_unknown='error', min_frequency=None, max_categories=None)

A Categorical term.

Examples

A Categorical term is a wrapper around scikit-learn's OneHotEncoder. Categorical terms are also called factor terms.

>>> import numpy as np
>>> from generalized_additive_models import Categorical
>>> X = np.array([1, 1, 2, 1, 2, 2]).reshape(-1, 1)
>>> categorical = Categorical(0).fit(X)
>>> categorical.transform(X)
array([[1., 0.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [0., 1.]])

Or, with a DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({"colors": ["red", "red", "blue", "yellow", "red"]})
>>> categorical = Categorical("colors")
>>> categorical.fit_transform(df)
array([[0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.]])

The number of coefficients equals the number of unique entries in the feature:

>>> categorical.num_coefficients
3

Each unique entry gets its own penalty, which shrinks the corresponding coefficient toward zero:

>>> categorical.penalty_matrix()
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
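
To illustrate what shrinking toward zero means, here is a minimal NumPy sketch. It assumes the penalty enters a least-squares fit as a quadratic (ridge-style) term; how the library actually combines the penalty matrix with its loss is not stated on this page. The response values `y` are made up for illustration.

```python
import numpy as np

# One-hot design matrix from the "colors" example (columns: blue, red, yellow).
X = np.array([
    [0., 1., 0.],
    [0., 1., 0.],
    [1., 0., 0.],
    [0., 0., 1.],
    [0., 1., 0.],
])
y = np.array([2.0, 4.0, 10.0, -6.0, 3.0])  # hypothetical responses

P = np.eye(3)  # identity penalty matrix, as shown above for penalty=1

# Assumed objective: minimize ||y - X b||^2 + b^T P b.
beta_penalized = np.linalg.solve(X.T @ X + P, X.T @ y)

# Without the penalty, each coefficient is the mean response of its category.
beta_unpenalized = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_unpenalized)  # per-category means
print(beta_penalized)    # same signs, smaller magnitudes
```

Under this assumption, categories with few observations are shrunk more strongly than categories with many, because the penalty is the same for every coefficient while the data term grows with the category count.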

The categories assigned to each coefficient can be retrieved like so:

>>> categorical.categories_
['blue', 'red', 'yellow']
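
Because the term is described as a wrapper around OneHotEncoder, the encoding above can be compared with scikit-learn used directly. The sketch below assumes default OneHotEncoder settings; whether Categorical configures the encoder differently internally is not confirmed here.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Same values as the "colors" column, as a 2D array with a single feature.
colors = np.array(["red", "red", "blue", "yellow", "red"], dtype=object).reshape(-1, 1)

encoder = OneHotEncoder()  # default settings; the output is a sparse matrix
encoded = encoder.fit_transform(colors).toarray()

# Categories are stored in sorted order; column j of `encoded`
# corresponds to categories[j].
categories = encoder.categories_[0].tolist()

print(categories)
print(encoded)
```

The sorted category order explains why the first column of the encoded matrix corresponds to 'blue' even though 'red' appears first in the data.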

__init__(feature=None, penalty=1, by=None, handle_unknown='error', min_frequency=None, max_categories=None)

Create a categorical term with a given penalty.

Examples

>>> from sklearn.datasets import load_diabetes
>>> df = load_diabetes(as_frame=True).data.iloc[:5, :]
>>> df.sex
0    0.050680
1   -0.044642
2    0.050680
3   -0.044642
4   -0.044642
Name: sex, dtype: float64
>>> categorical_term = Categorical("sex")
>>> categorical_term.fit_transform(df)
array([[0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [1., 0.]])

String-valued columns work the same way:

>>> import pandas as pd
>>> df = pd.DataFrame({'sex': ['M', 'F', 'M', 'F', 'F', 'Unknown']})
>>> categorical_term = Categorical("sex")
>>> categorical_term.fit_transform(df)
array([[0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 0., 1.]])
>>> categorical_term.categories_
['F', 'M', 'Unknown']
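
The handle_unknown, min_frequency and max_categories arguments share their names with OneHotEncoder parameters. Assuming they are passed through to the encoder unchanged (this page does not confirm it), the effect of handle_unknown can be sketched with scikit-learn alone:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = np.array(["M", "F", "M", "F"], dtype=object).reshape(-1, 1)
new = np.array(["F", "Unknown"], dtype=object).reshape(-1, 1)

# handle_unknown="error": a category not seen during fit raises ValueError.
strict = OneHotEncoder(handle_unknown="error").fit(train)
try:
    strict.transform(new)
    raised = False
except ValueError:
    raised = True

# handle_unknown="ignore": an unseen category is encoded as an all-zero row.
lenient = OneHotEncoder(handle_unknown="ignore").fit(train)
encoded_new = lenient.transform(new).toarray()

print(raised)
print(encoded_new)
```

If unseen categories are expected at prediction time, including an explicit placeholder value such as 'Unknown' in the training data, as in the example above, is an alternative that gives the placeholder its own coefficient.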

Methods

`__init__`([feature, penalty, by, ...])	Create a categorical term with a given penalty.
`fit`(X)	Fit to data.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`is_redundant_with_respect_to`(other)	Check if a Term is redundant with respect to another.
`penalty_matrix`()	Return the penalty matrix for the term.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Transform the input.

Attributes

`name`	Name of the term.
`num_coefficients`	Number of coefficients for the term.

fit(X)

Fit to data.

Parameters: X (np.ndarray or pd.DataFrame) – A dataset of shape (num_samples, num_features).

name = 'categorical'

Name of the term.

property num_coefficients

Number of coefficients for the term.

penalty_matrix()

Return the penalty matrix for the term.

transform(X)

Transform the input.

Parameters: X (np.ndarray) – An ndarray with 2 dimensions of shape (n_samples, n_features).
Returns: X – An ndarray for the term.
Return type: np.ndarray

Examples

>>> linear = Linear(1)
>>> X = np.eye(3) * 3
>>> linear.fit_transform(X)
array([[0.],
       [3.],
       [0.]])