generalized_additive_models.Categorical

class generalized_additive_models.Categorical(feature=None, penalty=1, by=None, handle_unknown='error', min_frequency=None, max_categories=None)

A Categorical term.

Examples

A Categorical term is a thin wrapper around scikit-learn’s OneHotEncoder. Categorical terms are also called factor terms.

>>> import numpy as np
>>> from generalized_additive_models import Categorical
>>> X = np.array([1, 1, 2, 1, 2, 2]).reshape(-1, 1)
>>> categorical = Categorical(0).fit(X)
>>> categorical.transform(X)
array([[1., 0.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [0., 1.]])
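
The encoding above can be reproduced without the library. The following is a minimal sketch using only NumPy; the helper name `one_hot` is hypothetical and is not part of generalized_additive_models or scikit-learn. It sorts the unique values and builds one indicator column per value, which matches the column order shown above.

```python
import numpy as np


def one_hot(column):
    """Return (sorted unique categories, one-hot matrix) for a 1D array."""
    # `codes` holds, for each entry, the index of its category.
    categories, codes = np.unique(column, return_inverse=True)
    # Row i of the identity matrix is the indicator vector for category i.
    encoded = np.eye(len(categories))[codes]
    return categories, encoded


X = np.array([1, 1, 2, 1, 2, 2]).reshape(-1, 1)
categories, encoded = one_hot(X[:, 0])
print(categories)     # [1 2]
print(encoded.shape)  # (6, 2)
```

The number of columns equals the number of unique values, which is why a term built on this feature has one coefficient per category.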

Or, with a DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({"colors": ["red", "red", "blue", "yellow", "red"]})
>>> categorical = Categorical("colors")
>>> categorical.fit_transform(df)
array([[0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.]])

The number of coefficients equals the number of unique entries in the feature:

>>> categorical.num_coefficients
3

Each category gets its own penalty, which shrinks its coefficient towards zero:

>>> categorical.penalty_matrix()
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
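
To illustrate what shrinking towards zero means, here is a minimal sketch with NumPy. It assumes the penalty matrix P enters a least-squares fit as a ridge-style term, so that the objective is ||y - X b||^2 + ||P b||^2. How the library actually combines the penalty with the model is not stated in this section, so treat the objective and the made-up targets `y` as assumptions.

```python
import numpy as np

# One-hot design matrix for ["red", "red", "blue", "yellow", "red"],
# with columns ordered as blue, red, yellow (as in the example above).
X = np.array([[0., 1., 0.],
              [0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 1.],
              [0., 1., 0.]])
y = np.array([1.0, 3.0, 4.0, 6.0, 2.0])  # made-up targets, for illustration only
P = np.eye(3)  # identity penalty matrix, as shown above for penalty=1

# Assumed objective: ||y - X b||^2 + ||P b||^2 (ridge-style penalty).
penalized = np.linalg.solve(X.T @ X + P.T @ P, X.T @ y)

# Without the penalty, each coefficient is the mean target of its category.
unpenalized = np.linalg.solve(X.T @ X, X.T @ y)

print(unpenalized)  # approximately [4, 2, 6]
print(penalized)    # approximately [2, 1.5, 3]
```

Under this assumption, categories with few observations are shrunk more strongly: the single "blue" observation is halved, while the coefficient for "red", backed by three observations, moves only from 2 to 1.5.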

The categories assigned to each coefficient can be retrieved like so:

>>> categorical.categories_
['blue', 'red', 'yellow']

__init__(feature=None, penalty=1, by=None, handle_unknown='error', min_frequency=None, max_categories=None)

Create a categorical term with a given penalty.

Examples

>>> from sklearn.datasets import load_diabetes
>>> df = load_diabetes(as_frame=True).data.iloc[:5, :]
>>> df.sex
0    0.050680
1   -0.044642
2    0.050680
3   -0.044642
4   -0.044642
Name: sex, dtype: float64
>>> categorical_term = Categorical("sex")
>>> categorical_term.fit_transform(df)
array([[0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [1., 0.]])

String-valued categories are encoded the same way:

>>> import pandas as pd
>>> df = pd.DataFrame({'sex': ['M', 'F', 'M', 'F', 'F', 'Unknown']})
>>> categorical_term = Categorical("sex")
>>> categorical_term.fit_transform(df)
array([[0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 0., 1.]])
>>> categorical_term.categories_
['F', 'M', 'Unknown']

Methods

__init__([feature, penalty, by, ...])

Create a categorical term with a given penalty.

fit(X)

Fit to data.

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

is_redundant_with_respect_to(other)

Check if a Term is redundant with respect to another.

penalty_matrix()

Return the penalty matrix for the term.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform the input.

Attributes

name

Name of the term.

num_coefficients

Number of coefficients for the term.

fit(X)

Fit to data.

Parameters:

X (np.ndarray or pd.DataFrame) – A dataset of shape (num_samples, num_features).

name = 'categorical'

Name of the term.

property num_coefficients

Number of coefficients for the term.

penalty_matrix()

Return the penalty matrix for the term.

transform(X)

Transform the input.

Parameters:

X (np.ndarray) – A 2-dimensional ndarray of shape (n_samples, n_features).

Returns:

X – The one-hot encoded ndarray for the term, with one column per category.

Return type:

np.ndarray

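Examples

>>> import numpy as np
>>> from generalized_additive_models import Categorical
>>> categorical = Categorical(1)
>>> X = np.eye(3) * 3
>>> categorical.fit_transform(X)
array([[1., 0.],
       [0., 1.],
       [1., 0.]])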
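
The split between fit and transform can be sketched as follows. This is a simplified stand-in, not the library's implementation: the class `TinyOneHot` is hypothetical, and raising on unseen values is an assumption based on the default handle_unknown='error' in the signature, read with the semantics of scikit-learn's OneHotEncoder.

```python
import numpy as np


class TinyOneHot:
    """Hypothetical stand-in for a fitted categorical encoder."""

    def fit(self, column):
        # Remember the sorted unique values seen during fitting.
        self.categories_ = np.unique(column)
        return self

    def transform(self, column):
        column = np.asarray(column)
        unknown = ~np.isin(column, self.categories_)
        if unknown.any():
            # Mirrors the assumed handle_unknown='error' behaviour.
            raise ValueError(f"Unknown categories: {np.unique(column[unknown])}")
        # Compare every value against every known category.
        return (column[:, None] == self.categories_[None, :]).astype(float)


encoder = TinyOneHot().fit(np.array(["red", "red", "blue", "yellow", "red"]))
print(encoder.transform(np.array(["blue", "red"])))

try:
    encoder.transform(np.array(["green"]))
except ValueError as error:
    print(error)
```

Because the columns are fixed at fit time, new data always produces the same number of columns, in the same order, as the data used for fitting.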