generalized_additive_models.Categorical
- class generalized_additive_models.Categorical(feature=None, penalty=1, by=None, handle_unknown='error', min_frequency=None, max_categories=None)
A Categorical term.
Examples
A Categorical term is a thin wrapper around scikit-learn's OneHotEncoder. Categorical terms are also called factor terms.
>>> import numpy as np
>>> from generalized_additive_models import Categorical
>>> X = np.array([1, 1, 2, 1, 2, 2]).reshape(-1, 1)
>>> categorical = Categorical(0).fit(X)
>>> categorical.transform(X)
array([[1., 0.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [0., 1.]])
Or, with a DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({"colors": ["red", "red", "blue", "yellow", "red"]})
>>> categorical = Categorical("colors")
>>> categorical.fit_transform(df)
array([[0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.]])
The number of coefficients equals the number of unique entries in the feature:
>>> categorical.num_coefficients
3
Each unique entry gets a penalty, penalizing the coefficients towards zero:
>>> categorical.penalty_matrix()
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
The categories assigned to each coefficient can be retrieved like so:
>>> categorical.categories_
['blue', 'red', 'yellow']
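Since the term is described as a wrapper around OneHotEncoder, the DataFrame example can be reproduced with scikit-learn alone. The following is a minimal sketch that uses only scikit-learn and NumPy, not the library's internal code; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Same values as the "colors" column, as a single-column 2D array.
colors = np.array(["red", "red", "blue", "yellow", "red"], dtype=object).reshape(-1, 1)

encoder = OneHotEncoder()
# The encoder returns a sparse matrix by default; convert it to a dense array.
encoded = encoder.fit_transform(colors).toarray()

# Categories are sorted, so the columns correspond to blue, red, yellow.
categories = encoder.categories_[0].tolist()

print(encoded)
print(categories)
```

Each row contains a single 1 in the column of its category, which is why the term needs one coefficient per unique entry.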
- __init__(feature=None, penalty=1, by=None, handle_unknown='error', min_frequency=None, max_categories=None)
Create a categorical term with a given penalty.
Examples
>>> from sklearn.datasets import load_diabetes
>>> df = load_diabetes(as_frame=True).data.iloc[:5, :]
>>> df.sex
0    0.050680
1   -0.044642
2    0.050680
3   -0.044642
4   -0.044642
Name: sex, dtype: float64
>>> categorical_term = Categorical("sex")
>>> categorical_term.fit_transform(df)
array([[0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [1., 0.]])
>>> import pandas as pd
>>> df = pd.DataFrame({'sex': ['M', 'F', 'M', 'F', 'F', 'Unknown']})
>>> categorical_term = Categorical("sex")
>>> categorical_term.fit_transform(df)
array([[0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 0., 1.]])
>>> categorical_term.categories_
['F', 'M', 'Unknown']
Methods
__init__([feature, penalty, by, ...]) – Create a categorical term with a given penalty.
fit(X) – Fit to data.
fit_transform(X[, y]) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
is_redundant_with_respect_to(other) – Check if a Term is redundant with respect to another.
penalty_matrix() – Return the penalty matrix for the term.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform the input.
Attributes
name – Name of the term.
num_coefficients – Number of coefficients for the term.
- fit(X)
Fit to data.
- Parameters:
X (np.ndarray or pd.DataFrame) – A dataset of shape (num_samples, num_features).
- name = 'categorical'
Name of the term.
- property num_coefficients
Number of coefficients for the term.
- penalty_matrix()
Return the penalty matrix for the term.
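To illustrate how an identity penalty matrix pulls coefficients toward zero, here is a minimal NumPy sketch of penalized least squares on a one-hot design matrix. It assumes the penalty enters the objective as the quadratic form b' P b and that there is no intercept; how the library itself applies the matrix is not stated on this page, and the response values in y are made up for the illustration.

```python
import numpy as np

# One-hot design matrix for ["red", "red", "blue", "yellow", "red"],
# with columns ordered blue, red, yellow.
X = np.array(
    [[0, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]],
    dtype=float,
)
y = np.array([1.0, 2.0, 3.0, 4.0, 6.0])  # illustrative response values

# Identity penalty matrix, matching the 3-category example with penalty=1.
P = np.eye(3)

# Unpenalized least squares: each coefficient is the mean of y in its category.
beta_unpenalized = np.linalg.solve(X.T @ X, X.T @ y)

# Penalized least squares (assumed objective): minimize ||y - X b||^2 + b' P b.
beta_penalized = np.linalg.solve(X.T @ X + P, X.T @ y)

print(beta_unpenalized)  # [3. 3. 4.]
print(beta_penalized)    # [1.5  2.25 2.  ]
```

Under this objective each coefficient becomes the category sum divided by the category count plus one, so categories with few observations are shrunk the most.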
- transform(X)
Transform the input.
- Parameters:
X (np.ndarray) – An ndarray with 2 dimensions of shape (n_samples, n_features).
- Returns:
X – An ndarray for the term.
- Return type:
np.ndarray
Examples
>>> linear = Linear(1)
>>> X = np.eye(3) * 3
>>> linear.fit_transform(X)
array([[0.],
       [3.],
       [0.]])