PolynomialFeatures is a preprocessing technique used in machine learning, particularly in polynomial regression. It transforms an input feature matrix by adding new features that are polynomial combinations of the original features.
For example, if you have a feature \( x \), PolynomialFeatures can generate additional features like \( x^2 \), \( x^3 \), etc., up to a specified degree. This allows the model to capture nonlinear relationships between the features and the target variable.
Here’s a simple example using Python’s scikit-learn library:
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
# Create a feature matrix with a single feature taking the values 1, 2, and 3
X = np.array([[1], [2], [3]])
# Create a PolynomialFeatures transformer with degree 2
poly = PolynomialFeatures(degree=2)
# Transform the original feature matrix to include polynomial features
X_poly = poly.fit_transform(X)
print(X_poly)
The transformed X_poly would look like:
[[1. 1. 1.]
 [1. 2. 4.]
 [1. 3. 9.]]
In this example, the original feature took the values [1, 2, 3], and PolynomialFeatures added polynomial features up to degree 2, resulting in a new feature matrix with columns for the constant term, the linear term, and the quadratic term.
This transformation allows a linear regression model to capture quadratic relationships between the input feature and the target variable. It’s particularly useful when the underlying relationship between the features and the target is not strictly linear.
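For instance, chaining PolynomialFeatures with LinearRegression in a scikit-learn pipeline fits a quadratic curve with ordinary least squares. Here is a minimal sketch, using made-up data that follows y = x^2:

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Illustrative data following the quadratic relationship y = x^2
X = np.array([[1], [2], [3], [4]])
y = np.array([1, 4, 9, 16])

# The pipeline expands X to [1, x, x^2], then fits ordinary least squares
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

print(model.predict([[5]]))  # approximately [25.], since y = x^2

Because the quadratic column is available, the otherwise linear model recovers the curve exactly on this data.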
The interaction_only parameter is a boolean that controls whether to generate only interaction features (products of distinct input features), excluding powers of any single feature.
When interaction_only=True, the transformer generates only the interaction terms (plus the bias column and the original features), leaving out higher powers of individual features. This can be useful if you are specifically interested in modeling interactions between different features without introducing higher-degree polynomial terms.
In the example above, the transformed X_poly with interaction_only=True would look like:
[[1. 1.]
 [1. 2.]
 [1. 3.]]

Only the bias column and the original feature remain: with a single input feature there are no distinct features to multiply together, so no interaction terms can be generated.
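Because a lone feature has nothing to interact with, interaction_only is easier to see with two input features. A minimal sketch with illustrative data:

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Two features per sample, so the product x1 * x2 exists as an interaction
X = np.array([[1, 2], [3, 4]])

poly = PolynomialFeatures(degree=2, interaction_only=True)
print(poly.fit_transform(X))
# Columns: bias, x1, x2, x1*x2 -- the squares x1^2 and x2^2 are omitted:
# [[ 1.  1.  2.  2.]
#  [ 1.  3.  4. 12.]]

Here the cross term x1*x2 is kept while the pure quadratic terms are dropped, which is exactly the behavior interaction_only is meant to provide.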