Handling missing values with SimpleImputer – Beyond Knowledge Innovation

SimpleImputer is a class in scikit-learn, a popular machine learning library for Python, used for handling missing values in datasets. It provides simple strategies for imputing missing values: filling missing entries with the column mean, the column median, the most frequent value, or a constant.

Here’s a basic example of how you might use SimpleImputer:

from sklearn.impute import SimpleImputer
import numpy as np

# Example dataset with missing values
X = np.array([[1, 2, np.nan],
              [3, np.nan, 4],
              [np.nan, 5, 6]])

# Create a SimpleImputer instance with strategy 'mean'
imputer = SimpleImputer(strategy='mean')

# Fit the imputer to the data and transform it
X_imputed = imputer.fit_transform(X)

print(X_imputed)

This code replaces each missing value in the dataset X with the mean of its column. You can replace 'mean' with 'median', 'most_frequent', or 'constant' as needed. The 'mean' and 'median' strategies only work with numeric data. If you choose the 'constant' strategy, pass the replacement value through the fill_value parameter.
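To make the other strategies concrete, here is a minimal sketch that reuses the same array X with the 'constant' and 'median' strategies. The fill value of 0 and the variable names are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Same example dataset as above
X = np.array([[1, 2, np.nan],
              [3, np.nan, 4],
              [np.nan, 5, 6]])

# 'constant' strategy: every missing entry becomes fill_value
# (0 is an arbitrary illustrative choice)
const_imputer = SimpleImputer(strategy="constant", fill_value=0)
X_const = const_imputer.fit_transform(X)

# 'median' strategy: every missing entry becomes its column's median
median_imputer = SimpleImputer(strategy="median")
X_median = median_imputer.fit_transform(X)

# After fitting, statistics_ holds the fill value used for each column
col_medians = median_imputer.statistics_

print(X_const)
print(X_median)
print(col_medians)
```

After fitting, the statistics_ attribute is a quick way to check which value the imputer will use for each column.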

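Here is another example, which imputes a single DataFrame column with its median:

# Assumes X_train is a pandas DataFrame with a numeric "income" column,
# and that SimpleImputer and np are imported as in the first example.

# Impute the missing values with the median
imp_median = SimpleImputer(missing_values=np.nan, strategy="median")

# Fit the imputer on the training data and transform it;
# ravel() flattens the (n, 1) result back into a single column
X_train["income"] = imp_median.fit_transform(X_train[["income"]]).ravel()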
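The comment about fitting on the train data matters once there is also a test set: the usual approach is to learn the median from the training data only and reuse it on the test data. Below is a minimal sketch of that pattern. The toy X_train and X_test frames and their income values are made up for illustration and are not from a real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data, invented for this illustration
X_train = pd.DataFrame({"income": [30000.0, np.nan, 50000.0, 70000.0]})
X_test = pd.DataFrame({"income": [np.nan, 45000.0]})

imp_median = SimpleImputer(missing_values=np.nan, strategy="median")

# Learn the median from the training data only, then fill the training column
X_train["income"] = imp_median.fit_transform(X_train[["income"]]).ravel()

# Reuse the training median on the test data: transform only, no refitting
X_test["income"] = imp_median.transform(X_test[["income"]]).ravel()

# The median learned from the training data
train_median = imp_median.statistics_[0]

print(train_median)
print(X_test)
```

Calling transform instead of fit_transform on the test set keeps test-set values from influencing the fill value.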
