Beyond Knowledge Innovation

Handling missing values with SimpleImputer

April 24, 2024 | CEO

SimpleImputer is a class in scikit-learn, a popular machine learning library for Python, that handles missing values in datasets. It offers simple imputation strategies: filling missing entries with the column mean, median, most frequent value, or a constant.

Here’s a basic example of how you might use SimpleImputer:

from sklearn.impute import SimpleImputer
import numpy as np

# Example dataset with missing values
X = np.array([[1, 2, np.nan],
              [3, np.nan, 4],
              [np.nan, 5, 6]])

# Create a SimpleImputer instance with strategy 'mean'
imputer = SimpleImputer(strategy='mean')

# Fit the imputer to the data and transform it
X_imputed = imputer.fit_transform(X)

print(X_imputed)

This code replaces each missing value in X with the mean of its column, computed from the non-missing entries (here 2, 3.5, and 5 for the three columns). You can replace 'mean' with 'median', 'most_frequent', or 'constant' as required. With the 'constant' strategy, pass the replacement value through the fill_value parameter.
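As a minimal sketch of the other strategies, the snippet below applies 'most_frequent' and 'constant' to a small array. The array values and the fill value of 0 are made up for illustration and do not come from the example above.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data (an assumption, not from the article)
X = np.array([[2.0, 7.0],
              [2.0, np.nan],
              [np.nan, 9.0],
              [5.0, 9.0]])

# 'most_frequent' fills each gap with the most common value in its column
imp_mode = SimpleImputer(strategy="most_frequent")
X_mode = imp_mode.fit_transform(X)

# 'constant' fills every gap with the value given in fill_value
imp_const = SimpleImputer(strategy="constant", fill_value=0)
X_const = imp_const.fit_transform(X)

print(X_mode)
print(X_const)
```

Unlike 'mean' and 'median', 'most_frequent' and 'constant' also work on string or categorical columns, which makes them a reasonable starting point for non-numeric data.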

Here is another example, which assumes X_train is a pandas DataFrame with a numeric "income" column:

# impute the missing values with median
imp_median = SimpleImputer(missing_values=np.nan, strategy="median")

# fit the imputer on train data and transform the train data
X_train["income"] = imp_median.fit_transform(X_train[["income"]])
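The comment above mentions fitting on the train data. A common follow-up is to reuse the fitted imputer on held-out data, so that the test set is filled with the median learned from the training set rather than its own. The sketch below assumes two small hypothetical DataFrames, X_train and X_test, each with an "income" column; the values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data, made up for illustration
X_train = pd.DataFrame({"income": [30000.0, np.nan, 50000.0, 70000.0]})
X_test = pd.DataFrame({"income": [np.nan, 45000.0]})

imp_median = SimpleImputer(missing_values=np.nan, strategy="median")

# Learn the median from the train data only, then fill the train data
X_train["income"] = imp_median.fit_transform(X_train[["income"]]).ravel()

# Reuse the learned median on the test data (transform, not fit_transform)
X_test["income"] = imp_median.transform(X_test[["income"]]).ravel()

print(imp_median.statistics_)  # median learned from X_train
print(X_test)
```

Calling transform instead of fit_transform on the test data keeps information from the test set out of the imputation step.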