Beyond Knowledge Innovation
How-to: cap/clip outliers in a column

February 6, 2024 | April 18, 2024 | CEO

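To cap (clip) outliers in a column, replace every value that falls outside a lower and an upper threshold with the threshold itself. Both pandas (the clip method of a DataFrame or Series) and NumPy (np.clip) can do this. The example below uses np.clip with thresholds taken from the box-plot whiskers, that is, 1.5 times the interquartile range below the first quartile and above the third quartile: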

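import numpy as np

def treat_outliers(df, col):
    """Cap the values of df[col] at the box-plot whiskers."""
    Q1 = df[col].quantile(0.25)  # 25th percentile (first quartile)
    Q3 = df[col].quantile(0.75)  # 75th percentile (third quartile)
    IQR = Q3 - Q1                # interquartile range (75th percentile - 25th percentile)
    lower_whisker = Q1 - 1.5 * IQR
    upper_whisker = Q3 + 1.5 * IQR

    # values beyond a whisker are replaced by that whisker; note that df is modified in place
    df[col] = np.clip(df[col], lower_whisker, upper_whisker)

    return df

# treating outliers of a column
data = treat_outliers(data, 'your column name')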
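
To make the whisker arithmetic concrete, here is a minimal sketch on a small made-up DataFrame. The column name value and the numbers are assumptions chosen for illustration, and the sketch uses the pandas clip method instead of np.clip, which should give the same result here.

```python
import pandas as pd

# Hypothetical sample data; 100 is an obvious outlier.
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 100]})

q1 = df["value"].quantile(0.25)   # 11.25
q3 = df["value"].quantile(0.75)   # 12.75
iqr = q3 - q1                     # 1.5
lower = q1 - 1.5 * iqr            # 9.0
upper = q3 + 1.5 * iqr            # 15.0

# Series.clip returns a new Series; the original column is left untouched.
capped = df["value"].clip(lower=lower, upper=upper)

# Count how many values were changed by the capping.
n_changed = int((capped != df["value"]).sum())
print(lower, upper, n_changed)
```

Only the value 100 lies outside the whiskers, so it is the only one replaced (by the upper whisker, 15.0). Comparing the column before and after capping like this is a quick way to see how much of the data the operation touched.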

Clipping is simple, but it changes the data: every capped value is replaced by the threshold, which reduces the spread of the column and can leave many identical values at the bounds. Consider that impact on your analysis before applying it. If the whisker rule does not suit your data, you can derive the thresholds in other ways, for example from z-scores or percentiles, or use more advanced statistical methods.
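
The percentile and z-score alternatives mentioned above can be sketched as follows. This is only an illustration under assumptions: the data is randomly generated, and the 1st/99th percentile cut-offs and the 3-standard-deviation limit are common conventions, not values taken from this article.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
s = pd.Series(rng.normal(size=1000))

# Percentile-based capping: clip to the 1st and 99th percentiles.
low, high = s.quantile([0.01, 0.99])
pct_capped = s.clip(lower=low, upper=high)

# z-score-based capping: clip to mean +/- 3 standard deviations.
mean, std = s.mean(), s.std()
z_capped = s.clip(lower=mean - 3 * std, upper=mean + 3 * std)

print(pct_capped.min(), pct_capped.max())
print(z_capped.min(), z_capped.max())
```

Percentile bounds always cap a fixed share of the rows, whereas z-score bounds cap only values that are far from the mean, so the two approaches can treat the same column quite differently.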

Here is another example:

# Showing the top 5 values
data['total sulfur dioxide'].sort_values(ascending=False).head()
1081    289.0
1079    278.0
354     165.0
1244    160.0
651     155.0
Name: total sulfur dioxide, dtype: float64

# Capping the two extreme values
data['total sulfur dioxide'] = data['total sulfur dioxide'].clip(upper=165)

The two rows with total sulfur dioxide greater than 165 (289.0 and 278.0) are now set to 165.
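
The effect can be checked on a small Series. The sketch below rebuilds only the five values printed above (not the full dataset, which is not shown here) and counts how many exceed the cap before applying it.

```python
import pandas as pd

# Only the five largest values shown in the output above, with their row labels.
so2 = pd.Series(
    [289.0, 278.0, 165.0, 160.0, 155.0],
    index=[1081, 1079, 354, 1244, 651],
    name="total sulfur dioxide",
)

# Number of values strictly above the cap before clipping.
n_above = int((so2 > 165).sum())

# Passing only upper= caps the high end and leaves low values unchanged.
capped = so2.clip(upper=165)

print(n_above)
print(capped.tolist())
```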
