Beyond Knowledge Innovation

Finding the optimal number of clusters (k) using Elbow Method

March 11, 2024 | March 20, 2024 | CEO | 222 views
The elbow method is a technique used to find the optimal number of clusters (k) in a dataset for a clustering algorithm, such as k-means. The idea is to run the clustering algorithm for different values of k and plot the sum of squared distances (inertia) between data points and their assigned cluster centroids. The point where the rate of decrease in inertia sharply changes is considered the “elbow” of the curve, suggesting the optimal number of clusters.
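For reference, the quantity being plotted can be written out explicitly. The notation here is an assumption introduced for illustration and is not from the original post: $C_j$ denotes the set of points assigned to cluster $j$, and $\mu_j$ denotes that cluster's centroid.

```latex
\text{inertia}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
```

Because every extra centroid can only bring points closer to their nearest center, this value generally shrinks as k grows, which is why the method looks at where the decrease levels off rather than at the minimum.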

Here are the steps involved in the elbow method:

  1. Run the clustering algorithm for different values of k:
    Apply the clustering algorithm (e.g., k-means) to the dataset for a range of k values.
  2. Compute the sum of squared distances (inertia):
    For each value of k, calculate the sum of squared distances between each data point and its assigned cluster centroid. This is often referred to as the “inertia” or “within-cluster sum of squares.”
  3. Plot the results:
    Create a plot where the x-axis represents the number of clusters (k), and the y-axis represents the corresponding inertia values. The plot typically looks like an arm, and the “elbow” point is where the rate of decrease in inertia starts to slow down.
  4. Identify the elbow point:
    The optimal number of clusters is often associated with the “elbow” of the curve, where adding more clusters does not significantly reduce the inertia. The elbow point represents a balance between having enough clusters to capture the data’s structure and avoiding overfitting.
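Step 2 can be illustrated with a minimal NumPy sketch that computes the inertia by hand. The helper name `inertia` and the toy points, labels, and centroids are hypothetical and chosen only so the arithmetic is easy to check; in practice scikit-learn exposes the same quantity as the `inertia_` attribute of a fitted `KMeans` model.

```python
import numpy as np

def inertia(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(labels)
    # centroids[labels] picks, for every point, the centroid it is assigned to
    diffs = points - centroids[labels]
    return float(np.sum(diffs ** 2))

# Hypothetical toy data: four points forming two well-separated pairs
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

# k = 1: every point is assigned to the overall mean
inertia_k1 = inertia(points, [0, 0, 0, 0], [[5.0, 1.0]])

# k = 2: each pair gets its own centroid
inertia_k2 = inertia(points, [0, 0, 1, 1], [[0.0, 1.0], [10.0, 1.0]])

print(inertia_k1, inertia_k2)  # 104.0 4.0
```

Going from one cluster to two drops the inertia from 104.0 to 4.0 here, which is the kind of sharp decrease that shows up before the elbow.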

Here’s a simple example using the elbow method with scikit-learn’s k-means and Matplotlib:

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# %matplotlib inline  # Jupyter-only magic; uncomment when running in a notebook


data = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14],
                 [15, 16], [17, 18], [19, 20], [1, 3], [1, 4], [1, 5], [1, 6]])

# Instantiate the StandardScaler
scaler = StandardScaler()

# Fit and transform the data
data_scaled = scaler.fit_transform(data)

# Fit k-means for k = 1..9 and store the inertia (SSE) of each model
sse = []
for k in range(1, 10):
    # n_init and random_state are set explicitly so the curve is reproducible
    model = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    model.fit(data_scaled)
    sse.append(model.inertia_)

# Convert the results into a DataFrame and plot them
df = pd.DataFrame({'Cluster': range(1, 10), 'SSE': sse})
plt.figure(figsize=(12,6))
plt.plot(df['Cluster'], df['SSE'], marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()

Looking at the elbow curve above, any number of clusters between 3 and 6 is a reasonable choice.
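When the elbow is hard to judge by eye, one simple option is to pick the k where the curve bends most sharply, measured by the largest second difference of the inertia values. This is only a rough heuristic and is not part of the original example. The function name `elbow_by_second_difference` and the inertia values below are hypothetical, and the result can be sensitive to noise in the curve.

```python
import numpy as np

def elbow_by_second_difference(ks, sse):
    """Return the k at which the inertia curve has its largest second difference."""
    ks = np.asarray(ks)
    sse = np.asarray(sse, dtype=float)
    # second_diff[j] = sse[j] - 2 * sse[j + 1] + sse[j + 2], centered on ks[j + 1]
    second_diff = np.diff(sse, n=2)
    return int(ks[np.argmax(second_diff) + 1])

# Hypothetical inertia values for k = 1..6 (not taken from the plot above)
ks = [1, 2, 3, 4, 5, 6]
sse = [100.0, 60.0, 20.0, 15.0, 12.0, 10.0]

elbow_k = elbow_by_second_difference(ks, sse)
print(elbow_k)  # 3
```

With these values the inertia falls by 40 per step up to k = 3 and by only a few units afterwards, so the heuristic returns 3.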

Let’s set the number of clusters to 5 and fit the model:

# k-means with 5 clusters and k-means++ initialization
kmeans = KMeans(n_clusters=5, init='k-means++')
pred = kmeans.fit_predict(data_scaled)

# Count the points in each of the resulting clusters
frame = pd.DataFrame(data_scaled)
frame['cluster'] = pred
print(frame['cluster'].value_counts())

Output (cluster labels are arbitrary and can differ between runs):

1    6
3    2
0    2
2    2
4    2
Name: cluster, dtype: int64

In the elbow plot, the elbow point suggests the optimal number of clusters. Keep in mind that the elbow method is a heuristic, and the interpretation is not always clear-cut, especially when the curve has no distinct elbow. It is best combined with other evaluation metrics (e.g., silhouette score) for a more comprehensive analysis.
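As a sketch of how the silhouette score could be used alongside the elbow curve, the snippet below computes it for the same toy data over k = 2..9 using scikit-learn's `silhouette_score`. The settings `n_init=10` and `random_state=42` are assumptions added for reproducibility. The silhouette score is undefined for a single cluster, so the range starts at 2.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Same toy data as in the elbow example
data = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14],
                 [15, 16], [17, 18], [19, 20], [1, 3], [1, 4], [1, 5], [1, 6]])
data_scaled = StandardScaler().fit_transform(data)

# Mean silhouette score for each candidate k (higher is better, range -1 to 1)
scores = {}
for k in range(2, 10):
    model = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    labels = model.fit_predict(data_scaled)
    scores[k] = silhouette_score(data_scaled, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

If the k with the highest silhouette score falls inside the range suggested by the elbow curve, that agreement gives some extra confidence in the choice; if the two disagree, it is a sign the data may not have a clear cluster structure.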
