Beyond Knowledge Innovation

What is Gaussian Distribution?

February 9, 2024 · February 21, 2024 · CEO · 181 views
A Gaussian distribution, also known as a normal distribution, is a continuous probability distribution that is symmetric around its mean, forming a bell-shaped curve. It is a fundamental concept in statistics and probability theory. The shape of the distribution is characterized by its mean (average) and standard deviation.

The probability density function (PDF) of a Gaussian distribution is given by the formula:

\(f(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \cdot e^{-\frac{(x - \mu)^2}{2\sigma^2}} \)

where:

  • \( f(x) \) is the probability density at \( x \),
  • \( x \) is a value of the random variable,
  • \( \mu \) is the mean (average) of the distribution,
  • \( \sigma \) is the standard deviation (so \( \sigma^2 \) is the variance), and
  • \( e \) is the base of the natural logarithm.
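
As a quick check of the formula above, the PDF can be coded directly. This is a minimal sketch: the function name `gaussian_pdf` and the integration grid are illustrative choices, not part of any library.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the Gaussian PDF at x for mean mu and standard deviation sigma."""
    x = np.asarray(x, dtype=float)
    coeff = 1.0 / np.sqrt(2.0 * np.pi * sigma**2)
    return coeff * np.exp(-((x - mu) ** 2) / (2.0 * sigma**2))

# Density at the mean of the standard normal, which equals 1 / sqrt(2 * pi)
peak = float(gaussian_pdf(0.0))

# A valid density integrates to 1; approximate the integral with a Riemann sum
# over [-8, 8], where almost all of the probability mass lies.
xs = np.linspace(-8.0, 8.0, 20001)
dx = xs[1] - xs[0]
area = float(np.sum(gaussian_pdf(xs)) * dx)

print(f"peak density: {peak:.4f}")   # peak density: 0.3989
print(f"area under curve: {area:.4f}")
```

The peak value is a density, not a probability, so it can exceed 1 when the standard deviation is small.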

Key properties of a Gaussian distribution include:

  1. Symmetry and the empirical rule: The distribution is symmetric around its mean. Approximately 68.27% of the data falls within one standard deviation of the mean (μ ± σ), 95.45% within two standard deviations (μ ± 2σ), and 99.73% within three standard deviations (μ ± 3σ). The remaining 0.27% lies more than three standard deviations from the mean, and such points are often treated as outliers.
  2. Bell-shaped curve: The probability density is highest at the mean and decreases as values move away from the mean in both directions.
  3. Central Limit Theorem: The suitably standardized sum (or average) of a large number of independent and identically distributed random variables with finite variance tends toward a Gaussian distribution, regardless of the shape of their original distribution.
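
The coverage percentages in the first property follow from the Gaussian cumulative distribution function: the probability of landing within k standard deviations of the mean is erf(k / √2). The sketch below computes them with the standard library; the helper name `prob_within` is an illustrative choice.

```python
import math

def prob_within(k):
    """P(|X - mu| < k * sigma) for a Gaussian X, which equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k) * 100:.2f}%")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```

The result does not depend on the particular values of μ and σ, because the distance from the mean is measured in units of σ.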

Gaussian distributions are widely used in various fields, including statistics, physics, finance, and machine learning, due to their mathematical properties and applicability to real-world phenomena.
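
The Central Limit Theorem listed among the key properties can be illustrated with a small simulation. The sketch below averages uniform random numbers, which are not Gaussian, and checks that the standardized averages show roughly the Gaussian coverage within one and two standard deviations. The seed, the sample size `n`, and the number of trials are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary) for reproducibility

n = 50          # number of variables averaged in each sample mean
trials = 20000  # number of sample means to generate

# Uniform(0, 1) has mean 1/2 and variance 1/12, and is clearly not bell-shaped.
samples = rng.uniform(0.0, 1.0, size=(trials, n))
sample_means = samples.mean(axis=1)

# Standardize: subtract the true mean, divide by the standard deviation of the mean.
z = (sample_means - 0.5) / np.sqrt(1.0 / 12.0 / n)

frac_within_1 = float(np.mean(np.abs(z) < 1.0))
frac_within_2 = float(np.mean(np.abs(z) < 2.0))

print(f"within 1 sigma: {frac_within_1:.3f} (Gaussian value is about 0.683)")
print(f"within 2 sigma: {frac_within_2:.3f} (Gaussian value is about 0.954)")
```

Increasing `n` brings the distribution of the averages closer to a Gaussian; with very small `n` the match is visibly worse.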

You can draw a Gaussian distribution in Python using libraries such as numpy and matplotlib. Here’s a simple example:

import numpy as np
import matplotlib.pyplot as plt

# Generate data points for a Gaussian distribution
mean = 0  # Mean of the distribution
std_dev = 1  # Standard deviation of the distribution
num_points = 1000  # Number of data points

data = np.random.normal(mean, std_dev, num_points)

# Plot the histogram of the data
plt.hist(data, bins=30, density=True)

# Plot the probability density function (PDF) of the Gaussian distribution
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
pdf = (1/(std_dev * np.sqrt(2 * np.pi))) * np.exp(-(x - mean)**2 / (2 * std_dev**2))
plt.plot(x, pdf, color='red')

# Add labels and title
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.title('Gaussian Distribution')

# Show the plot
plt.show()

Let’s review the Box Plot of the Gaussian Distribution

In a box plot of the data,

  • Minimum is the minimum value in the dataset,
  • Maximum is the maximum value in the dataset.

The difference between the two is the range of the dataset.

  • The Median is the center point of the data, also called the second quartile (Q2): 50% of the data lies at or below it.
  • Q1 is the first quartile: 25% of the data lies between the minimum and Q1.
  • Q3 is the third quartile: 75% of the data lies between the minimum and Q3.

The difference between Q3 and Q1 is called the Inter-Quartile Range or IQR.

IQR = Q3 - Q1

Any data point below the Lower Bound or above the Upper Bound is considered an outlier.

  • Lower Bound = Q1 - 1.5 * IQR
  • Upper Bound = Q3 + 1.5 * IQR
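
The quartiles, the IQR, and the outlier bounds can be computed with numpy. This is a minimal sketch on simulated standard normal data; the seed and sample size are arbitrary assumptions, and `np.percentile` uses its default linear interpolation.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed (arbitrary) for reproducibility
data = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Quartiles: Q1 (25th percentile), median (50th), Q3 (75th)
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Bounds beyond which a point is flagged as an outlier
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

outliers = data[(data < lower_bound) | (data > upper_bound)]
outlier_fraction = outliers.size / data.size

print(f"Q1={q1:.3f}, median={median:.3f}, Q3={q3:.3f}, IQR={iqr:.3f}")
print(f"bounds: [{lower_bound:.3f}, {upper_bound:.3f}]")
print(f"flagged as outliers: {outlier_fraction:.2%}")
```

For an exact Gaussian the bounds sit at about μ ± 2.7σ, so this rule flags roughly 0.7% of the data, somewhat more than the 0.27% that lies beyond three standard deviations.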