t-distributed Stochastic Neighbor Embedding (t-SNE)

t-SNE, which stands for t-distributed Stochastic Neighbor Embedding, is a popular dimensionality reduction technique (of type Feature Extraction) used in machine learning and data visualization. It is particularly useful for visualizing high-dimensional data in a lower-dimensional space, typically two or three dimensions, while preserving the local structure of the data as much as possible. The…

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a widely used linear dimensionality reduction technique (of type Feature Extraction) that reduces the dimensionality of datasets containing many correlated variables while preserving most of the variability in the data. Here’s how PCA works: The “new” variables produced by PCA (the principal components) are all uncorrelated with one another. PCA has…

Unsupervised Learning Dimensionality Reduction – Feature Elimination vs Extraction

Feature Elimination and Feature Extraction are two common techniques used in dimensionality reduction, a process aimed at reducing the number of features (or dimensions) in a dataset while preserving the most important information. Both techniques are used to address the curse of dimensionality, improve computational efficiency, and potentially enhance model performance. However, they differ in…

Cophenetic coefficient

The cophenetic coefficient is a measure used to evaluate the quality of a hierarchical clustering solution. It quantifies how faithfully the hierarchical structure (dendrogram) preserves the original pairwise distances or dissimilarities between data points. Here’s how it works: A high cophenetic coefficient suggests that the hierarchical clustering solution accurately captures the underlying structure of the…

Complete linkage hierarchical clustering

Complete linkage hierarchical clustering is another method used in cluster analysis, like single linkage clustering, but with a different approach to determining the distance between clusters. In complete linkage clustering, the distance between two clusters is defined as the maximum distance between any two points in the two clusters. So, the distance between two clusters…
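The cluster-distance rule described above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and clusters given as lists of coordinate tuples; the function name complete_linkage_distance and the sample points are made up for the example.

```python
from itertools import product
from math import dist


def complete_linkage_distance(cluster_a, cluster_b):
    """Return the largest Euclidean distance over all cross-cluster point pairs."""
    return max(dist(p, q) for p, q in product(cluster_a, cluster_b))


# Hypothetical sample clusters on a line.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

# The farthest pair is (0, 0) and (6, 0).
print(complete_linkage_distance(cluster_a, cluster_b))  # 6.0
```

Replacing max with min in the function gives the single linkage rule instead.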

Single linkage hierarchical clustering

Single linkage hierarchical clustering is a method used in cluster analysis to group similar data points into clusters based on their proximity or similarity. It is a bottom-up approach, starting with each data point as its own cluster and then iteratively merging the closest pairs of clusters until only one cluster remains. In single linkage…
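The bottom-up procedure can be sketched as follows. This is a naive illustration rather than an efficient implementation: it assumes Euclidean distance, measures the gap between two clusters as the minimum distance over cross-cluster point pairs (the usual single linkage rule), and uses made-up names and data.

```python
from itertools import combinations, product
from math import dist


def single_linkage_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance over all cross-cluster point pairs."""
    return min(dist(p, q) for p, q in product(cluster_a, cluster_b))


def single_linkage_merges(points):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns a list of (cluster_a, cluster_b, distance) in merge order.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of cluster indices with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        gap = single_linkage_distance(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], gap))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical one-dimensional sample data.
points = [(0.0,), (1.0,), (5.0,), (7.0,)]
for a, b, gap in single_linkage_merges(points):
    print(a, "+", b, "at distance", gap)
```

The recorded merge distances are exactly the heights at which a dendrogram would join the corresponding branches.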

Finding the optimal number of clusters (k) using Elbow Method

The elbow method is a technique used to find the optimal number of clusters (k) in a dataset for a clustering algorithm, such as k-means. The idea is to run the clustering algorithm for different values of k and plot the sum of squared distances (inertia) between data points and their assigned cluster centroids. The…
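As a rough sketch of the idea, the code below computes the inertia for several values of k so that the values could then be plotted. It is a toy k-means written for illustration only: the evenly spaced initial centroids, the fixed iteration count, the function names, and the sample points are all assumptions, not part of any particular library.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_inertia(points, k, iterations=20):
    """Run a basic Lloyd-style k-means and return the final inertia."""
    n = len(points)
    # Assumed initialisation: k evenly spaced points from the input list.
    centroids = [points[i * n // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    # Inertia: sum of squared distances to the nearest centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical data with three tight groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]
inertias = {k: kmeans_inertia(points, k) for k in range(1, 5)}
for k, inertia in inertias.items():
    print(k, inertia)
```

On this sample data the inertia drops sharply up to k = 3 and changes little afterwards, which is the kind of bend the method looks for.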

What is Silhouette Coefficient

The silhouette coefficient is a measure of how well-separated clusters are in a clustering analysis. It provides a way to assess the quality of clustering by evaluating both the cohesion within clusters and the separation between clusters. The silhouette coefficient ranges from -1 to 1, with higher values indicating better-defined clusters. Here’s how the silhouette…
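A minimal sketch of the per-point calculation, using the standard definition s = (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster (cohesion) and b is the smallest mean distance to the points of any other cluster (separation). Euclidean distance, the function name, the sample data, and the choice to score singleton or degenerate cases as 0 are assumptions made for the example.

```python
from math import dist


def silhouette_scores(points, labels):
    """Return one silhouette value per point."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by cluster label.
        by_label = {}
        for j, q in enumerate(points):
            if j != i:
                by_label.setdefault(labels[j], []).append(dist(p, q))
        own = by_label.pop(labels[i], [])
        if not own or not by_label:
            scores.append(0.0)  # assumed convention: singleton cluster or single cluster
            continue
        a = sum(own) / len(own)                              # cohesion
        b = min(sum(d) / len(d) for d in by_label.values())  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


# Hypothetical one-dimensional data with two well-separated clusters.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
print(sum(scores) / len(scores))  # mean silhouette, close to 1 for this data
```

Averaging the per-point values gives a single score for the whole clustering.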

What is Mahalanobis Distance

The Mahalanobis distance is a measure of the distance between a point and a distribution, taking into account the correlation between variables. It is often used in statistics and machine learning to identify outliers and to assess the dissimilarity between a data point and a distribution. The Mahalanobis distance is defined for a point (x)…
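For illustration, here is a small two-dimensional sketch using the standard formula, the square root of (x - mean)^T cov^-1 (x - mean). The function name, the restriction to a 2x2 covariance matrix (so the inverse can be written out by hand), and the sample numbers are assumptions made for the example.

```python
from math import sqrt


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from a distribution with the given mean and covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    quad = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return sqrt(quad)


# Hypothetical distribution: variance 4 along the first axis, 1 along the second.
mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))

print(mahalanobis_2d((2.0, 0.0), mean, cov))  # 1.0
print(mahalanobis_2d((0.0, 2.0), mean, cov))  # 2.0
```

Both sample points lie at Euclidean distance 2 from the mean, but the one along the high-variance axis is "closer" in the Mahalanobis sense.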

What is Jaccard Distance

Jaccard distance is a measure of dissimilarity between two sets. It is calculated as the complement of the Jaccard similarity coefficient and is particularly useful when dealing with binary data or sets. The Jaccard similarity coefficient measures the proportion of shared elements between two sets, and the Jaccard distance is essentially the complement of this…
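A short sketch of the calculation, using the standard definition: similarity is the size of the intersection divided by the size of the union, and distance is one minus that. The function name, the sample sets, and the choice to return 0 for two empty sets are assumptions made for the example.

```python
def jaccard_distance(set_a, set_b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets."""
    union = set_a | set_b
    if not union:
        return 0.0  # assumed convention: two empty sets are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical sample sets: 2 shared elements out of 4 distinct elements.
a = {1, 2, 3}
b = {2, 3, 4}
print(jaccard_distance(a, b))  # 0.5
```

Identical sets give a distance of 0 and disjoint sets give a distance of 1.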