t-distributed Stochastic Neighbor Embedding (t-SNE)

t-SNE is a popular dimensionality reduction technique (a form of feature extraction) used in machine learning and data visualization. It is particularly useful for visualizing high-dimensional data in two or three dimensions while preserving the local structure of the data as much as possible.

The main idea behind t-SNE is to map high-dimensional data points to a lower-dimensional space in such a way that similar points in the high-dimensional space are represented as nearby points in the low-dimensional space, while dissimilar points are represented as distant points. This is achieved by modeling the similarity between data points in both the high-dimensional and low-dimensional spaces using probability distributions and minimizing the mismatch between them.
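To make the idea of "minimizing the mismatch" concrete, here is a minimal NumPy sketch of the two similarity distributions and the Kullback-Leibler (KL) divergence between them. It is a simplification, not the full algorithm: it uses one shared Gaussian bandwidth `sigma` for every point, whereas t-SNE tunes a bandwidth per point from the perplexity setting, and it only scores two fixed layouts instead of optimizing one. The function names and the toy data are made up for illustration.

```python
import numpy as np

def pairwise_sq_dists(Z):
    """Squared Euclidean distance between every pair of rows in Z."""
    sq_norms = np.sum(Z ** 2, axis=1)
    d = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off

def high_dim_similarities(X, sigma=1.0):
    """Gaussian similarities P in the original space.

    Assumption: a single shared sigma (real t-SNE fits one per point).
    """
    affinities = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(affinities, 0.0)  # a point is not its own neighbor
    return affinities / affinities.sum()

def low_dim_similarities(Y):
    """Student-t (one degree of freedom) similarities Q in the embedding."""
    affinities = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(affinities, 0.0)
    return affinities / affinities.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q): the mismatch that t-SNE minimizes over the embedding."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

rng = np.random.default_rng(0)

# Toy data: two well-separated groups of 5 points in 10 dimensions.
X = np.vstack([rng.normal(0.0, 0.5, (5, 10)), rng.normal(4.0, 0.5, (5, 10))])
P = high_dim_similarities(X, sigma=1.0)

# One 2-D layout keeps the two groups apart; the other ignores the grouping.
Y_good = np.vstack([rng.normal(-5.0, 0.5, (5, 2)), rng.normal(5.0, 0.5, (5, 2))])
Y_bad = rng.normal(0.0, 3.0, (10, 2))

kl_good = kl_divergence(P, low_dim_similarities(Y_good))
kl_bad = kl_divergence(P, low_dim_similarities(Y_bad))
print(f"KL for group-preserving layout: {kl_good:.3f}")
print(f"KL for random layout:           {kl_bad:.3f}")
```

The layout that keeps neighbors together should score the lower KL divergence. t-SNE itself starts from an initial layout and moves the low-dimensional points by gradient descent to reduce this quantity.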

t-SNE is commonly used in exploratory data analysis and visualization, often as a way to look for cluster structure, especially when the data contain complex, nonlinear relationships. However, it is computationally expensive on large datasets and does not reliably preserve global structure: the distances between well-separated groups and the apparent sizes of groups in the plot should not be over-interpreted. t-SNE is also sensitive to its hyperparameters (notably perplexity, along with the learning rate and the number of iterations), and different parameter settings or random seeds can lead to different visualizations.
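One way to see this sensitivity is to embed the same data with two perplexity values and compare the results. The sketch below uses scikit-learn's `TSNE` on synthetic blobs from `make_blobs`. The data is a stand-in chosen for illustration, not the dataset used in this post, and the two perplexity values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic stand-in data (an assumption for illustration):
# 90 points in 10 dimensions drawn from 3 Gaussian blobs.
X_demo, labels = make_blobs(n_samples=90, n_features=10, centers=3, random_state=0)

embeddings = {}
for perplexity in (5, 30):
    # perplexity must be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=1)
    embeddings[perplexity] = tsne.fit_transform(X_demo)
    # kl_divergence_ is the final value of the objective after optimization.
    print(perplexity, embeddings[perplexity].shape, f"KL={tsne.kl_divergence_:.3f}")
```

Each array in `embeddings` can be plotted with `sns.scatterplot` and colored by `labels`. The coordinates differ between the two runs even though the input is identical, so it is worth trying several settings before reading much into a single plot.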

Example

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython magic that renders plots inline; remove it when running as a plain script.
%matplotlib inline
import seaborn as sns

sns.set_theme()

pd.set_option('display.max_rows', 200)
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:,.2f}'.format)

from scipy.stats import zscore
from sklearn.manifold import TSNE

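# X is assumed to be a DataFrame holding only the numeric columns of `data`
# (both are defined elsewhere and are not shown here).
# Standardize each column so that no single feature dominates the distances.
X_scaled = X.apply(zscore)

# Reduce to 2 dimensions; random_state fixes the seed so the layout is reproducible.
# Note: perplexity (default 30) must be smaller than the number of rows.
tsne = TSNE(n_components=2, random_state=1)
X_reduced = tsne.fit_transform(X_scaled)

# Keep X's index so the components line up with the rows of `data`.
df = pd.DataFrame(X_reduced, columns=['component1', 'component2'], index=X.index)

# Plain scatter plot of the embedding.
sns.scatterplot(x=df['component1'], y=df['component2'])
plt.show()

# Same embedding, colored by the 'cyl' column of the original data.
sns.scatterplot(x=df['component1'], y=df['component2'], hue=data['cyl'])
plt.show()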
