EDA – Beyond Knowledge Innovation

April 7, 2024 CEO 215 views

The stratify parameter of train_test_split in scikit-learn

In scikit-learn's train_test_split function, the stratify parameter ensures that the split preserves the proportion of classes in the target variable. When you set stratify=y, where y is your target variable, the data is split in a way that maintains the distribution of classes in both…
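
As a rough illustration of the excerpt above, the sketch below splits a small imbalanced dataset with stratify=y and checks the class ratio on each side. The arrays X and y are made-up toy data, and test_size=0.2 and random_state=42 are arbitrary choices for this example, not values taken from the post.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 90 samples of class 0 and 10 of class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

# stratify=y asks the splitter to keep the 90/10 class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fraction of class 1 in each part; both should sit at about 0.10.
train_ratio = y_train.mean()
test_ratio = y_test.mean()
print(f"class 1 share: train={train_ratio:.2f}, test={test_ratio:.2f}")
```

Without stratify, a small test set drawn from imbalanced data can end up with very few minority-class rows, or none at all, which makes evaluation metrics unreliable.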

March 12, 2024 CEO 196 views

CDF plot of Numerical columns

The code in this post generates a grid of subplots (a dynamic number of rows and 2 columns) and plots the cumulative distribution function (CDF) of each numerical variable in a DataFrame (df).
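
The post's own code is not reproduced in this listing, so the following is only a minimal sketch of such a grid, assuming plain matplotlib and an empirical CDF computed with NumPy. The DataFrame contents and the column names (age, income, score, city) are invented for illustration.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: three numerical columns and one categorical column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 10, 200),
    "income": rng.lognormal(10, 0.5, 200),
    "score": rng.uniform(0, 100, 200),
    "city": rng.choice(["A", "B"], 200),
})

# Keep only numerical columns; the row count adapts to how many there are.
num_cols = df.select_dtypes(include="number").columns
n_rows = math.ceil(len(num_cols) / 2)
fig, axes = plt.subplots(n_rows, 2, figsize=(10, 4 * n_rows), squeeze=False)
axes = axes.ravel()

for ax, col in zip(axes, num_cols):
    # Empirical CDF: sorted values against their cumulative share of the data.
    values = np.sort(df[col].dropna().to_numpy())
    cdf = np.arange(1, len(values) + 1) / len(values)
    ax.plot(values, cdf)
    ax.set_title(f"CDF of {col}")
    ax.set_xlabel(col)
    ax.set_ylabel("Cumulative probability")

# Hide any unused subplot when the number of columns is odd.
for ax in axes[len(num_cols):]:
    ax.set_visible(False)

fig.tight_layout()
```

In an interactive session, replace the Agg backend line with a call to plt.show() at the end to display the figure.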

March 7, 2024 CEO 175 views

Get a random sample from your dataset

To grab a random sample from a dataset in Python, you can use the pandas library. Assuming your dataset is stored in a pandas DataFrame, you can use the sample method to randomly select rows. In the post's example, n=5 specifies the number of rows to sample, and random_state is set to ensure reproducibility.
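
A minimal sketch of that call, assuming a toy DataFrame: the columns id and value are hypothetical, and random_state=42 is an arbitrary seed chosen for this example.

```python
import pandas as pd

# Hypothetical dataset with 100 rows.
df = pd.DataFrame({"id": range(100), "value": [i * 2 for i in range(100)]})

# Draw 5 rows at random, without replacement (the default).
sample = df.sample(n=5, random_state=42)

# The same seed gives the same rows, which is what makes the result reproducible.
sample_again = df.sample(n=5, random_state=42)
print(sample)
```

To sample a share of the rows instead of a fixed count, sample also accepts frac, for example df.sample(frac=0.1, random_state=42).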

January 30, 2024 CEO 208 views

Univariate Analysis in EDA

Univariate exploration refers to the analysis of a single variable in isolation. In data analysis, univariate exploration involves examining the distribution, central tendency, and variability of a single variable without considering its relationship with other variables. Common techniques used in univariate exploration include: Univariate exploration is often the first step in data analysis, providing insights…
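
The list of techniques is cut off in this excerpt, so the sketch below only illustrates the three aspects it names (distribution, central tendency, variability) with pandas. The Series values and the name x are made up for the example.

```python
import pandas as pd

# Hypothetical single variable to explore.
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9], name="x")

# Central tendency.
mean = s.mean()      # 5.0
median = s.median()  # 4.5
mode = s.mode()      # most frequent value(s); here a single value, 4

# Variability. ddof=0 gives the population standard deviation;
# pandas defaults to the sample version (ddof=1).
std = s.std(ddof=0)  # 2.0
value_range = s.max() - s.min()  # 7

# Distribution: frequency of each value, plus a compact summary.
counts = s.value_counts().sort_index()
summary = s.describe()
print(summary)
```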
