Parameter cv in GridSearchCV

In scikit-learn’s GridSearchCV (Grid Search Cross-Validation), the parameter cv stands for “cross-validation.” It determines the cross-validation splitting strategy used when evaluating the performance of a machine learning model. When cv is set to an integer (e.g., cv=5), it specifies the number of folds in a (Stratified) K-Fold cross-validation. For example, cv=5 means the data is split into 5 folds; the model is trained on 4 of them and evaluated on the remaining one, rotating until each fold has served as the validation set once.
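A minimal sketch of the cv parameter in action (the toy dataset, the SVC estimator, and the parameter values are just illustrative choices, not part of the original post):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative parameter grid for the SVC estimator
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# cv=5 -> (Stratified) 5-fold cross-validation, since SVC is a classifier
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)
print(grid.best_score_)  # mean cross-validated score of the best parameter combination
```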

NumPy function argsort

np.argsort is a NumPy function that returns the indices that would sort an array along a specified axis. It performs an indirect sort on the input array and returns an array of indices representing the sorted order of the elements. The returned indices can be used to construct a sorted version of the input array.
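A short sketch showing the indirect sort and how the returned indices rebuild a sorted array (the sample values are arbitrary):

```python
import numpy as np

a = np.array([30, 10, 20])

order = np.argsort(a)   # indices that would sort the array
print(order)            # [1 2 0]
print(a[order])         # [10 20 30] -- sorted version built from the indices

# Descending order by reversing the index array
print(a[order[::-1]])   # [30 20 10]
```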

Pre-pruning Decision Tree – GridSearch for Hyperparameter tuning

Grid search is a tuning technique that attempts to find the optimal values of hyperparameters. It is an exhaustive search performed over specified parameter values of a model: the hyperparameters of the estimator are optimized by cross-validated grid search over a parameter grid.
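A sketch of cross-validated grid search over pre-pruning parameters of a decision tree; the dataset and the candidate values in the grid are assumptions made for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Pre-pruning hyperparameters to search over (illustrative values)
param_grid = {
    "max_depth": [3, 5, 7, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Exhaustively evaluates every combination with 5-fold cross-validation
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)
print(grid.best_score_)
```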

Pre-pruning Decision Tree – depth restricted

In general, the deeper you allow your tree to grow, the more complex your model becomes: more splits capture more information about the training data, and this is one of the root causes of overfitting. We can limit the growth of the tree with the max_depth parameter:
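A minimal sketch comparing an unrestricted tree with a depth-restricted one (the dataset and the max_depth value of 3 are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Unrestricted tree: typically fits the training data almost perfectly (overfitting)
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Depth-restricted tree: max_depth=3 limits the number of splits
pruned_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

print("unrestricted train/test:", full_tree.score(X_train, y_train), full_tree.score(X_test, y_test))
print("max_depth=3  train/test:", pruned_tree.score(X_train, y_train), pruned_tree.score(X_test, y_test))
```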

Feature Importance in Decision Tree

In scikit-learn, the feature_importances_ attribute is associated with tree-based models, such as Decision Trees, Random Forests, and Gradient Boosted Trees. This attribute provides a way to assess the importance of each feature (or variable) in making predictions with the trained model. When you train a tree-based model, the algorithm makes decisions at each node based on the feature that best splits the data; features that contribute more to reducing impurity across the tree receive higher importance scores.
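A short sketch of reading feature_importances_ from a fitted tree (dataset and depth are illustrative; pandas is used only to attach feature names to the scores):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(data.data, data.target)

# One importance value per feature; the values sum to 1.0
importances = pd.Series(tree.feature_importances_, index=data.feature_names)
print(importances.sort_values(ascending=False).head())
```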

Visualizing the Decision Tree

To visualize a decision tree in scikit-learn, you can use the plot_tree function from the sklearn.tree module, which generates a graphical representation of the tree. To show the decision tree as text instead, you can use the export_text function from the same module; it returns a text report of the decision rules. Here’s a simple example:
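A minimal sketch of both functions on an assumed iris classifier (the dataset, tree depth, and figure size are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(data.data, data.target)

# Graphical view of the fitted tree
plt.figure(figsize=(12, 6))
plot_tree(tree, feature_names=data.feature_names,
          class_names=list(data.target_names), filled=True)
plt.show()

# Plain-text view of the same decision rules
print(export_text(tree, feature_names=list(data.feature_names)))
```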

Get a random sample from your dataset

To grab a random sample from a dataset in Python, you can use the pandas library. Assuming your dataset is stored in a pandas DataFrame, you can use the sample method to randomly select rows. Here’s an example, where n=5 specifies the number of rows to sample and random_state is set to ensure reproducibility:
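A small sketch using a hypothetical DataFrame as a stand-in for your dataset:

```python
import pandas as pd

# Hypothetical DataFrame standing in for your dataset
df = pd.DataFrame({"feature": range(100), "label": [i % 2 for i in range(100)]})

# Randomly select 5 rows; random_state makes the sample reproducible
sample = df.sample(n=5, random_state=42)
print(sample)
```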