`RandomizedSearchCV` is a class provided by scikit-learn for hyperparameter tuning and model selection through cross-validation. It's similar to `GridSearchCV`, but instead of exhaustively searching through all possible combinations of hyperparameters, it randomly samples a fixed number of hyperparameter settings from specified distributions.
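The distributions can be anything with an `rvs` sampling method, which in practice usually means `scipy.stats` objects; categorical choices are given as plain lists. As a minimal sketch (the variable names here are illustrative, not part of the scikit-learn API), this is what sampling from such distributions looks like:

from scipy.stats import randint, uniform, loguniform

# Integer-valued hyperparameter: uniform integers in [10, 100)
n_estimators_dist = randint(10, 100)
print(n_estimators_dist.rvs(size=5, random_state=0))

# Continuous hyperparameter: uniform floats in [0.1, 0.9)
subsample_dist = uniform(loc=0.1, scale=0.8)
print(subsample_dist.rvs(size=5, random_state=0))

# Log-uniform spacing is common for regularization strengths
alpha_dist = loguniform(1e-4, 1e-1)
print(alpha_dist.rvs(size=5, random_state=0))

`RandomizedSearchCV` draws its `n_iter` candidate settings from such objects in exactly this way, calling `rvs` once per sampled combination.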
Here's a basic overview of how `RandomizedSearchCV` works:
- Define a parameter grid or a distribution for each hyperparameter you want to tune.
- Specify the number of iterations (random samples) you want to perform.
- Pass the estimator (model), parameter grid/distributions, and number of iterations to `RandomizedSearchCV`.
- `RandomizedSearchCV` performs cross-validation for each randomly sampled combination of hyperparameters and selects the best combination based on the scoring metric.
- After the search is complete, you can access attributes like `best_params_`, `best_score_`, and `best_estimator_` to retrieve information about the best-performing model.
Here's a basic example of how to use `RandomizedSearchCV`:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint

# Define parameter distributions
param_dist = {
    'n_estimators': randint(10, 100),
    'max_depth': randint(1, 10),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 20),
    'max_features': ['sqrt', 'log2', None]  # 'auto' was removed in scikit-learn 1.3
}

# Create a RandomForestClassifier instance
clf = RandomForestClassifier()

# Create RandomizedSearchCV instance
random_search = RandomizedSearchCV(clf, param_distributions=param_dist, n_iter=10, cv=5)

# Fit the model (X_train and y_train are assumed to be defined)
random_search.fit(X_train, y_train)

# Get the best parameters
best_params = random_search.best_params_
In this example, `RandomizedSearchCV` is used to search for the best hyperparameters for a `RandomForestClassifier` by randomly sampling from the specified parameter distributions. The `n_iter` parameter controls the number of random combinations to try, and `cv` specifies the number of folds for cross-validation.
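The snippet above assumes `X_train` and `y_train` already exist. For a fully self-contained run, you could stand in a toy dataset; the following sketch assumes nothing beyond scikit-learn itself (the `make_classification` data and the 80/20 split are illustrative choices, not part of the original example), and also shows the `best_score_` and `best_estimator_` attributes mentioned earlier:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint

# Toy data standing in for X_train / y_train (illustrative only)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_dist = {
    'n_estimators': randint(10, 100),
    'max_depth': randint(1, 10),
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=10,
    cv=5,
    random_state=42,  # makes the sampled combinations reproducible
)
random_search.fit(X_train, y_train)

print(random_search.best_score_)        # mean cross-validated score of the best setting
best_model = random_search.best_estimator_  # refit on the full training set (refit=True by default)
print(best_model.score(X_test, y_test))     # held-out accuracy of the tuned model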
Here is another example, this time using recall as the scoring metric:
%%time
import numpy as np
from sklearn import metrics

# Choose the type of classifier.
rf2 = RandomForestClassifier(random_state=1)

# Grid of parameters to choose from
parameters = {
    "n_estimators": [150, 200, 250],
    "min_samples_leaf": np.arange(5, 10),
    "max_features": np.arange(0.2, 0.7, 0.1),
    "max_samples": np.arange(0.3, 0.7, 0.1),
    "max_depth": np.arange(3, 6),  # the original np.arange(3, 4, 5) yields only [3]
    "class_weight": ['balanced', 'balanced_subsample'],
    "min_impurity_decrease": [0.001, 0.002, 0.003],
}

# Type of scoring used to compare parameter combinations (recall, not accuracy)
recall_scorer = metrics.make_scorer(metrics.recall_score)

# Run the random search
# Using n_iter=30, so randomized search will try 30 different combinations
# of hyperparameters (by default, n_iter=10)
grid_obj = RandomizedSearchCV(rf2, parameters, n_iter=30, scoring=recall_scorer,
                              cv=5, random_state=1, n_jobs=-1, verbose=2)
grid_obj.fit(X_train, y_train)

# Print the best combination of parameters
print(grid_obj.best_params_)
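Because `refit=True` is the default, `grid_obj` ends up refit on the whole training set with the best parameters, so it can be used directly for prediction. A short follow-up sketch, assuming held-out `X_test` and `y_test` exist alongside the training data (they are not defined in the example above):

# grid_obj predicts like any fitted estimator after the search
y_pred = grid_obj.predict(X_test)

# Evaluate with the same metric that drove the search
print("test recall:", metrics.recall_score(y_test, y_pred))

# Best mean cross-validated recall found during the search
print("cv recall:", grid_obj.best_score_)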