Beyond Knowledge Innovation


Post-pruning Decision Tree with Cost Complexity Parameter ccp_alpha

March 8, 2024 · CEO
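For context, ccp_alpha is the complexity parameter of minimal cost-complexity pruning, as described in the scikit-learn user guide. Write R(T) for the total sample-weighted impurity of the leaves of a tree T and |\tilde{T}| for its number of leaves. Pruning then trades impurity against size:

$$R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|$$

For an internal node t with subtree T_t, collapsing T_t into a single leaf becomes worthwhile once α reaches the node's effective alpha. This is the value at which R_α(t) = R_α(T_t):

$$\alpha_{\mathrm{eff}}(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}$$

The node with the smallest effective alpha (the "weakest link") is pruned first. Pruning continues until the smallest remaining effective alpha exceeds ccp_alpha. cost_complexity_pruning_path returns the sequence of effective alphas visited along the way, together with the total leaf impurity after each step.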

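import numpy as np
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

# X_train, X_test, y_train, y_test are assumed to come from an earlier train/test split
# Compute the effective alphas (and total leaf impurities) of the pruning path on the training data
clf = DecisionTreeClassifier(random_state=1)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities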

# Fit one tree per effective alpha; larger alphas prune more aggressively
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clfs.append(clf)

print("Number of nodes in the last tree is: {} with ccp_alpha: {}".format(clfs[-1].tree_.node_count, ccp_alphas[-1]))

# we remove the last element in clfs and ccp_alphas, because it is the trivial tree with only one node
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]


# Accuracy of every pruned tree on the training and test sets
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]

# Pick the tree with the highest test accuracy
index_best_model = np.argmax(test_scores)
best_model = clfs[index_best_model]
print(best_model)
print('Training accuracy of best model:', best_model.score(X_train, y_train))
print('Test accuracy of best model:', best_model.score(X_test, y_test))

# Accuracy isn't the right metric for our data; we want high recall,
# so compute the recall of every pruned tree on the training and test sets
recall_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    recall_train.append(metrics.recall_score(y_train, pred_train))

recall_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    recall_test.append(metrics.recall_score(y_test, pred_test))

# Select the tree with the highest test recall
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
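The alpha sweep above can be tried end to end on a public dataset. This is a minimal sketch that assumes scikit-learn's bundled breast cancer data stands in for the post's (unshown) X_train/X_test/y_train/y_test. The 70/30 stratified split and `pos_label=0` are illustrative choices, not the author's setup. `pos_label=0` means recall is measured on the malignant class, which is target 0 in this dataset.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the post's own training and test sets are not shown
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Effective alphas of the pruning path, computed on the training data only
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X_train, y_train)
# Drop the last alpha, which prunes the tree down to the root node
ccp_alphas = path.ccp_alphas[:-1]

# One tree per alpha; a larger alpha prunes more aggressively
clfs = [
    DecisionTreeClassifier(random_state=1, ccp_alpha=alpha).fit(X_train, y_train)
    for alpha in ccp_alphas
]
node_counts = [clf.tree_.node_count for clf in clfs]

# Recall on the malignant class (target 0 in this dataset), treated as the positive class
recall_test = [
    metrics.recall_score(y_test, clf.predict(X_test), pos_label=0) for clf in clfs
]

# Keep the tree with the highest test recall, as in the post
best_index = int(np.argmax(recall_test))
best_model = clfs[best_index]
print(best_model)
print("Nodes:", node_counts[best_index], "| test recall:", round(recall_test[best_index], 3))
```

The sketch keeps the post's choice of picking alpha by test recall. Selecting it on a separate validation split instead would keep the final test score unbiased.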

Next, recreate the selected model with the hyperparameters printed for best_model (its ccp_alpha, plus random_state=1), refit it and evaluate it:

clf = DecisionTreeClassifier(random_state=1, ccp_alpha=best_model.ccp_alpha)
clf.fit(X_train, y_train)

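# make_confusion_matrix, get_accuracy_and_recall_score and get_feature_importances_and_visualize
# are custom helpers (not scikit-learn functions), assumed to be defined earlier in this series
make_confusion_matrix(clf, X_test, y_test)
get_accuracy_and_recall_score(clf, X_train, X_test, y_train, y_test)
get_feature_importances_and_visualize(clf, X_test)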
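You can avoid copying the printed parameters by hand. `sklearn.base.clone` builds an unfitted estimator with the same hyperparameters. This is a minimal sketch that assumes a stand-in best_model trained on the breast cancer dataset. Its alpha, the middle value of the pruning path, is a hypothetical choice made only for illustration. With the post's own variables you would simply call `clone(best_model)`.

```python
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data and model (assumption): the post's own best_model is not reproduced here
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)
alphas = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X_train, y_train).ccp_alphas
# Hypothetical choice: an alpha from the middle of the path plays the role of the selected one
best_model = DecisionTreeClassifier(random_state=1, ccp_alpha=alphas[len(alphas) // 2])
best_model.fit(X_train, y_train)

# clone() copies the hyperparameters (ccp_alpha, random_state, ...) but not the fitted tree
clf = clone(best_model)
print(clf)
clf.fit(X_train, y_train)
```

In the post, best_model is already fitted on X_train, so it could also be passed straight to the evaluation helpers. Cloning matters only when you want a fresh, unfitted copy, for example to refit on more data.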