In general, the deeper you allow a tree to grow, the more complex the model becomes: every extra level adds splits that capture more detail about the training data, including its noise. This is one of the root causes of overfitting.
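To make the depth and overfitting link concrete, here is a minimal sketch that fits trees of increasing depth and compares training and test accuracy. It does not use this notebook's data: the synthetic dataset from make_classification, the noise level (flip_y), and all the *_demo names are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption, not the notebook's dataset); flip_y adds
# label noise so that a very deep tree has something to memorize.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=1
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

depth_results = {}
for depth in (1, 3, 5, 10, None):  # None lets the tree grow until leaves are pure
    tree = DecisionTreeClassifier(criterion="gini", max_depth=depth, random_state=1)
    tree.fit(Xd_train, yd_train)
    depth_results[depth] = {
        "train_acc": tree.score(Xd_train, yd_train),
        "test_acc": tree.score(Xd_test, yd_test),
        "actual_depth": tree.get_depth(),
    }
    print(depth, depth_results[depth])
```

On noisy data like this, the unrestricted tree typically fits the training set perfectly while its test accuracy lags behind; the exact numbers depend on the data and the random seed.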
We can limit the depth of the tree with the max_depth parameter:
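# Mount Google Drive (Colab only) so the shared helper notebook is reachable
from google.colab import drive
drive.mount('/content/drive')
functions_path = '/content/drive/MyDrive/Lib/'
# IPython magic: run the helper notebook to load its functions into this session
%run {functions_path}myfunctions.ipynb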
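from sklearn.tree import DecisionTreeClassifier

# limit the tree to 3 levels of splits; random_state makes the fit reproducible
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=1)
clf.fit(X_train, y_train)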
# review performance of the tree
make_confusion_matrix(clf, X_test, y_test)
get_accuracy_and_recall_score(clf, X_train, X_test, y_train, y_test)
get_feature_importances_and_visualize(clf, X_train)
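The three helper functions above come from myfunctions.ipynb, whose contents are not shown here. For readers without that notebook, the sketch below computes roughly comparable outputs with plain scikit-learn calls. It is a stand-in under assumptions, not the helpers' actual implementation: the synthetic data and the *_demo names are hypothetical, and the plotting the helpers may do is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data; replace with the notebook's X_train, y_train, etc.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=1
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

demo_clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=1)
demo_clf.fit(Xd_train, yd_train)

# Confusion matrix on the test set (rows: true class, columns: predicted class)
cm_demo = confusion_matrix(yd_test, demo_clf.predict(Xd_test))
print(cm_demo)

# Accuracy and recall on both splits, to spot a train/test gap
scores_demo = {}
for split, (X_part, y_part) in {
    "train": (Xd_train, yd_train),
    "test": (Xd_test, yd_test),
}.items():
    pred = demo_clf.predict(X_part)
    scores_demo[split] = (accuracy_score(y_part, pred), recall_score(y_part, pred))
    print(split, scores_demo[split])

# Impurity-based feature importances, listed from most to least important
importances_demo = demo_clf.feature_importances_
for idx in np.argsort(importances_demo)[::-1]:
    print(f"feature {idx}: {importances_demo[idx]:.3f}")
```

With max_depth=3 the tree can use at most seven split nodes, so most features usually end up with an importance of zero.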