In scikit-learn, the feature_importances_ attribute is available on tree-based models such as Decision Trees, Random Forests, and Gradient Boosted Trees. It provides a way to assess how much each feature (or variable) contributes to the trained model's predictions.
When you train a tree-based model, the algorithm splits each node based on the values of specific features. The feature_importances_ attribute reports the relative importance of each feature across these splits, normalized so that the importances sum to 1.
Here’s a simple example using a Decision Tree Classifier:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Create a decision tree classifier
clf = DecisionTreeClassifier()

# Assuming X_train and y_train are your training data
clf.fit(X_train, y_train)

# Access feature importances
feature_importances = clf.feature_importances_

# Print the importances sorted in descending order
print(pd.DataFrame(feature_importances, columns=["Imp"], index=X_train.columns).sort_values(by="Imp", ascending=False))

# or print them one per line
for feature, importance in zip(X_train.columns, feature_importances):
    print(f"{feature}: {importance}")
Imp
amount 0.204163
checking_balance 0.136840
age 0.110746
months_loan_duration 0.100323
employment_duration 0.073225
credit_history 0.065357
savings_balance 0.057059
years_at_residence 0.052719
percent_of_income 0.034128
purpose_business 0.023784
dependents 0.023062
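The example above assumes X_train and y_train already exist. As a minimal, self-contained sketch of the same workflow, here is the attribute on scikit-learn's built-in iris dataset (random_state=0 is an assumption added here for reproducibility, not part of the original example):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset as a DataFrame so columns have names
X, y = load_iris(return_X_y=True, as_frame=True)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Importances are non-negative and sum to 1 across all features
imp = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(imp)
```

Sorting the resulting Series makes the ranking immediately readable, just like the credit-data output above.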
A horizontal bar plot visualizes the relative importance of each feature:
import numpy as np
import matplotlib.pyplot as plt

# Plotting feature importances
feature_names = list(X_train.columns)
importances = clf.feature_importances_
indices = np.argsort(importances)  # ascending, so the largest bar ends up on top

plt.figure(figsize=(12, 12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
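One caveat worth knowing: these impurity-based importances are computed on the training data and can be biased toward features with many distinct values. As a cross-check, scikit-learn also provides sklearn.inspection.permutation_importance, which measures how much the score drops when a feature's values are shuffled. A short sketch on the same iris toy data (the variable names and random_state are illustrative assumptions):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature n_repeats times on held-out data and record
# the mean drop in accuracy
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
perm_imp = pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)
print(perm_imp)
```

Because it is evaluated on held-out data, this ranking can differ from feature_importances_, which is often a useful sanity check.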