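I'm trying to plot feature importances for a random forest model and map each importance back to its original variable name. I managed to create a plot that shows the importances and uses the original variable names as labels, but right now the names appear in the order they have in the dataset, not in order of importance. How can I sort by feature importance? Thanks!
My code is: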
importances = brf.feature_importances_
std = np.std([tree.feature_importances_ for tree in brf.estimators_],
             axis=0)
indices = np.argsort(importances)[::-1]
# Print the feature ranking
print("Feature ranking:")
for f in range(x_dummies.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))
# Plot the feature importances of the forest
plt.figure(figsize=(8,8))
plt.title("Feature importances")
plt.bar(range(x_train.shape[1]), importances[indices],
        color="r", yerr=std[indices], align="center")
feature_names = x_dummies.columns
plt.xticks(range(x_dummies.shape[1]), feature_names)
plt.xticks(rotation=90)
plt.xlim([-1, x_dummies.shape[1]])
plt.show()
Answer 0: (score: 15)
A general solution is to put the features and their importances into a DataFrame and sort them before plotting:
import pandas as pd
%matplotlib inline
# Fit the model first; "data" is the X DataFrame and "model" is the fitted scikit-learn estimator
feats = {}  # maps feature_name -> feature_importance
for feature, importance in zip(data.columns, model.feature_importances_):
    feats[feature] = importance  # add the name/value pair
importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
importances.sort_values(by='Gini-importance').plot(kind='bar', rot=45)
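The same ordering can also be applied directly to the question's code. The bars there are already sorted through indices, but the tick labels are passed in dataset order, so indexing the names with indices as well should line them up. Here is a minimal sketch of that idea; the feature names and numbers are made up and only stand in for x_dummies.columns, brf.feature_importances_ and the per-tree standard deviation:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-ins for x_dummies.columns, brf.feature_importances_ and std
feature_names = np.array(["age", "income", "height", "weight"])
importances = np.array([0.10, 0.55, 0.05, 0.30])
std = np.array([0.02, 0.05, 0.01, 0.04])

# Indices that order the importances from largest to smallest
indices = np.argsort(importances)[::-1]

# Reorder the labels with the same indices used for the bars
sorted_names = feature_names[indices]

positions = list(range(len(importances)))
fig, ax = plt.subplots(figsize=(8, 8))
ax.set_title("Feature importances")
ax.bar(positions, importances[indices], color="r",
       yerr=std[indices], align="center")
ax.set_xticks(positions)
ax.set_xticklabels(sorted_names, rotation=90)
ax.set_xlim([-1, len(importances)])
fig.tight_layout()
# Call plt.show() here in an interactive session to display the figure
```

With a real DataFrame, x_dummies.columns[indices] gives the reordered labels in the same way, since a pandas Index accepts an integer array as an indexer.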
Answer 1: (score: 5)
I use a solution similar to Sam's:
import pandas as pd
important_features = pd.Series(data=brf.feature_importances_,index=x_dummies.columns)
important_features.sort_values(ascending=False,inplace=True)
I usually just print the list with print(important_features), but for plotting you can always use Series.plot.
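For completeness, here is a rough sketch of what the Series.plot step might look like. The values are made up and stand in for brf.feature_importances_ and x_dummies.columns:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up importances standing in for brf.feature_importances_,
# indexed by made-up names standing in for x_dummies.columns
important_features = pd.Series(
    data=[0.10, 0.55, 0.05, 0.30],
    index=["age", "income", "height", "weight"],
)

# Largest importance first
important_features = important_features.sort_values(ascending=False)

# The bar plot follows the order of the sorted Series
ax = important_features.plot(kind="bar", rot=90, title="Feature importances")
ax.set_ylabel("Importance")
plt.tight_layout()
# Call plt.show() here in an interactive session to display the figure
```

Because the Series carries the names in its index, sorting the values keeps each label attached to its bar.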
Answer 2: (score: 2)
Another simple way to get a sorted list:
importances = list(zip(xgb_classifier.feature_importances_, df.columns))
importances.sort(reverse=True)
If needed, the following line adds a visualization:
pd.DataFrame(importances, index=[x for (_,x) in importances]).plot(kind = 'bar')
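A slightly more explicit variant of the same idea names the columns, so the legend shows a readable label instead of a column number. This is only a sketch with made-up values in place of xgb_classifier.feature_importances_ and df.columns:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-ins for xgb_classifier.feature_importances_ and df.columns
feature_importances = [0.10, 0.55, 0.05, 0.30]
columns = ["age", "income", "height", "weight"]

# Tuples compare by their first element, so this sorts by importance
importances = list(zip(feature_importances, columns))
importances.sort(reverse=True)

# Name the columns and use the feature names as the index
frame = pd.DataFrame(importances, columns=["importance", "feature"])
frame = frame.set_index("feature")
ax = frame.plot(kind="bar")
plt.tight_layout()
```

Note that when two importances are exactly equal, the tuple comparison falls back to the feature name to break the tie.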
Answer 3: (score: 1)
It's simple; this is how I plot it:
feat_importances = pd.Series(extraTree.feature_importances_, index=X.columns)
feat_importances.nlargest(15).plot(kind='barh')
plt.title("Top 15 important features")
plt.show()
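One thing to be aware of with kind='barh': the first row is drawn at the bottom, so the most important feature ends up at the bottom of the chart. If you would rather have it on top, one option is to sort the selected values in ascending order before plotting. A small sketch with made-up numbers standing in for extraTree.feature_importances_ and X.columns:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-ins for extraTree.feature_importances_ and X.columns
feat_importances = pd.Series(
    [0.10, 0.55, 0.05, 0.30],
    index=["age", "income", "height", "weight"],
)

# Keep the 3 largest, then sort ascending so the largest is drawn last (on top)
top = feat_importances.nlargest(3)
ax = top.sort_values().plot(kind="barh")
ax.set_title("Top 3 important features")
plt.tight_layout()
# Call plt.show() here in an interactive session to display the figure
```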