xgboost in Python does not return feature importances, even though the documentation mentions them

Time: 2019-02-15 09:43:35

Tags: python xgboost

According to the xgboost documentation (https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.training), xgboost returns feature importances:

  

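feature_importances_

Feature importances property

Note

Feature importance is defined only for tree boosters. It is only defined when the decision tree model is chosen as the base learner (booster=gbtree). It is not defined for other base learner types, such as linear learners (booster=gblinear).

Returns: feature_importances_

Return type: array of shape [n_features]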

However, this does not seem to happen, as the following toy example shows:

import seaborn as sns
import xgboost as xgb

mpg = sns.load_dataset('mpg')

toy = mpg[['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration']]

toy = toy.sample(frac=1)

N = toy.shape[0]

N1 = int(N/2)

toy_train = toy.iloc[:N1, :]
toy_test = toy.iloc[N1:, :]

toy_train_x = toy_train.iloc[:, 1:]

# Column 0 is the target (mpg); column 1 is cylinders, which is already a feature
toy_train_y = toy_train.iloc[:, 0]

toy_test_x = toy_test.iloc[:, 1:]

toy_test_y = toy_test.iloc[:, 0]

max_depth = 6
eta = 0.3
subsample = 0.8
colsample_bytree = 0.7
alpha = 0.1

params = {"booster" : 'gbtree' , 'objective' : 'reg:linear' , 'max_depth' : max_depth, 'eta' : eta,\
             'subsample' : subsample, 'colsample_bytree' : colsample_bytree, 'alpha' : alpha}

dtrain_toy = xgb.DMatrix(data = toy_train_x , label = toy_train_y)
dtest_toy = xgb.DMatrix(data = toy_test_x, label = toy_test_y)
watchlist = [(dtest_toy, 'eval'), (dtrain_toy, 'train')]

xg_reg_toy = xgb.train(params = params, dtrain = dtrain_toy, num_boost_round = 1000, evals = watchlist, \
                early_stopping_rounds = 20)

xg_reg_toy.feature_importances_
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-378-248f7887e307> in <module>()
----> 1 xg_reg_toy.feature_importances_

AttributeError: 'Booster' object has no attribute 'feature_importances_'

2 answers:

Answer 0 (score: 0)

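You are using the Learning API (xgb.train, which returns a Booster), but the documentation you are citing describes the Scikit-Learn API. Only the Scikit-Learn API estimators have the feature_importances_ attribute.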

Answer 1 (score: 0)

For those who, like me, do not use the Scikit-Learn API for obvious reasons: following the answer linked here, I got the feature importances with:

clf.get_score()

In addition, I was looking for a more visual representation and found one here:

from xgboost import plot_importance
plot_importance(clf, max_num_features=10)

This produces a bar chart of the features ordered by importance, limited to the number given by the optional max_num_features argument.
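If what you want is an array of shape [n_features] like the one the documentation describes, one option is to build it from the get_score() dict yourself. The dict only contains features that were actually used in a split, so missing features need to be filled in with zero. The helper below is a hypothetical sketch, not part of xgboost, and the example scores are made up.

```python
from typing import Dict, List, Sequence


def scores_to_array(scores: Dict[str, float], feature_names: Sequence[str]) -> List[float]:
    """Align a get_score()-style dict with feature_names and normalize to sum to 1."""
    # Features that never appear in a split are absent from the dict; treat them as 0
    raw = [float(scores.get(name, 0.0)) for name in feature_names]
    total = sum(raw)
    if total == 0.0:
        return raw  # nothing to normalize
    return [value / total for value in raw]


feature_names = ["cylinders", "displacement", "horsepower", "weight", "acceleration"]
example_scores = {"weight": 30.0, "displacement": 10.0}  # made-up values for illustration
aligned = scores_to_array(example_scores, feature_names)
print(aligned)
```

In practice you would pass clf.get_score() as the first argument and the training column names, in order, as the second.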