Question

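I have trained an XGBoost model and used plot_importance() to plot which features are the most important in the trained model. However, the numbers in the plot have several decimal places, which clutters the plot and makes the labels run outside the figure.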

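I have searched for plot formatting options, but I only found how to format the axes (I tried formatting the X axis, hoping it would also format the corresponding value labels).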

I am working in a Jupyter Notebook (in case that makes a difference). The code is as follows:

import xgboost as xgb

# Despite the variable name, this is a binary classifier
xg_reg = xgb.XGBClassifier(
    objective='binary:logistic',
    colsample_bytree=0.4,
    learning_rate=0.01,
    max_depth=15,
    alpha=0.1,
    n_estimators=5,
    subsample=0.5,
    scale_pos_weight=4,
)
xg_reg.fit(X_train, y_train)
preds = xg_reg.predict(X_test)

ax = xgb.plot_importance(xg_reg, max_num_features=3, importance_type='gain', show_values=True) 

fig = ax.figure
fig.set_size_inches(10, 3)

Am I missing something? Is there a formatting function or a parameter I should pass?

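I would like to be able to format the feature importance scores, or at least drop the decimal part (e.g. "25" instead of "25.66521"). The current plot is attached below.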

[Image: xgboost_feature_importance_scores]

Answer 1

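You can get the desired result without editing the xgboost plotting function. The plotting function accepts an importance dictionary as its first argument; you can create that dictionary directly from your xgboost model and then edit it. This is also handy if you want friendlier labels for the feature names.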

# Get the booster from the xgbmodel
booster = xg_reg.get_booster()

# Get the importance dictionary (by gain) from the booster
importance = booster.get_score(importance_type="gain")

# Make your changes, here rounding each score to two decimals
for key in importance.keys():
    importance[key] = round(importance[key], 2)

# Pass the importance dictionary to the plotting function
# (importance_type is not needed here: the dictionary already holds the gain scores)
ax = xgb.plot_importance(importance, max_num_features=3, show_values=True)
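
If you do this often, the rounding and relabelling steps can be wrapped in a small helper. The following is only a sketch: tidy_importance and its labels argument are hypothetical names, not part of xgboost, and the sample dictionary stands in for what booster.get_score(importance_type="gain") would return.

```python
def tidy_importance(importance, ndigits=2, labels=None):
    """Return a new dict with rounded scores and optional friendlier names.

    importance: dict mapping feature name -> score
    ndigits:    decimals to keep; None rounds to the nearest integer
    labels:     optional dict mapping original name -> display name (hypothetical)
    """
    labels = labels or {}
    return {
        labels.get(name, name): round(score, ndigits)
        for name, score in importance.items()
    }


# Made-up scores standing in for the booster's gain dictionary
raw = {"f0": 25.66521, "f1": 3.14159, "f2": 0.5}

tidy = tidy_importance(raw, ndigits=1, labels={"f0": "age"})
print(tidy)  # {'age': 25.7, 'f1': 3.1, 'f2': 0.5}

whole = tidy_importance(raw, ndigits=None)
print(whole["f0"])  # 26
```

The resulting dictionary can then be passed to the plotting function in place of the model. Note that rounding turns 25.66521 into 26; if you want the decimals cut off instead (25), use int(score) rather than round().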

Answer 2

Edit the code of plotting.py in the xgboost package as follows:

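# Around line 86 of plotting.py (the exact line numbers depend on the xgboost version)
ylocs = np.arange(len(values))
values = tuple([round(x, 4) for x in values])  # added line: round the scores to 4 decimals
ax.barh(ylocs, values, align='center', height=height, **kwargs)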


Answer 3

I ran into the same problem and have just solved it.

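This happens simply because the "gain" and "cover" scores contain many decimal digits, unlike the scores from the "weight" option. Unfortunately, as far as I know, there is no option to specify the number of digits. So I modified the function myself to specify the maximum number of digits allowed. Here are the modifications to make in the plotting.py file of the xgboost package. If you are using the Spyder console, you can find and open the file simply by passing an invalid option (I'm a lazy guy), for example: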

xgb.plot_importance(xg_reg, potato=False)

Then click on the file in the error shown in the console. The next step is to modify the function itself, as follows:

def plot_importance(booster, ax=None, height=0.2,
                    xlim=None, ylim=None, title='Feature importance',
                    xlabel='F score', ylabel='Features',
                    importance_type='weight', max_num_features=None,
                    grid=True, show_values=True, max_digits=3, **kwargs):

and then you should also add the following before the show_values condition:

# Default to the raw values so values_displayed is always defined
values_displayed = values
if max_digits is not None:
    lst = list(values)
    # Only reformat when the first value has more decimals than allowed
    if len(str(lst[0]).split('.')[-1]) > max_digits:
        values_displayed = tuple([('{:.'+str(max_digits)+'f}').format(x) for x in lst])

if show_values is True:
    for x, x2, y in zip(values, values_displayed, ylocs):
        ax.text(x + 1, y, x2, va='center')

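I added a condition so that the numbers are only formatted when their decimal part is longer than the specified number of digits. This avoids, for example, adding superfluous digits with the importance_type='weight' option.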
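
The same rule can be tried out on its own, outside plotting.py. This is a minimal sketch, assuming values whose str() form is a plain decimal (not scientific notation). format_value is a hypothetical helper, and unlike the patch above it checks every value rather than only the first one.

```python
def format_value(value, max_digits=3):
    """Return value as text with at most max_digits decimals.

    Values that already have max_digits decimals or fewer (for example the
    integer counts produced by importance_type='weight') are left untouched.
    Assumes str(value) is a plain decimal, not scientific notation.
    """
    text = str(value)
    if max_digits is None:
        return text
    # Everything after the decimal point; empty string if there is no point
    _, _, decimals = text.partition(".")
    if len(decimals) > max_digits:
        return f"{value:.{max_digits}f}"
    return text


print(format_value(25.66521))      # 25.665
print(format_value(12))            # 12
print(format_value(0.5))           # 0.5
print(format_value(25.66521, 1))   # 25.7
```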

Note that for "cover" and "gain" the text was also badly positioned for me, so I changed the offset as well and replaced the 1 above with:

if show_values is True:
    # Offset the labels by 1% of the largest value instead of a fixed 1
    dx = np.max(values)/100
    for x, x2, y in zip(values, values_displayed, ylocs):
        ax.text(x + dx, y, x2, va='center')
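
To see why a relative offset helps: with a fixed offset of 1, the label sits far from the bar when the scores are small and almost on top of it when they are large. Here is a small sketch of the position calculation, using a hypothetical helper label_positions and made-up scores:

```python
def label_positions(values, ylocs, divisor=100):
    """Return (x, y) text positions shifted right by max(values) / divisor."""
    dx = max(values) / divisor
    return [(x + dx, y) for x, y in zip(values, ylocs)]


# Large scores: the labels move 2.0 units to the right of each bar
print(label_positions([200.0, 50.0], [0, 1]))   # [(202.0, 0), (52.0, 1)]

# Small scores: the shift shrinks with the data instead of staying at 1
print(label_positions([0.5, 0.25], [0, 1]))     # [(0.505, 0), (0.255, 1)]
```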

Hope it helps!

Formatting the plotted numbers in XGBoost's plot_importance()
