I am building a binary classifier model with LGBMClassifier in LightGBM, as shown below:
# LightGBM model
clf = LGBMClassifier(
    nthread=4,
    n_estimators=10000,
    learning_rate=0.005,
    num_leaves=45,
    colsample_bytree=0.8,
    subsample=0.4,
    subsample_freq=1,
    max_depth=20,
    reg_alpha=0.5,
    reg_lambda=0.5,
    min_split_gain=0.04,
    min_child_weight=0.05,
    random_state=0,
    silent=-1,
    verbose=-1)
Next, I fit my model on the training data:
clf.fit(train_x, train_y, eval_set=[(train_x, train_y), (valid_x, valid_y)],
        eval_metric='auc', verbose=100, early_stopping_rounds=200)
fold_importance_df = pd.DataFrame()
fold_importance_df["feature"] = feats
fold_importance_df["importance"] = clf.feature_importances_
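(As an aside, the importance table built above is easier to read when sorted by importance. A minimal pandas sketch, using hypothetical feature names and values in place of feats and clf.feature_importances_ from the question:)

```python
import pandas as pd

# Hypothetical stand-ins for feats and clf.feature_importances_.
feats = ["feature11", "feature13", "feature21"]
importances = [774, 1108, 1104]

fold_importance_df = pd.DataFrame({"feature": feats, "importance": importances})

# Sort features by importance, most important first.
fold_importance_df = fold_importance_df.sort_values(
    "importance", ascending=False
).reset_index(drop=True)

print(fold_importance_df)
```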
Output:
feature importance
feature13 1108
feature21 1104
feature11 774
So far so good. Now I am looking at feature importance measures based on this model, so I am using the feature_importances_ attribute (which, by default, gives me importances based on split). Although split tells me how many times a feature was used for splitting, I think gain would give me a better sense of how important a feature actually is.
The Python API of LightGBM's Booster class (https://lightgbm.readthedocs.io/en/latest/Python-API.html?highlight=importance) mentions:
feature_importance(importance_type='split', iteration=-1)

Parameters: importance_type (string, optional (default="split")) –
If "split", result contains numbers of times the feature is used in a model.
If "gain", result contains total gains of splits which use the feature.
Returns: result – Array with feature importances.
Return type: numpy array
However, the Sklearn API for LightGBM's LGBMClassifier() does not mention anything equivalent; it only has this attribute:

feature_importances_
array of shape = [n_features] – The feature importances (the higher, the more important the feature).

So, how can I get the gain-based feature importances in the sklearn version (i.e., from LGBMClassifier())?

Answer 0: (score: 1)
feature_importance() is a method of the Booster object in the original LGBM.

The sklearn API exposes the underlying Booster fitted on the training data via the attribute booster_, as given in the API Docs.

So you can first access that Booster object and then call feature_importance() in the same way as in the original LGBM:
clf.booster_.feature_importance(importance_type='gain')
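(The raw gain values returned by feature_importance(importance_type='gain') are often easier to compare as relative percentages. A minimal sketch, using hypothetical gain values in place of the numpy array that clf.booster_.feature_importance(importance_type='gain') would return:)

```python
import numpy as np
import pandas as pd

# Hypothetical gains standing in for the array returned by
# clf.booster_.feature_importance(importance_type='gain').
feats = ["feature13", "feature21", "feature11"]
gains = np.array([5200.0, 3100.0, 1700.0])

# Normalize total gain to percentages so features are directly comparable.
gain_df = pd.DataFrame({
    "feature": feats,
    "gain": gains,
    "gain_pct": 100.0 * gains / gains.sum(),
}).sort_values("gain", ascending=False)

print(gain_df)
```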