I have been experimenting with XGBClassifier on a large dataset of shape [400000, 93]. The data contains many NaN values, so I used the imputer from the sklearn package:
imputer = Imputer()
imputed_x = imputer.fit_transform(data)
data = imputed_x
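Note that `Imputer` was deprecated in scikit-learn 0.20 and removed in 0.22; in current versions the equivalent is `SimpleImputer` from `sklearn.impute`, whose default strategy is the column mean, same as the old `Imputer()`. A minimal sketch on a toy array (the data values here are made up for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# toy data with NaNs, standing in for the real [400000, 93] matrix
data = np.array([[1.0, np.nan],
                 [3.0, 4.0],
                 [np.nan, 6.0]])

imputer = SimpleImputer(strategy="mean")  # mean imputation, the default
imputed_x = imputer.fit_transform(data)
print(imputed_x)  # NaNs replaced by column means: 2.0 and 5.0
```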
But the feature importances look like this:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
and because of this, the results are:
precision: 1.0
recall: 1.0
accuracy: 1.0
training_accuracy: 1.0
Why doesn't the model fit the data properly?
Sample code snippet:
from xgboost import XGBClassifier

model_xboost = XGBClassifier(max_depth=5,
                             n_estimators=100)
# train
model_xboost.fit(train_data, train_labels)
print(model_xboost.feature_importances_)
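When a single feature carries importance 1.0 and every metric (including training accuracy) is exactly 1.0, the model is fitting the data perfectly through that one column, which often indicates target leakage rather than a fitting problem. A quick way to locate the suspect column is `np.argmax` on the importance vector; the array below is a stand-in matching the printed output above (the index 80 is illustrative, not taken from the real data):

```python
import numpy as np

# stand-in for model_xboost.feature_importances_: 93 features, one dominant
importances = np.zeros(93)
importances[80] = 1.0  # hypothetical position of the single non-zero value

suspect = int(np.argmax(importances))
print(f"feature {suspect} has importance {importances[suspect]}")
# inspecting train_data[:, suspect] against train_labels would reveal
# whether that column directly encodes the target
```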