Plotting feature names in XGBoost feature importance

Time: 2019-06-03 11:25:59

Tags: machine-learning xgboost feature-selection

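I am working on an ML project, a binary classification problem, and I am training an XGBoost model.

I would like to plot the feature importances after cross-validation, but I have not managed to. Here is my current code: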

import numpy as np
import pandas as pd
import xgboost

# Transform X, Y and X_test to arrays
X_cross = np.array(X_train)
Y_cross = np.array(Y_train)
test = np.array(X_test)
id_test = X_test.index.values
sub = pd.DataFrame()
sub['id'] = id_test
sub['target'] = np.zeros_like(id_test)
list_names = list(X_train.columns)
# For each fold, a new set of indices separates the training set from the validation set
for i, (train_index, test_index) in enumerate(skf.split(X_cross, Y_cross)):
    print('[Fold %d/%d]' % (i + 1, kfold))
    # Split the data with the indices computed by the cross-validation splitter
    X_train_kfd, X_valid = X_cross[train_index], X_cross[test_index]
    y_train_kfd, y_valid = Y_cross[train_index], Y_cross[test_index]
    # Convert our data into XGBoost format
    d_train = xgboost.DMatrix(X_train_kfd, label=y_train_kfd, feature_names=list_names)
    d_valid = xgboost.DMatrix(X_valid, label=y_valid, feature_names=list_names)
    d_test = xgboost.DMatrix(X_test.values)
    watchlist = [(d_train, 'train'), (d_valid, 'valid')]
    # Train the model. We pass in a max of 2500 rounds (with early stopping after 60)
    mdl = xgboost.train(gbm_params, d_train, 2500, evals=watchlist,
                        early_stopping_rounds=60, verbose_eval=10)
    print('[Fold %d/%d Prediction:]' % (i + 1, kfold))
    # Predict on our test data
    p_test = mdl.predict(d_test)
    sub['target'] += p_test / kfold

I tried passing the feature names to the DMatrix (since I am using numpy arrays), but it does not seem to work.

The error I get is:

feature_names mismatch: ['x1','x2','x3'...] ['f0','f1','f2','f3'...] expected x1, x2, x3 ... in input data training data did not have the following fields: f1, f2, f3 ...

Can anyone help me?

Thanks a lot.

0 answers:

No answers