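I'm working on an ML project, a binary classification problem, and I'm training an XGBoost model.
I want to plot the feature importances after cross-validation, but I haven't managed to. Here is my current code: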
import numpy as np
import pandas as pd
import xgboost

# Convert X_train, Y_train and X_test to numpy arrays
X_cross = np.array(X_train)
Y_cross = np.array(Y_train)
test = np.array(X_test)
id_test = X_test.index.values
sub = pd.DataFrame()
sub['id'] = id_test
sub['target'] = np.zeros_like(id_test)
list_names = list(X_train.columns)
# For each fold, a new set of indices is used to split the data into training and validation sets
for i, (train_index, test_index) in enumerate(skf.split(X_cross, Y_cross)):
    print('[Fold %d/%d]' % (i + 1, kfold))
    # Split the data using the indices computed by the cross-validation splitter
    X_train_kfd, X_valid = X_cross[train_index], X_cross[test_index]
    y_train_kfd, y_valid = Y_cross[train_index], Y_cross[test_index]
    # Convert our data into XGBoost format
    d_train = xgboost.DMatrix(X_train_kfd, label=y_train_kfd, feature_names=list_names)
    d_valid = xgboost.DMatrix(X_valid, label=y_valid, feature_names=list_names)
    d_test = xgboost.DMatrix(X_test.values)
    watchlist = [(d_train, 'train'), (d_valid, 'valid')]
    # Train the model. We pass in a max of 2500 rounds (with early stopping after 60)
    mdl = xgboost.train(gbm_params, d_train, 2500, evals=watchlist, early_stopping_rounds=60, verbose_eval=10)
    print('[Fold %d/%d Prediction:]' % (i + 1, kfold))
    # Predict on our test data
    p_test = mdl.predict(d_test)
    sub['target'] += p_test / kfold
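For the plotting goal, one possible approach (a sketch, not something taken from the code above) is to collect each fold's importance scores inside the loop, for example with `mdl.get_score(importance_type='gain')`, and average them once the loop has finished. The helper name `average_importance` and the sample numbers are hypothetical. The sketch assumes that a feature missing from a fold's dictionary should count as 0 for that fold, since, as far as I know, `get_score` leaves out features that were never used in a split.

```python
from collections import defaultdict


def average_importance(fold_scores):
    """Average a list of per-fold {feature: score} dicts into one dict.

    A feature absent from a fold's dict is treated as 0 for that fold.
    """
    if not fold_scores:
        return {}
    totals = defaultdict(float)
    for scores in fold_scores:
        for name, value in scores.items():
            totals[name] += value
    n_folds = len(fold_scores)
    return {name: total / n_folds for name, total in totals.items()}


# Made-up scores standing in for mdl.get_score(...) collected in two folds.
fold_scores = [
    {"x1": 4.0, "x2": 2.0},
    {"x1": 2.0, "x3": 6.0},
]
mean_scores = average_importance(fold_scores)
print(mean_scores)
```

The averaged dictionary could then be handed to `xgboost.plot_importance`, which accepts a plain dict as well as a trained booster.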
I tried passing the feature names to the DMatrix (since I'm using numpy arrays), but it doesn't seem to work.
The error I get is:
feature_names mismatch: ['x1','x2','x3'...] ['f0','f1','f2','f3'...] expected x1, x2, x3 ... in input data training data did not have the following fields: f1, f2, f3 ...
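One reading of this message: the first list holds the names the model was trained with, and the second holds the names carried by the matrix passed to `predict`. In the code above, `d_train` and `d_valid` are built with `feature_names=list_names`, while `d_test` is built from `X_test.values` with no names at all, which would explain the placeholder names `f0, f1, ...`. The sketch below mimics that comparison in plain Python. It is an illustration only, not XGBoost's actual implementation, and the helper names `default_names` and `fields_missing_from` are made up.

```python
def default_names(n_features):
    """Placeholder names of the form f0, f1, ... as seen in the error message."""
    return [f"f{i}" for i in range(n_features)]


def fields_missing_from(input_names, expected_names):
    """Return the expected names that the input does not carry."""
    present = set(input_names)
    return [name for name in expected_names if name not in present]


# Names the model was trained with (list_names in the code above).
trained_names = ["x1", "x2", "x3"]
# Assumption: a matrix built without feature_names gets placeholder names.
test_names = default_names(len(trained_names))

# Names the model expects but the test matrix lacks.
print(fields_missing_from(test_names, trained_names))
# Names in the test matrix that the training data never had.
print(fields_missing_from(trained_names, test_names))

# Under this reading, the likely fix inside the loop is to name the test matrix too:
# d_test = xgboost.DMatrix(X_test.values, feature_names=list_names)
```

If that reading is right, giving all three matrices the same `feature_names` should make the check pass, and the importance plot would then show `x1, x2, ...` instead of placeholder names.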
Can anyone help me?
Thanks a lot.