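I am trying to train an XGBoost model by splitting the training data into a "train" set and a "test" set. In the test set I drop the outcome column and pass the rest to the predict() function. This is what my code looks like: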
import xgboost as xgb
from sklearn.model_selection import train_test_split
# Split the training data into a training set and a test set
x_train, x_test = train_test_split(TrainingData, test_size=0.3)
result_x_test = x_test.Response
x_test = x_test.drop('Response', axis=1)
# Use the XGBoost algorithm to build a model/predictor
# Convert the data to matrix format
XTrain = x_train.as_matrix()
YTrain = x_train["Response"].as_matrix()
XTest = x_test.as_matrix()
# Train the model
TrainingModel = xgb.XGBClassifier(max_depth=3, n_estimators=300, learning_rate=0.05).fit(XTrain, YTrain)
Prediction = TrainingModel.predict(XTest)
The error I get is:
ValueError: feature_names mismatch:
['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30', 'f31', 'f32', 'f33', 'f34', 'f35', 'f36', 'f37', 'f38', 'f39', 'f40', 'f41', 'f42', 'f43', 'f44', 'f45', 'f46', 'f47', 'f48', 'f49', 'f50', 'f51', 'f52', 'f53', 'f54', 'f55', 'f56', 'f57', 'f58', 'f59', 'f60', 'f61', 'f62', 'f63', 'f64', 'f65', 'f66', 'f67', 'f68', 'f69', 'f70', 'f71', 'f72', 'f73', 'f74', 'f75', 'f76', 'f77', 'f78', 'f79', 'f80', 'f81', 'f82', 'f83', 'f84', 'f85', 'f86', 'f87', 'f88', 'f89', 'f90', 'f91', 'f92', 'f93', 'f94', 'f95', 'f96', 'f97', 'f98', 'f99', 'f100', 'f101', 'f102', 'f103', 'f104', 'f105'] ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30', 'f31', 'f32', 'f33', 'f34', 'f35', 'f36', 'f37', 'f38', 'f39', 'f40', 'f41', 'f42', 'f43', 'f44', 'f45', 'f46', 'f47', 'f48', 'f49', 'f50', 'f51', 'f52', 'f53', 'f54', 'f55', 'f56', 'f57', 'f58', 'f59', 'f60', 'f61', 'f62', 'f63', 'f64', 'f65', 'f66', 'f67', 'f68', 'f69', 'f70', 'f71', 'f72', 'f73', 'f74', 'f75', 'f76', 'f77', 'f78', 'f79', 'f80', 'f81', 'f82', 'f83', 'f84', 'f85', 'f86', 'f87', 'f88', 'f89', 'f90', 'f91', 'f92', 'f93', 'f94', 'f95', 'f96', 'f97', 'f98', 'f99', 'f100', 'f101', 'f102', 'f103', 'f104']
expected f105 in input data
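Column 105 is the outcome column, and I think the error is raised because the outcome column is missing. But the test set must not contain the outcome, right? How can I fix this?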
Answer 0 (score: 0)
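It looks like you are still keeping the target variable in XTrain, which I assume you don't intend, while the target ('Response') feature is missing from XTest. That is why the model was trained on 106 columns (f0 to f105) but is asked to predict from only 105 (f0 to f104).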
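To make the column mismatch concrete, here is a minimal sketch on hypothetical toy data (three made-up feature columns plus a "Response" target; none of these names come from the question). It uses .to_numpy() as a stand-in for .as_matrix(), since both return the DataFrame's values as an array. The training matrix ends up one column wider than the test matrix, exactly like f105 in the error, and dropping the target from the training features brings the widths back in line.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three feature columns plus the target "Response"
rng = np.random.default_rng(0)
TrainingData = pd.DataFrame(
    rng.normal(size=(10, 3)), columns=["feat_a", "feat_b", "feat_c"]
)
TrainingData["Response"] = rng.integers(0, 2, size=10)

# Mimic the question: the train part keeps "Response", the test part drops it
x_train = TrainingData.iloc[:7]
x_test = TrainingData.iloc[7:].drop(columns="Response")

XTrain = x_train.to_numpy()  # still contains the target as its last column
XTest = x_test.to_numpy()
print(XTrain.shape[1], XTest.shape[1])  # the training matrix has one extra column

# The fix: drop the target from the training features as well
XTrain_fixed = x_train.drop(columns="Response").to_numpy()
print(XTrain_fixed.shape[1] == XTest.shape[1])
```

The same idea applies at full scale: whatever columns the model is fitted on, predict() must receive the same columns in the same order.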
So the quick fix is:
XTrain = x_train.drop("Response", axis=1).as_matrix()
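Personally, I would also advise against using .as_matrix() (it was deprecated and then removed in pandas 1.0; .to_numpy() is its replacement): the sklearn API handles a pd.DataFrame just fine, and passing DataFrames keeps meaningful feature names, which would, for example, have made this issue easier to debug.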