XGBoost feature mismatch error in predict() call

Posted: 2018-07-04 06:38:32

Tags: python-3.x xgboost

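I am trying to train an XGBoost model by splitting my training data into a "train" set and a "test" set. From the test set I dropped the result column (Response) and passed what remains to the predict() function. This is what my code looks like: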

    import xgboost as xgb
    from sklearn.model_selection import train_test_split

    # Split the training data into a train set and a test set

    x_train, x_test = train_test_split(TrainingData, test_size=0.3)
    result_x_test = x_test.Response
    x_test = x_test.drop('Response', axis=1)

    # XGBoost algorithm to build a model/predictor
    # Convert the data to matrix format

    XTrain = x_train.as_matrix()
    YTrain = x_train["Response"].as_matrix()
    XTest = x_test.as_matrix()

    # Train the model and predict on the test set

    TrainingModel = xgb.XGBClassifier(max_depth=3, n_estimators=300, learning_rate=0.05).fit(XTrain, YTrain)
    Prediction = TrainingModel.predict(XTest)

The error I get is:

    ValueError: feature_names mismatch:
    ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30', 'f31', 'f32', 'f33', 'f34', 'f35', 'f36', 'f37', 'f38', 'f39', 'f40', 'f41', 'f42', 'f43', 'f44', 'f45', 'f46', 'f47', 'f48', 'f49', 'f50', 'f51', 'f52', 'f53', 'f54', 'f55', 'f56', 'f57', 'f58', 'f59', 'f60', 'f61', 'f62', 'f63', 'f64', 'f65', 'f66', 'f67', 'f68', 'f69', 'f70', 'f71', 'f72', 'f73', 'f74', 'f75', 'f76', 'f77', 'f78', 'f79', 'f80', 'f81', 'f82', 'f83', 'f84', 'f85', 'f86', 'f87', 'f88', 'f89', 'f90', 'f91', 'f92', 'f93', 'f94', 'f95', 'f96', 'f97', 'f98', 'f99', 'f100', 'f101', 'f102', 'f103', 'f104', 'f105'] ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30', 'f31', 'f32', 'f33', 'f34', 'f35', 'f36', 'f37', 'f38', 'f39', 'f40', 'f41', 'f42', 'f43', 'f44', 'f45', 'f46', 'f47', 'f48', 'f49', 'f50', 'f51', 'f52', 'f53', 'f54', 'f55', 'f56', 'f57', 'f58', 'f59', 'f60', 'f61', 'f62', 'f63', 'f64', 'f65', 'f66', 'f67', 'f68', 'f69', 'f70', 'f71', 'f72', 'f73', 'f74', 'f75', 'f76', 'f77', 'f78', 'f79', 'f80', 'f81', 'f82', 'f83', 'f84', 'f85', 'f86', 'f87', 'f88', 'f89', 'f90', 'f91', 'f92', 'f93', 'f94', 'f95', 'f96', 'f97', 'f98', 'f99', 'f100', 'f101', 'f102', 'f103', 'f104']
    expected f105 in input data

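Column 105 (f105) is the result column, and I think the error is raised because the result column is missing. But the test set must not contain the result column, right? How do I fix this?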

1 Answer:

Answer 0 (score: 0)

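It looks like you are still keeping the target variable in XTrain, which I guess you don't intend. The target feature `Response`, on the other hand, is (correctly) missing from XTest. As a result, the model is trained on 106 columns (f0 to f105) but asked to predict on only 105 (f0 to f104), hence `expected f105 in input data`.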
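A minimal pandas-only sketch of why the names disagree. The toy frame and its column names are hypothetical, and the list comprehensions only imitate the positional `f0, f1, ...` names that XGBoost assigns to plain NumPy arrays (an assumption based on the names in the error message above); no XGBoost call is made.

```python
import pandas as pd

# Hypothetical toy frame: two features plus the "Response" target.
df = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0], "Response": [0, 1]})

XTrain = df.to_numpy()                          # target NOT dropped: 3 columns
XTest = df.drop("Response", axis=1).to_numpy()  # target dropped: 2 columns

# A NumPy array carries no column names, so only positional names are
# available; this imitates the f0, f1, ... naming seen in the error.
train_names = [f"f{i}" for i in range(XTrain.shape[1])]
test_names = [f"f{i}" for i in range(XTest.shape[1])]

# The last positional name exists only in the training matrix.
missing = [name for name in train_names if name not in test_names]
print(missing)  # ['f2']
```

Whichever column held `Response`, the name that goes missing is always the last positional one, which is why the message says `f105` rather than `Response`.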

So the quick fix is:

    XTrain = x_train.drop("Response", axis=1).as_matrix()
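A sketch of the same fix applied one step earlier: separate features and target before splitting, so `Response` can never leak into the feature matrix. `TrainingData` here is a hypothetical random stand-in for the question's data, `.to_numpy()` replaces `.as_matrix()`, and the model lines are left as comments so the sketch runs without XGBoost installed.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the question's TrainingData:
# 3 feature columns plus a binary "Response" target.
rng = np.random.default_rng(0)
TrainingData = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
TrainingData["Response"] = rng.integers(0, 2, size=100)

# Separate the target from the features *before* splitting.
features = TrainingData.drop("Response", axis=1)
target = TrainingData["Response"]

x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.3, random_state=0
)

# .to_numpy() is the modern replacement for the removed .as_matrix().
XTrain = x_train.to_numpy()
YTrain = y_train.to_numpy()
XTest = x_test.to_numpy()

# Train and test matrices now have the same number of columns, so the
# question's model lines should work unchanged:
# TrainingModel = xgb.XGBClassifier(max_depth=3, n_estimators=300,
#                                   learning_rate=0.05).fit(XTrain, YTrain)
# Prediction = TrainingModel.predict(XTest)
```

Passing `features, target` together keeps the rows of `y_test` aligned with `x_test`, which replaces the separate `result_x_test = x_test.Response` step.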

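I would personally also recommend not using `.as_matrix()`: the sklearn API handles a `pd.DataFrame` just fine, and keeping the DataFrame preserves meaningful feature names, which, for example, would have made this issue easier to debug. (Note that `DataFrame.as_matrix()` was deprecated in pandas 0.23 and removed in pandas 1.0; if you do need an array, use `.to_numpy()` or `.values`.)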
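To illustrate the debugging benefit without depending on XGBoost, the sketch below compares train and test frames by column name. `column_mismatch` is a hypothetical helper, not part of pandas or XGBoost, and the example columns are made up. When DataFrames are passed to `XGBClassifier`, the feature names come from these column labels, so a mismatch error should name `Response` directly instead of `f105` (the exact message wording varies between XGBoost versions).

```python
import pandas as pd


def column_mismatch(train_df, test_df):
    """Hypothetical helper: list columns found in only one of the two frames."""
    missing = [c for c in train_df.columns if c not in test_df.columns]
    extra = [c for c in test_df.columns if c not in train_df.columns]
    return missing, extra


# Made-up example: the training frame still contains the target column.
train = pd.DataFrame({"age": [30, 40], "income": [1.0, 2.0], "Response": [0, 1]})
test = train.drop("Response", axis=1)

print(column_mismatch(train, test))  # (['Response'], [])
```

Running a check like this before `predict()` points straight at the leaked target column.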