My predictions should include both positive and negative values. I am using a two-stage prediction. Here are the steps:
1) Split the data into 3 sets (training, test, and out-of-sample).
2) Train different base regressors on the training set. The regressors
   are different types of trees, such as gradient boosting trees.
3) Use the trained regressors to predict the test set.
4) Use the predicted output from step 3 to train an SVM, which
   gives the second-level model.
5) Use each regressor to predict the out-of-sample data.
6) Use the step 5 output as input to the model fitted in step 4 to
   predict the final result on the out-of-sample data.
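The six steps above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline from the post: the synthetic data, the two base regressors, the helper name `stack`, and the default `SVR` settings are all assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption: the real data is not shown in the post).
X = rng.normal(size=(300, 5))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=300)

# Step 1: three disjoint sets (training, test, out-of-sample).
x_train, y_train = X[:100], y[:100]
x_test, y_test = X[100:200], y[100:200]
x_oos, y_oos = X[200:], y[200:]

# Step 2: fit tree-based base regressors on the training set.
base_models = [
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    RandomForestRegressor(n_estimators=50, random_state=0),
]
for model in base_models:
    model.fit(x_train, y_train)


def stack(models, x):
    """Hypothetical helper: one column of predictions per base model."""
    return np.column_stack([m.predict(x) for m in models])


# Steps 3-4: base predictions on the test set become the blender's features.
meta_test = stack(base_models, x_test)
blender = SVR(kernel="rbf")  # default C, epsilon, gamma: not the post's settings
blender.fit(meta_test, y_test)

# Steps 5-6: base predictions on out-of-sample data, then the blender.
meta_oos = stack(base_models, x_oos)
y_pred = blender.predict(meta_oos)
print(meta_oos.shape, y_pred.shape)
```

Keeping the stacking logic in one helper means the test-set and out-of-sample feature matrices are built the same way, which removes one possible source of mismatch between steps 4 and 6.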
My response variable (the prediction) should have both positive and negative values, but in step 6 I only see positive predictions.
1) Here is a sample of the step 5 output, using only 3 base learners.
The different learners clearly produce both positive and negative values:
>> array([[  6.72144956e-04,   1.56136199e-03,   1.58553265e-04],
          [ -4.63248063e-04,   4.95401301e-04,   1.10566458e-04],
          ...
          [  1.48747688e-03,  -1.11622013e-03,  -7.57807887e-05]])
2) Here is the output of step 6. All the values are positive, but the real
values clearly include both positive and negative ones:
>> array([ 4.56349996e-04, 4.43408819e-04, ...
           4.36207927e-04])
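One quick check on these numbers is to compare how much the step 6 output varies relative to its step 5 inputs. The sketch below uses only the values visible above (the elided rows are not reconstructed, and the visible step 6 values need not correspond to the visible step 5 rows); `spread` is a hypothetical helper name. If the output is nearly flat, the second-level model may be predicting something close to a constant, in which case the sign of every prediction simply follows that constant.

```python
import numpy as np

# Only the rows visible in the post; the "..." rows are left out.
step5 = np.array([
    [6.72144956e-04, 1.56136199e-03, 1.58553265e-04],
    [-4.63248063e-04, 4.95401301e-04, 1.10566458e-04],
    [1.48747688e-03, -1.11622013e-03, -7.57807887e-05],
])
step6 = np.array([4.56349996e-04, 4.43408819e-04, 4.36207927e-04])


def spread(a):
    """Hypothetical helper: max minus min over all entries."""
    a = np.asarray(a, dtype=float)
    return float(a.max() - a.min())


print("step 5 spread:", spread(step5))
print("step 6 spread:", spread(step6))
print("ratio:", spread(step6) / spread(step5))
```

On the visible values, the step 6 spread is under 1% of the step 5 spread, so it is worth running the same comparison on the full arrays.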
Here is the Python code for the blending model:
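    import numpy as np
    from sklearn.svm import SVR

    # Methods of the blending class (the class definition is not shown).

    def fit(self):
        # One column per base model: its predictions on the test set.
        y_test_i = np.array([model.predict(self.x_test) for model in self.models]).T
        parameters = {
            "kernel": "rbf",
            "C": 0.001,
            # Kept from the original. Older scikit-learn releases treated
            # gamma=0 as 1 / n_features; later releases use gamma="auto" for that.
            "gamma": 0,
        }
        self.blender = SVR(**parameters)
        # Second-level model: base predictions -> true test-set targets.
        self.blender.fit(y_test_i, self.y_test.values)

    def pred(self):
        # Same stacking as in fit, but on the out-of-sample data.
        y_oos_i = np.array([model.predict(self.x_oos) for model in self.models]).T
        y_pred = self.blender.predict(y_oos_i)
        return y_pred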
My question is: how do I debug this situation? By the way, each set has about 800 data points, and the second-stage input has about 20 features.
Answer 0 (score: 0)
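Possibly the training data at one of your stages is entirely non-negative.
To debug, I suggest counting the positive and negative values in every training and test set at each stage (something like this):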
    import numpy as np

    print("N nonnegatives:", np.count_nonzero(Y_train >= 0))
    print("N negatives:", np.count_nonzero(Y_train < 0))
In particular, the training or test data used in step 4 looks like the most likely suspect.
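To run that count over every stage at once, a small helper along these lines may be convenient. It is only a sketch: `sign_counts`, `report`, and the arrays in `stages` are hypothetical placeholders, to be replaced with the real targets and predictions from each step.

```python
import numpy as np


def sign_counts(values):
    """Count nonnegative and negative entries in an array-like."""
    values = np.asarray(values)
    return {
        "nonnegative": int(np.count_nonzero(values >= 0)),
        "negative": int(np.count_nonzero(values < 0)),
    }


def report(stages):
    """Print one line of sign counts per named array."""
    for name, values in stages.items():
        c = sign_counts(values)
        print(f"{name}: nonnegative={c['nonnegative']}, negative={c['negative']}")


# Placeholder arrays (assumption); substitute the real data for each stage.
stages = {
    "y_train": np.array([0.002, -0.001, 0.0005, -0.003]),
    "y_test": np.array([0.001, 0.004, 0.0]),
}
report(stages)
```

A stage whose targets show zero negatives, like the placeholder `y_test` here, is one the blender cannot learn negative outputs from.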