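I have tried many regression algorithms, such as Gradient Boosting, Random Forest and Decision Tree. I also tried scaling the features with both MinMaxScaler and StandardScaler: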
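from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)      # learn mean/std from the training set only
x_test = scaler.transform(x_test)            # reuse the training statistics
to_predict = scaler.transform(to_predict)    # same statistics for the rows to predict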
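Since all of the models tried are tree-based, the choice of scaler may not matter at all: a tree splits on per-feature thresholds, and StandardScaler and MinMaxScaler are both increasing per-column affine maps, so they preserve the ordering each split depends on. A minimal sketch on synthetic data (the data, the hypothetical `tree_predictions` helper and `max_depth=5` are illustrative assumptions, not taken from the question) that compares a DecisionTreeRegressor fit on raw versus scaled features:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical): 3 features, like x_train in the question
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=(200, 3))
y_train = np.sin(x_train[:, 0]) + 0.1 * x_train[:, 1] + rng.normal(0, 0.05, size=200)
x_new = rng.uniform(0, 10, size=(50, 3))


def tree_predictions(scaler=None):
    """Fit a depth-limited tree, optionally on scaled features, and predict x_new."""
    xt, xn = x_train, x_new
    if scaler is not None:
        xt = scaler.fit_transform(x_train)  # statistics learned from training data only
        xn = scaler.transform(x_new)        # same statistics reused for the new rows
    model = DecisionTreeRegressor(max_depth=5, random_state=0).fit(xt, y_train)
    return model.predict(xn)


raw = tree_predictions()
standard = tree_predictions(StandardScaler())
minmax = tree_predictions(MinMaxScaler())

# Splits depend only on the ordering within each feature, so the predictions match
print(np.allclose(raw, standard), np.allclose(raw, minmax))
```

On data like this both comparisons should print True; the same reasoning applies to the individual trees inside RandomForestRegressor and GradientBoostingRegressor.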
Some values of x_train after scaling:
[-5.80641310e-01, -5.15194242e-01, -4.57304981e-01],
[-5.99982344e-01, -5.74535988e-01, -4.57304981e-01],
[-5.99982344e-01, -5.74535988e-01, -4.57304981e-01],
[ 5.08258952e-01, 1.02769113e+00, 9.71784878e-03],
[-5.99982344e-01, -5.74535988e-01, -4.57304981e-01],
[-5.84509517e-01, -5.74535988e-01, -4.57304981e-01],
[-5.98048241e-01, -5.74535988e-01, -4.57304981e-01]
Some values of x_test after scaling:
[-0.59998234, -0.57453599, -0.45730498],
[ 0.97244379, 2.15518429, -0.02141701],
[ 2.50812195, 2.74860174, 3.18547309],
[ 0.33612374, 0.37493194, -0.05255186],
[-0.43364944, -0.51519424, -0.43862407],
[ 2.37273471, 2.57057651, 3.92025568]
Some values of the data to be predicted (to_predict) after scaling:
[ 10.46308958, 7.37725787, 18.92725594],
[ 11.04912294, 8.2080423 , 19.03934142],
[ 12.04131803, 7.85199183, 19.96716011],
[ 12.29468558, 7.85199183, 15.58337248],
[ 13.15342753, 8.68277626, 19.99829496],
[ 11.8053574 , 8.32672579, 18.29833186],
[ 10.82476694, 9.69158593, 21.86638628]
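Because StandardScaler maps each value to (x - mean) / std using the training statistics, these rows sit roughly 7 to 22 standard deviations above the training mean, far above any of the training values shown. A small check along these lines makes that explicit; `out_of_range_fraction` is a hypothetical helper, and in practice you would pass the full scaled `x_train` and `to_predict` arrays rather than the few rows copied here:

```python
import numpy as np


def out_of_range_fraction(x_ref, x_new):
    """Hypothetical helper: per feature, the fraction of rows in x_new that fall
    outside the [min, max] interval observed in x_ref."""
    x_ref = np.asarray(x_ref, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    lo, hi = x_ref.min(axis=0), x_ref.max(axis=0)
    outside = (x_new < lo) | (x_new > hi)
    return outside.mean(axis=0)


# A few of the scaled rows quoted in the question
x_train_sample = [[-5.80641310e-01, -5.15194242e-01, -4.57304981e-01],
                  [-5.99982344e-01, -5.74535988e-01, -4.57304981e-01],
                  [5.08258952e-01, 1.02769113e+00, 9.71784878e-03]]
to_predict_sample = [[10.46308958, 7.37725787, 18.92725594],
                     [11.04912294, 8.2080423, 19.03934142]]

print(out_of_range_fraction(x_train_sample, to_predict_sample))  # [1. 1. 1.]
```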
Gradient boosting regression:
from sklearn.ensemble import GradientBoostingRegressor

grad = GradientBoostingRegressor(n_estimators=500, random_state=100, learning_rate=1,
                                 max_depth=10, min_samples_leaf=3, min_samples_split=12)
grad.fit(x_train, y_train)
mae test: 0.03380270193992188
mse test: 0.0025669439864247356
rmse test: 0.05066501738304977
r2 test: 0.80834162616979
mae train: 0.02458157407407408
mse train: 0.0025056432439638046
rmse train: 0.05005640062932816
r2 train: 0.815567744395517
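The four metrics above can be computed with scikit-learn's metric functions. This is a sketch of one way to do it, assuming `y_test`, `y_train` and the fitted `grad` from the question; the `regression_report` helper is hypothetical:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Hypothetical helper: MAE, MSE, RMSE and R^2 for one data split."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mse,
        "rmse": np.sqrt(mse),  # root taken explicitly, works across scikit-learn versions
        "r2": r2_score(y_true, y_pred),
    }


# Usage with the names from the question (assumed):
# print(regression_report(y_test, grad.predict(x_test)))
# print(regression_report(y_train, grad.predict(x_train)))
```

The reported numbers are internally consistent: sqrt(0.0025669) is about 0.0507, matching the test RMSE.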
Some test-set predictions: -0.09690972, -0.09690972, -0.09690972, 0.14249752
Some training-set predictions: -0.11616, 0.068165, 0.048538, -0.09690972
Some predictions for the data to be predicted: 0.124851, 0.124851, 0.124851, 0.124851
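The trained model works well on the training and test sets, but for the rows I actually need to predict it returns the same constant every time. Perhaps this is because, after scaling, the training and test values are of the same order of magnitude, whereas the data I want to apply the model to has much larger values. I don't know how to solve this.
If I change the tuning parameters, the predictions just change to a different constant. This happens with every regression algorithm I have tried. How can I fix this?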
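One plausible explanation, consistent with the constant predictions for `to_predict`: tree-based regressors (decision trees, random forests, gradient boosting) cannot extrapolate. A row whose every feature exceeds the largest training value follows the same path through each tree and lands in the same leaf, so all such rows get an identical prediction, and changing hyperparameters only changes which constant that is. Rescaling cannot help, because the scaler is fit on the training data and preserves how far the new rows lie outside it. The sketch below uses synthetic one-feature data (an assumption for illustration) to contrast GradientBoostingRegressor with LinearRegression on inputs far beyond the training range:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): a target that keeps growing with x, seen only on [0, 3)
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 3, size=(300, 1))
y_train = 2.0 * x_train[:, 0] + rng.normal(0, 0.1, size=300)

# Rows far beyond the training range, similar in spirit to the scaled to_predict values
x_far = np.array([[10.0], [12.0], [15.0], [20.0]])

gbr = GradientBoostingRegressor(random_state=100).fit(x_train, y_train)
lin = LinearRegression().fit(x_train, y_train)

# Every x_far row lies above all split thresholds, so each tree returns its right-most
# leaf and the ensemble outputs one constant
print(gbr.predict(x_far))
# A linear model extrapolates the trend instead
print(lin.predict(x_far))
```

If this is the cause, the options are to obtain training data that covers the range of the rows to predict, or to use a model that can extrapolate (bearing in mind that extrapolation is inherently unreliable). It may also be worth confirming that `to_predict` went through the same preprocessing and units as the training data.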