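Background: I am trying to predict one feature (CCI) n days ahead. To do this, I created a Prediction column from the CCI column by shifting it up by n rows. I then built the X and y sets separately and split them into train/test sets with a time-series split.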
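The shift pairs each day's CCI (the feature) with the CCI n rows later (the target), and the last n rows are dropped because their target is unknown. A minimal sketch on the first six rows of the data.csv snippet below, assuming a horizon of `n = 2` (the post does not state `n`):

```python
import numpy as np
import pandas as pd

# First six rows of the data.csv snippet from this post
df = pd.DataFrame(
    {"CCI": [142.1676, 162.1708, 156.5971, 61.9481, 44.5143, -25.6343]},
    index=pd.Index(["2009-01-02", "2009-01-05", "2009-01-06",
                    "2009-01-07", "2009-01-08", "2009-01-09"], name="Date"),
)

n = 2  # assumed forecast horizon, in rows (trading days)

# Target: the CCI value n rows later; the last n rows become NaN
df["Prediction"] = df["CCI"].shift(-n)

# Drop the last n rows, whose target is unknown after the shift
X = df.drop(columns=["Prediction"]).to_numpy()[:-n]
y = df["Prediction"].to_numpy()[:-n]

print(df)
```

Each `y[i]` is the CCI observed n rows after the row that `X[i]` comes from, so the model learns a mapping from today's CCI to the CCI n days ahead.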
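Since I have a single feature, and its value n days ahead can be either positive or negative, I applied sklearn's PowerTransformer to both the training and the test sets. Model setup: SVR(kernel='rbf', degree=3)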
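One way to keep separate transforms for X and y, and to get predictions back on the original CCI scale without calling `inverse_transform` by hand, is scikit-learn's `TransformedTargetRegressor`. A minimal sketch, assuming a horizon of `n = 1` and using the 14 CCI values from the data.csv snippet below as a toy series; it shows the plumbing only and is not a fix for the accuracy problem. Note that SVR's `degree` parameter only applies to `kernel='poly'`, so it is left out here.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.svm import SVR

# CCI values from the data.csv snippet in this post
cci = np.array([142.1676, 162.1708, 156.5971, 61.9481, 44.5143,
                -25.6343, -63.5495, -73.8075, -148.0438, -147.5939,
                -108.7564, -165.0671, -121.8914, -115.2156])
n = 1                          # assumed horizon
X = cci[:-n].reshape(-1, 1)    # today's CCI
y = cci[n:]                    # CCI n rows later

# X is transformed by its own PowerTransformer inside the pipeline;
# y gets a separate PowerTransformer, and predict() maps the SVR output
# back to the original CCI scale via that transformer's inverse.
model = TransformedTargetRegressor(
    regressor=make_pipeline(PowerTransformer(), SVR(kernel="rbf")),
    transformer=PowerTransformer(),
)

split = 10  # chronological split: first 10 rows train, the rest test
model.fit(X[:split], y[:split])
y_pred = model.predict(X[split:])
print(y_pred)
```

Because both transformers are fitted only inside `fit`, each training fold gets its own statistics and nothing from the test rows leaks into them.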
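Problem: When I print y_pred, almost all of the values are positive, even though the dataset clearly contains both negative and positive values of the feature. The regression metrics I compute afterwards (MAE, MSE, MAPE, RMSE) are also very high. Am I missing a core concept here? I'm frustrated.
What else I tried: fitting and predicting directly, without any transformation. The predictions came out the same, and the regression metric scores were similar.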
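That the results barely change without the transform is consistent with PowerTransformer being an invertible, monotonic rescaling: it changes the scale the model sees, not the information in the data. A quick check on the snippet's CCI values, assuming the defaults `method='yeo-johnson'` and `standardize=True`:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# CCI values from the data.csv snippet in this post, as a single column
cci = np.array([142.1676, 162.1708, 156.5971, 61.9481, 44.5143,
                -25.6343, -63.5495, -73.8075, -148.0438, -147.5939,
                -108.7564, -165.0671, -121.8914, -115.2156]).reshape(-1, 1)

pt = PowerTransformer(method="yeo-johnson", standardize=True)
z = pt.fit_transform(cci)        # transformed, then standardized
back = pt.inverse_transform(z)   # mapped back to the CCI scale

print(np.allclose(back, cci))    # the round trip recovers the data
print(z.mean(), z.std())         # centred at 0 with unit variance
```

Because the transformed values are centred on their own mean, the sign of a transformed value need not match the sign of the original CCI; only the inverse-transformed predictions should be compared against the raw data.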
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import PowerTransformer
from sklearn.svm import SVR

n = 5  # forecast horizon in days (hypothetical; the actual value is not given)
model = SVR(kernel='rbf', degree=3)  # degree is ignored unless kernel='poly'

df = pd.read_csv('data.csv', index_col='Date')
# Create the target column by shifting CCI up by n rows
df['Prediction'] = df['CCI'].shift(-n)
# X holds all feature columns except Prediction; drop the last n rows,
# whose target is NaN after the shift
X = df.drop(columns=['Prediction']).to_numpy()
X = X[:-n]
# y holds the shifted target, trimmed the same way
y = df['Prediction'].to_numpy()
y = y[:-n]

# Generate the train/test sets
tscv = TimeSeriesSplit(n_splits=5)
# Cross-validation checks robustness over different subsets of the data
for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # yeo-johnson (the default method) works with positive and negative values.
    # Fit separate transformers for X and y, on the training fold only.
    pt_X = PowerTransformer()
    pt_y = PowerTransformer()
    X_train = pt_X.fit_transform(X_train)
    X_test = pt_X.transform(X_test)
    y_train = pt_y.fit_transform(y_train.reshape(-1, 1)).ravel()  # SVR expects 1-D y
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Inverse-transform predictions back to the CCI scale to compute errors;
    # y_test was never transformed, so it only needs the matching (n, 1) shape
    y_pred = pt_y.inverse_transform(y_pred.reshape(-1, 1))
    y_test = y_test.reshape(-1, 1)
    print(y_pred)
    # regression metrics calculations...
```
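The metric step is elided above. A hedged sketch of one way to score each fold separately, and to put the numbers in context with a naive persistence forecast (predict that CCI in n days equals today's CCI) on the same folds; it uses the 14 snippet values, an assumed `n = 1`, and `n_splits=3` because the toy series is short. If a model cannot beat this baseline, high MAE/RMSE reflects how hard the target is rather than a bug.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# CCI values from the data.csv snippet in this post
cci = np.array([142.1676, 162.1708, 156.5971, 61.9481, 44.5143,
                -25.6343, -63.5495, -73.8075, -148.0438, -147.5939,
                -108.7564, -165.0671, -121.8914, -115.2156])
n = 1                          # assumed horizon
X = cci[:-n].reshape(-1, 1)    # today's CCI
y = cci[n:]                    # CCI n rows later

fold_scores = []               # one (MAE, RMSE) pair per fold
tscv = TimeSeriesSplit(n_splits=3)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Fit the real model on X[train_idx], y[train_idx] here and score it
    # on the same test_idx, so it can be compared with the baseline below.
    y_test = y[test_idx]
    y_naive = X[test_idx, 0]   # persistence: tomorrow's CCI = today's CCI
    mae = mean_absolute_error(y_test, y_naive)
    rmse = np.sqrt(mean_squared_error(y_test, y_naive))
    fold_scores.append((mae, rmse))
    print(f"fold {fold}: naive MAE={mae:.2f}, RMSE={rmse:.2f}")

mean_mae, mean_rmse = np.mean(fold_scores, axis=0)
print(f"mean over folds: MAE={mean_mae:.2f}, RMSE={mean_rmse:.2f}")
```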
Snippet of data.csv:
```
Date        CCI
2009-01-02   142.1676
2009-01-05   162.1708
2009-01-06   156.5971
2009-01-07    61.9481
2009-01-08    44.5143
2009-01-09   -25.6343
2009-01-12   -63.5495
2009-01-13   -73.8075
2009-01-14  -148.0438
2009-01-15  -147.5939
2009-01-16  -108.7564
2009-01-20  -165.0671
2009-01-21  -121.8914
2009-01-22  -115.2156
```
Snippet of y_pred:
```
[[ 75.21390506]
 [ 83.11978989]
 [ 69.72460391]
 [ 65.99153699]
 [ 66.59187584]
 [ 65.25105997]
 [ 66.19096343]
 [ 67.52395369]
 [ 69.77453251]
 [ 76.7436674 ]
 [ 70.03372514]
 [ 70.54682375]
 [ 78.09269801]
 [ 91.70247194]
 [ 72.10245259]
 ...]
```
Metrics:
```
MAE score for each iteration: 82.11919587923673
MSE score for each iteration: 10960.090501194143
MAPE score for each iteration: 232.84026394293588
RMSE score for each iteration: 104.69045085963735
```
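MAPE divides each error by the true value, so on a series like CCI that crosses zero (e.g. 44.5143 followed by -25.6343 in the snippet), targets close to zero can push MAPE far above what MAE suggests. A minimal illustration with made-up numbers, where both points have the same absolute error of 10; the helper names `mape` and `mae` are hypothetical, written out by hand for clarity:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Same absolute error (10) on both points; only the size of the target differs
y_true = [100.0, 1.0]
y_pred = [90.0, 11.0]
print(mae(y_true, y_pred))   # 10.0
print(mape(y_true, y_pred))  # ~505: the near-zero target contributes 1000%
```

For a zero-crossing target, MAE and RMSE (which here satisfy RMSE = sqrt(MSE), about 104.69) are usually easier to interpret than MAPE.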