SVR predictions nearly identical (all positive values) - sklearn

Time: 2019-12-10 02:04:12

Tags: python machine-learning scikit-learn

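Background: I am trying to predict a single feature n days ahead. To do this, I created a Prediction column from the CCI column by shifting it. I then build the X and y sets separately and split them into time-series train/test sets.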

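Since I have one feature, and the value to be predicted n days ahead can be either positive or negative, I apply a single PowerTransformer from sklearn to both the training and test sets. Model setup: SVR(kernel='rbf', degree=3)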
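For reference, a minimal sketch of one way such a setup could be wired so that the feature and the target each get their own transformer. It uses `TransformedTargetRegressor` and `make_pipeline` from scikit-learn, which are not part of the code in this question. The synthetic series, the horizon `n = 5`, and the split point `200` are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.svm import SVR

# Synthetic stand-in for the CCI series (assumption: not the real data).
rng = np.random.default_rng(0)
cci = 100 * np.sin(np.linspace(0, 20, 300)) + rng.normal(0, 10, 300)

n = 5  # hypothetical forecast horizon
X = cci[:-n].reshape(-1, 1)  # value today
y = cci[n:]                  # value n days later

split = 200  # hypothetical chronological split point
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# The pipeline transforms X; the outer wrapper fits a separate
# transformer on y and inverts it automatically at predict time.
model = TransformedTargetRegressor(
    regressor=make_pipeline(PowerTransformer(), SVR(kernel="rbf")),
    transformer=PowerTransformer(),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)  # already back on the original scale
```

With this arrangement no manual `inverse_transform` call is needed, and the transformer fitted on the feature is never applied to the target.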

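Problem: After printing y_pred, I see that almost all values are positive. Yet the dataset clearly contains both negative and positive values for this feature. The regression metrics computed afterwards (MAE, MSE, MAPE, RMSE) are also very high. Am I missing a core concept here? I'm quite frustrated.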

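Other steps tried: I also fitted and predicted directly without any transformation. The predictions were the same and the regression metrics were similar.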
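One way to quantify the symptom described above is to compare the spread and sign of the predictions with those of the true values, and to compare the model's MAE with that of a constant predictor. The sketch below is a hypothetical helper (`collapse_report` is not a scikit-learn function), and the toy arrays are made up for illustration, not taken from data.csv.

```python
import numpy as np

def collapse_report(y_true, y_pred):
    """Summarise how much the predictions vary compared with the targets."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    # Constant predictor using the median of the same targets,
    # so this is an optimistic baseline.
    baseline = np.full_like(y_true, np.median(y_true))
    return {
        "frac_positive_true": float(np.mean(y_true > 0)),
        "frac_positive_pred": float(np.mean(y_pred > 0)),
        "std_ratio": float(np.std(y_pred) / np.std(y_true)),
        "mae_model": float(np.mean(np.abs(y_true - y_pred))),
        "mae_constant_baseline": float(np.mean(np.abs(y_true - baseline))),
    }

# Toy values for illustration only (assumption: not the real data).
y_true = np.array([120.0, 40.0, -30.0, -110.0, -60.0, 80.0])
y_pred = np.array([75.0, 70.0, 66.0, 65.0, 67.0, 72.0])
report = collapse_report(y_true, y_pred)
print(report)
```

A `std_ratio` far below 1 together with a model MAE no better than the constant baseline would suggest the regressor is outputting something close to a constant rather than tracking the series.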

  import numpy as np
  import pandas as pd
  from sklearn.model_selection import TimeSeriesSplit
  from sklearn.preprocessing import PowerTransformer
  from sklearn.svm import SVR

  # n = number of days ahead to predict (set beforehand)
  model = SVR(kernel='rbf', degree=3)

  df = pd.read_csv('data.csv', index_col='Date')

  # Create the target column by shifting CCI up by n rows
  df['Prediction'] = df['CCI'].shift(-n)

  # X holds all features except the Prediction column;
  # the last n rows are dropped because they have no target
  X = np.array(df.drop(columns=['Prediction']))
  X = X[:-n]

  # y holds the targets, with the trailing n NaN rows dropped
  y = np.array(df['Prediction'])
  y = y[:-n]

  # Generate the train/test sets
  tscv = TimeSeriesSplit(n_splits=5)

  # Cross validation is to check robustness over different sets of data
  for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # yeo-johnson: works with positive and negative values
    pt = PowerTransformer()
    X_train = pt.fit_transform(X_train)
    X_test = pt.transform(X_test)

    # The same transformer (fitted on X_train) is reused for y
    y_train = pt.transform(y_train.reshape(-1, 1))
    y_test = pt.transform(y_test.reshape(-1, 1))

    # SVR expects a 1-D target
    model.fit(X_train, y_train.ravel())

    y_pred = model.predict(X_test)

    # I inverse to regain original sets to compute errors later on
    y_pred = pt.inverse_transform(y_pred.reshape(-1, 1))
    y_test = pt.inverse_transform(y_test.reshape(-1, 1))

    print(y_pred)
    # regression metrics calculations...

Snippet of data.csv

Date        CCI        
2009-01-02  142.1676
2009-01-05  162.1708
2009-01-06  156.5971
2009-01-07   61.9481
2009-01-08   44.5143
2009-01-09  -25.6343
2009-01-12  -63.5495
2009-01-13  -73.8075
2009-01-14 -148.0438
2009-01-15 -147.5939
2009-01-16 -108.7564
2009-01-20 -165.0671
2009-01-21 -121.8914
2009-01-22 -115.2156

Snippet of y_pred

[[  75.21390506]
 [  83.11978989]
 [  69.72460391]
 [  65.99153699]
 [  66.59187584]
 [  65.25105997]
 [  66.19096343]
 [  67.52395369]
 [  69.77453251]
 [  76.7436674 ]
 [  70.03372514]
 [  70.54682375]
 [  78.09269801]
 [  91.70247194]
 [  72.10245259]
 ...]

Metrics

MAE score for each iteration:  82.11919587923673
MSE score for each iteration:  10960.090501194143
MAPE score for each iteration:  232.84026394293588
RMSE score for each iteration:  104.69045085963735
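The metric code is elided in the question, so the following is only a sketch of how such scores might be computed, assuming MAPE is expressed in percent. The helper name `regression_metrics` and the toy arrays are hypothetical. Note that the reported RMSE is consistent with being the square root of the reported MSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and MAPE (in percent) for 1-D or column arrays."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    # MAPE divides by the true value, so it is undefined when a true
    # value is exactly zero and inflated when true values are near zero.
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    return {"MAE": float(mae), "MSE": float(mse),
            "RMSE": float(rmse), "MAPE": float(mape)}

# Toy values for illustration only (assumption: not the real data).
y_true = np.array([100.0, -50.0, 2.0])
y_pred = np.array([90.0, -40.0, 12.0])
scores = regression_metrics(y_true, y_pred)
print(scores)
```

In the toy example every prediction is off by exactly 10, yet the single true value near zero dominates MAPE. For a series such as CCI that crosses zero, a large MAPE may therefore say little about model quality on its own.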

0 answers:

No answers yet.