Problem with multi-step time series forecasting (vector output model) while following a Kaggle article

Posted: 2020-10-18 14:07:46

Tags: python deep-learning time-series lstm forecasting

I'm following this excellent article to learn sales data analysis and forecasting.

https://www.kaggle.com/milanzdravkovic/pharma-sales-data-analysis-and-forecasting/notebook/#4.3.4.1.-Long-term-forecasting-with-Vanilla-LSTM-configuration

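I'm at section 4.3.4.1, Long-term forecasting with Vanilla LSTM configuration. (Please follow the link above and scroll down to 4.3.4.1. Sorry, I tried to create an anchor link to that exact section but it didn't work.)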

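The forecast there is only one step into the future, whereas I'm trying to forecast 4 steps ahead, i.e. predict the sales volume for the next 4 weeks. I edited the original code and added "n_steps_in, n_steps_out" parameters to the split_sequence function.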

from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler


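def split_sequence(sequence, n_steps_in, n_steps_out):
    # Each sample: n_steps_in inputs followed by the next n_steps_out targets.
    # If `sequence` has shape (n, 1), X comes back as (samples, n_steps_in, 1)
    # and y as (samples, n_steps_out, 1).
    X, y = list(), list()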
    for i in range(len(sequence)):

        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out

        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
 

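# hold out the last 50 weeks for testing (df must already be loaded)
size = int(len(df) - 50)
n_steps_in = 5   # weeks of history fed to the model
n_steps_out = 4  # weeks forecast at once (vector output)
n_features = 1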
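A quick way to see what the modified split_sequence returns is to run it on a toy series. The sketch below (numpy only; the 10-point series is made up for illustration) uses the same windowing logic and shows that when the input already has shape (n, 1), as it does after MinMaxScaler.fit_transform(X.reshape(-1, 1)), the targets come back 3-D instead of (samples, n_steps_out).

```python
import numpy as np


def split_sequence(sequence, n_steps_in, n_steps_out):
    """Same windowing logic as the function above: each sample pairs
    n_steps_in inputs with the n_steps_out values that follow them."""
    X, y = [], []
    for i in range(len(sequence)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequence):
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix:out_end_ix])
    return np.array(X), np.array(y)


# Toy series standing in for one scaled sales column (10 weeks)
series_1d = np.arange(10, dtype=float)
series_2d = series_1d.reshape(-1, 1)  # shape after MinMaxScaler.fit_transform

X1, y1 = split_sequence(series_1d, n_steps_in=5, n_steps_out=4)
X2, y2 = split_sequence(series_2d, n_steps_in=5, n_steps_out=4)

print(X1.shape, y1.shape)  # (2, 5) (2, 4)
print(X2.shape, y2.shape)  # (2, 5, 1) (2, 4, 1)
print(y1[0])               # [5. 6. 7. 8.]
```

The trailing dimension of size 1 on y is likely what the error later in the question is complaining about, since scikit-learn utilities generally expect 2-D arrays.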

and modified the original code as follows:

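import math
import warnings

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import mean_squared_error
# mean_absolute_percentage_error and resultsLongtermdf are presumably defined
# earlier in the Kaggle notebook; they are not shown here.

df=pd.read_csv('data/salesweekly.csv')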

subplotindex=0
numrows=4
numcols=2
fig, ax = plt.subplots(numrows, numcols, figsize=(18,15))
plt.subplots_adjust(wspace=0.1, hspace=0.3)

warnings.filterwarnings("ignore")

r=['M01AB','M01AE','N02BA','N02BE','N05B','N05C','R03','R06']
for x in r:
    rowindex=math.floor(subplotindex/numcols)

    colindex=subplotindex-(rowindex*numcols)

    X=df[x].values
    scaler = MinMaxScaler(feature_range = (0, 1))

    X=scaler.fit_transform(X.reshape(-1, 1))
    
    # split into samples
    X_train,y_train=split_sequence(X[0:size], n_steps_in, n_steps_out)
    X_test,y_test=split_sequence(X[size:len(df)], n_steps_in, n_steps_out)

    # reshape from [samples, timesteps] into [samples, timesteps, features]
    X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], n_features))

    # define model
    model = Sequential()
    model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(100, activation='relu'))
    model.add(Dense(n_steps_out))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X_train, y_train, epochs=400, verbose=0)
    X_test = X_test.reshape((len(X_test), n_steps_in, n_features))
    predictions = model.predict(X_test, verbose=0)
    y_test=scaler.inverse_transform(y_test)

    predictions = scaler.inverse_transform(predictions)
    error = mean_squared_error(y_test, predictions)
    perror = mean_absolute_percentage_error(y_test, predictions)
    resultsLongtermdf.loc['Vanilla LSTM MSE',x]=error
    resultsLongtermdf.loc['Vanilla LSTM MAPE',x]=perror
    ax[rowindex,colindex].set_title(x+' (MSE=' + str(round(error,2))+', MAPE='+ str(round(perror,2)) +'%)')
    ax[rowindex,colindex].plot(y_test)
    ax[rowindex,colindex].plot(predictions, color='red')
    # call legend after plotting so there are lines for it to label
    ax[rowindex,colindex].legend(['Real', 'Predicted'], loc='upper left')
    subplotindex=subplotindex+1
plt.show()

I added n_steps_in and n_steps_out to the split and fitted the model with them.

But I get an error:

ValueError: Found array with dim 3. Estimator expected <= 2.
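This message appears to come from scikit-learn rather than Keras: y_test produced by split_sequence has shape (samples, n_steps_out, 1), and MinMaxScaler.inverse_transform only accepts 2-D input. Squeezing it to (samples, n_steps_out) would not be enough on its own either, because the scaler was fit on a single column and would then reject 4 columns (the same applies to predictions, which Dense(n_steps_out) returns as (samples, n_steps_out)). A minimal sketch of one possible fix, assuming the scaler was fit on one column as in the question: flatten to one column, inverse-transform, and reshape back. The helper name inverse_scale and the random data are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler


def inverse_scale(scaler, arr):
    """Hypothetical helper: undo a MinMaxScaler fit on ONE column for arrays
    shaped (samples, n_steps_out) or (samples, n_steps_out, 1)."""
    arr = np.asarray(arr)
    # flatten to a single column, invert, then restore (samples, n_steps_out)
    flat = scaler.inverse_transform(arr.reshape(-1, 1))
    return flat.reshape(arr.shape[0], -1)


# Hypothetical stand-in data: 60 weekly values for one drug column
rng = np.random.default_rng(0)
sales = rng.uniform(20, 80, size=60)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(sales.reshape(-1, 1))  # shape (60, 1)

n_steps_out = 4
# Shapes as in the question: y_test is 3-D, model output is 2-D
y_test = scaled[:8].reshape(2, n_steps_out, 1)      # (2, 4, 1)
predictions = scaled[8:16].reshape(2, n_steps_out)  # (2, 4), pretend output

y_true = inverse_scale(scaler, y_test)       # (2, 4)
y_pred = inverse_scale(scaler, predictions)  # (2, 4)

# One MSE per forecast horizon (week 1..4), plus the overall average
per_week_mse = mean_squared_error(y_true, y_pred, multioutput="raw_values")
overall_mse = mean_squared_error(y_true, y_pred)
print(y_true.shape, per_week_mse.shape)
```

In the question's loop this would mean replacing scaler.inverse_transform(y_test) and scaler.inverse_transform(predictions) with calls like inverse_scale(scaler, y_test). Reshaping y_train to (samples, n_steps_out) before model.fit would also keep the targets the same shape as the Dense(n_steps_out) output.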

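I'm stuck here. I'm looking for results similar to the original article, except that the forecast should cover the next 4 weeks instead of just one week.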
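To get one 4-week forecast beyond the end of the data (rather than scoring over test windows), one option is to feed the last n_steps_in scaled values into the trained vector-output model and inverse-transform its n_steps_out outputs. The sketch below shows that flow under stated assumptions: to stay runnable without Keras it swaps in scikit-learn's LinearRegression as a stand-in multi-output model (it fits all 4 targets at once), so it is not the article's LSTM, and the trending series is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

n_steps_in, n_steps_out = 5, 4

# Hypothetical weekly series: upward trend plus noise (stand-in for one column)
rng = np.random.default_rng(42)
sales = 50 + 0.5 * np.arange(100) + rng.normal(0, 2, size=100)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(sales.reshape(-1, 1)).ravel()  # 1-D, shape (100,)

# Same windowing as split_sequence: inputs (samples, 5), targets (samples, 4)
n_samples = len(scaled) - n_steps_in - n_steps_out + 1
X = np.array([scaled[i:i + n_steps_in] for i in range(n_samples)])
y = np.array([scaled[i + n_steps_in:i + n_steps_in + n_steps_out]
              for i in range(n_samples)])

# Stand-in vector-output model (NOT the article's LSTM): one fit, 4 outputs
model = LinearRegression().fit(X, y)

# Forecast the 4 weeks after the end of the data from the last 5 observations
last_window = scaled[-n_steps_in:].reshape(1, n_steps_in)
next_4_scaled = model.predict(last_window)  # shape (1, 4)
next_4 = scaler.inverse_transform(next_4_scaled.reshape(-1, 1)).ravel()
print(np.round(next_4, 2))
```

With the question's LSTM, the equivalent call would presumably be model.predict(last_window.reshape((1, n_steps_in, n_features))), which for a Dense(n_steps_out) output layer should return shape (1, n_steps_out); the inverse transform then works the same way.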

Can anyone help? Thanks a lot.

0 answers

No answers yet.