I am learning to use neural networks applied to time series, so I adapted an LSTM example I found so that it predicts daily temperature data. However, the results I get are very poor, as shown in the figure. (To save time, I only forecast the most recent 92 days.)
Here is the code I implemented. The data is a three-column dataframe (minimum, maximum and mean daily temperatures), but I only use one of the columns at a time.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tools.eval_measures import rmse
from sklearn.preprocessing import MinMaxScaler
from keras.preprocessing.sequence import TimeseriesGenerator
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import warnings
warnings.filterwarnings("ignore")
input_file2 = "TemperaturasCampillos.txt"
seriesT = pd.read_csv(input_file2,sep = "\t", decimal = ".", names = ["Minimas","Maximas","Medias"])
seriesT[seriesT==-999]=np.nan
date1 = '2010-01-01'
date2 = '2010-09-01'
date3 = '2020-05-17'
date4 = '2020-12-31'
mydates = pd.date_range(date2, date3).tolist()
seriesT['Fecha'] = mydates
seriesT.set_index('Fecha',inplace=True) # use dates as the index so they appear on the x-axis by default
seriesT.index = seriesT.index.to_pydatetime()
df = seriesT.drop(seriesT.columns[[1, 2]], axis=1) # df.columns is zero-based pd.Index
n_input = 92
train, test = df[:-n_input], df[-n_input:]
scaler = MinMaxScaler()
scaler.fit(train)
train = scaler.transform(train)
test = scaler.transform(test)
#n_input = 365
n_features = 1
generator = TimeseriesGenerator(train, train, length=n_input, batch_size=1)
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features)))
model.add(Dropout(0.15))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit_generator(generator,epochs=150)
#create an empty list to hold the 92 one-step-ahead predictions
#build the batch that the model will predict from
#save each prediction to the list
#append the prediction to the end of the batch, to be used as input for the next prediction
pred_list = []
batch = train[-n_input:].reshape((1, n_input, n_features))
for i in range(n_input):
    pred_list.append(model.predict(batch)[0])
    batch = np.append(batch[:,1:,:],[[pred_list[i]]],axis=1)
df_predict = pd.DataFrame(scaler.inverse_transform(pred_list),
                          index=df[-n_input:].index, columns=['Prediction'])
df_test = pd.concat([df,df_predict], axis=1)
plt.figure(figsize=(20, 5))
plt.plot(df_test.index, df_test['Minimas'])
plt.plot(df_test.index, df_test['Prediction'], color='r')
plt.legend(loc='best', fontsize='xx-large')
plt.xticks(fontsize=18)
plt.yticks(fontsize=16)
plt.show()
As you can see if you click on the image link, the prediction I get is far too smooth: it is nice that it picks up the seasonality, but that is not what I was expecting. In addition, I tried adding more layers to the network shown above, so that it looks like this:
#n_input = 365
n_features = 1
generator = TimeseriesGenerator(train, train, length=n_input, batch_size=1)
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features)))
model.add(LSTM(128, activation='relu'))
model.add(LSTM(256, activation='relu'))
model.add(LSTM(128, activation='relu'))
model.add(LSTM(64, activation='relu'))
model.add(LSTM(n_features, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit_generator(generator,epochs=100)
But I get this error:
ValueError: Input 0 is incompatible with layer lstm_86: expected ndim=3, found ndim=2
Of course, since the model performs poorly I cannot trust its out-of-sample forecasts either. Why can't I add more layers to the network, and how can I improve its performance?
Answer 0 (score: 0)
You are missing one parameter: return_sequences.
It should be set to True when you stack more than one LSTM layer. Otherwise a layer outputs only its last hidden state (a 2D tensor), while the next LSTM expects a 3D sequence as input. Add it to every LSTM layer that feeds into another LSTM layer.
model.add(LSTM(128, activation='relu', return_sequences=True))
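For example, the stacked model from the question could be written as follows. This is a minimal sketch that keeps the layer sizes from the question; only the return_sequences arguments are new, and the last LSTM keeps the default return_sequences=False so that a single vector is passed to the Dense layer:
model = Sequential()
# every LSTM that feeds another LSTM must return the full sequence (3D output)
model.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(n_input, n_features)))
model.add(LSTM(128, activation='relu', return_sequences=True))
model.add(LSTM(256, activation='relu', return_sequences=True))
model.add(LSTM(128, activation='relu', return_sequences=True))
model.add(LSTM(64, activation='relu', return_sequences=True))
# last LSTM: return only the final hidden state (2D) before Dropout and Dense
model.add(LSTM(n_features, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
With this change the model compiles and trains; whether the extra layers actually help is a different matter, as noted below.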
As for the poor performance: my guess is that with the small amount of data you have for this kind of application (and the data looks quite noisy), adding layers will not help much.