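I am trying to get more familiar with time-series forecasting in Keras using an LSTM. I am trying to predict the closing price of an exchange-traded fund (SPY) from the previous 30 days of price data.
Here are the first five rows of my raw dataset, which is called spy: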
date open high low close
2008-06-25 131.72 133.40 131.24 131.81
2008-06-26 130.57 131.42 128.08 128.23
2008-06-27 128.28 128.86 127.04 127.53
2008-06-30 127.89 128.91 127.30 127.98
2008-07-01 126.52 128.47 125.93 128.38
Now I scale the data with the sklearn StandardScaler and put it back into a pandas DataFrame:
import pandas as pd
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(spy)
spy = pd.DataFrame(scaler.transform(spy))
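As a side note on this step: wrapping the transformed array in a bare pd.DataFrame drops the column names and the index, and fitting on every row lets the scaler see the period that is later held out for testing. Below is a minimal sketch of one alternative, not a definitive fix. It assumes that date is the index, uses the five sample rows above as stand-in data, and applies a 90% cut to rows (which only approximates the later split over windows). The names n_train, spy_scaled and close_std are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Stand-in data: the five sample rows shown above, with `date` as the index (assumed).
spy = pd.DataFrame(
    {
        "open": [131.72, 130.57, 128.28, 127.89, 126.52],
        "high": [133.40, 131.42, 128.86, 128.91, 128.47],
        "low": [131.24, 128.08, 127.04, 127.30, 125.93],
        "close": [131.81, 128.23, 127.53, 127.98, 128.38],
    },
    index=pd.to_datetime(
        ["2008-06-25", "2008-06-26", "2008-06-27", "2008-06-30", "2008-07-01"]
    ),
)

# Fit on the first ~90% of rows only (assumed split), then transform all rows.
n_train = int(round(0.9 * len(spy)))
scaler = StandardScaler()
scaler.fit(spy.iloc[:n_train])

# Keep the original index and column names on the scaled frame.
spy_scaled = pd.DataFrame(
    scaler.transform(spy), index=spy.index, columns=spy.columns
)

# Standard deviation of `close` in price units, useful for undoing the scaling later.
close_std = scaler.scale_[list(spy.columns).index("close")]
print(spy_scaled.shape, list(spy_scaled.columns), close_std)
```

Keeping close_std around makes it possible to report errors in dollars rather than in standard deviations.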
Next, I use the following function to create the X_train, y_train, X_test, and y_test datasets:
def load_data(stock, seq_len):
    amount_of_features = len(stock.columns)
    data = stock.values  # as_matrix() was removed in pandas 1.0
    sequence_length = seq_len + 1  # seq_len input days plus one target day
    result = []
    for index in range(len(data) - sequence_length):
        result.append(data[index: index + sequence_length])
    result = np.array(result)
    row = round(0.9 * result.shape[0])  # 90/10 chronological split
    train = result[:int(row), :]
    x_train = train[:, :-1]
    y_train = train[:, -1][:, -1]  # last column of the final day in each window
    x_test = result[int(row):, :-1]
    y_test = result[int(row):, -1][:, -1]
    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], amount_of_features))
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], amount_of_features))
    return [x_train, y_train, x_test, y_test]
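To make the shapes concrete, here is a small sketch of the same sliding-window slicing on toy data. It is an illustration only: the 10x4 array and the window length of 3 are made up, and the last column stands in for close.

```python
import numpy as np

# Toy data: 10 "days" with 4 features each; the last column plays the role of `close`.
data = np.arange(40, dtype=float).reshape(10, 4)
seq_len = 3
sequence_length = seq_len + 1  # 3 input days plus 1 target day

# Same sliding-window construction as in load_data.
windows = np.array(
    [data[i: i + sequence_length] for i in range(len(data) - sequence_length)]
)

x = windows[:, :-1]     # first seq_len days of every window, all features
y = windows[:, -1, -1]  # last column of the final day in every window

print(windows.shape)  # (6, 4, 4)
print(x.shape)        # (6, 3, 4)
print(y.shape)        # (6,)
print(y[0])           # 15.0, the last column of day index 3
```

Note that range(len(data) - sequence_length) stops one window short of the end: with 10 rows and windows of 4 rows, a seventh window starting at index 6 would also fit.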
Next, I create the datasets:
window = 30
X_train, y_train, X_test, y_test = load_data(spy, window)  # load_data returns four arrays
The model is defined as follows:
def build_model(layers):
    d = 0.2
    model = Sequential()
    model.add(LSTM(128, input_shape=(layers[1], layers[0]), return_sequences=True))
    model.add(Dropout(d))
    model.add(LSTM(64, input_shape=(layers[1], layers[0]), return_sequences=False))
    model.add(Dropout(d))
    model.add(Dense(16,init='uniform',activation='relu'))
    model.add(Dense(1,init='uniform',activation='linear'))
    model.compile(loss='mse',optimizer='adam',metrics=['accuracy'])
    return model
The model is then created:
model = build_model([4,window])
Then the model is fit to the training data:
model.fit(
    X_train,
    y_train,
    batch_size=512,
    nb_epoch=200,
    validation_split=0.1,
    verbose=1)
This is where I run into a problem. When I check the performance after training the model, I get the following results:
trainScore = model.evaluate(X_train, y_train, verbose=0)
testScore = model.evaluate(X_test, y_test, verbose=0)
Train Score: 0.00 MSE (0.07 RMSE)
Test Score: 0.29 MSE (0.54 RMSE)
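For reference, each RMSE above is the square root of the corresponding MSE. Because the targets were standardized, both are in units of standard deviations of the close column, not dollars. The sketch below shows the conversion under stated assumptions: describe_score is a hypothetical helper, and close_std is a placeholder for the scaler's standard deviation of close. The value 40.0 is invented purely for illustration and does not come from the data.

```python
import math

def describe_score(label, mse, close_std):
    """Format an MSE on standardized targets, plus RMSE in scaled and price units."""
    rmse_scaled = math.sqrt(mse)
    rmse_price = rmse_scaled * close_std  # undo the division by the std
    return "%s: %.2f MSE (%.2f RMSE, about %.2f in price units)" % (
        label, mse, rmse_scaled, rmse_price)

close_std = 40.0  # hypothetical placeholder, NOT taken from the data
print(describe_score("Test Score", 0.29, close_std))
```

Since the model is compiled with metrics=['accuracy'], model.evaluate returns a list of [loss, accuracy], so the MSE would be the first element of trainScore and testScore.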
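I would like to understand why the results are so poor. I am under no illusion that the model will be very accurate, but these results are bad enough that I suspect I have made a mistake. When I plot y_test against the predicted values, the predictions are almost a flat line. Any help would be greatly appreciated!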
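One way to put the test MSE in context is to compare it with a naive persistence baseline that predicts each target as the last close value in its input window. This is a minimal sketch, assuming the last feature column is close. The helper persistence_mse is hypothetical, and the toy arrays only check the arithmetic.

```python
import numpy as np

def persistence_mse(x, y):
    """MSE of predicting each target with the last `close` value in its window."""
    last_close = x[:, -1, -1]  # final time step, final feature (assumed to be close)
    return float(np.mean((y - last_close) ** 2))

# Toy check: each target is exactly 1.0 above the last input value.
x_toy = np.arange(24, dtype=float).reshape(2, 3, 4)
y_toy = x_toy[:, -1, -1] + 1.0
print(persistence_mse(x_toy, y_toy))  # 1.0
```

Calling persistence_mse(X_test, y_test) on the real arrays would give a reference number on the same scaled units as the 0.29 above.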