I am following this Kaggle guide for time series forecasting (sample data is attached below).
The code is as follows:
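import math

import numpy as np
from keras.layers import LSTM, Dense
from keras.models import Sequential
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler


def create_dataset(dataset, window_size=1):
    # Build one-step-ahead pairs: each X row holds window_size consecutive
    # values and Y holds the value that follows that window.
    data_X, data_Y = [], []
    for i in range(len(dataset) - window_size - 1):
        a = dataset[i:(i + window_size), 0]
        data_X.append(a)
        data_Y.append(dataset[i + window_size, 0])
    return np.array(data_X), np.array(data_Y)


def fit_model(train_X, train_Y, window_size=1):
    # One LSTM layer with 4 units followed by a single-unit dense output.
    model = Sequential()
    model.add(LSTM(4, input_shape=(1, window_size)))
    model.add(Dense(1))
    model.compile(loss="mean_squared_error", optimizer="adam")
    model.fit(train_X, train_Y, epochs=100, batch_size=1, verbose=0)
    return model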
def predict_and_score(model, scaler, X, Y):
    # Make predictions on the original scale of the data. The scaler must be
    # the one already fitted on the series; a fresh MinMaxScaler cannot
    # inverse-transform anything.
    pred = scaler.inverse_transform(model.predict(X))
    # Prepare Y data to also be on the original scale for interpretability.
    orig_data = scaler.inverse_transform(Y.reshape(-1, 1))
    # Calculate RMSE.
    score = math.sqrt(mean_squared_error(orig_data[:, 0], pred[:, 0]))
    return score, pred

All of this is used in the following function:

def nnet(time_series, window_size=1):
    # cmi_split (the fraction of rows used for training) is assumed to be
    # defined at module level.
    cmi_total_raw = time_series.values.astype('float32').reshape(-1, 1)
    scaler = MinMaxScaler(feature_range=(0, 1))
    cmi_total_scaled = scaler.fit_transform(cmi_total_raw)
    split_idx = int(cmi_split * len(cmi_total_scaled))
    cmi_train_sc = cmi_total_scaled[:split_idx]
    cmi_test_sc = cmi_total_scaled[split_idx:]
    # Create test and training sets for one-step-ahead regression.
    train_X, train_Y = create_dataset(cmi_train_sc, window_size)
    test_X, test_Y = create_dataset(cmi_test_sc, window_size)
    # Reshape the input data into appropriate form for Keras.
    train_X = np.reshape(train_X, (train_X.shape[0], 1, train_X.shape[1]))
    test_X = np.reshape(test_X, (test_X.shape[0], 1, test_X.shape[1]))
    model = fit_model(train_X, train_Y, window_size)
    rmse_train, train_predict = predict_and_score(model, scaler, train_X, train_Y)
    rmse_test, test_predict = predict_and_score(model, scaler, test_X, test_Y)
    return rmse_test, test_predict
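As I understand it, this builds a model from the training data, predicts on the held-out test portion of the same series, and finally computes the error.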
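To make the windowing step concrete, here is a minimal sketch on a toy array. `make_windows` is a hypothetical helper written for illustration, not part of the guide; it pairs each window with the value that follows it. Note that `create_dataset` as posted stops one window earlier because of the extra `- 1` in its loop bound, whereas this sketch keeps the last usable window.

```python
import numpy as np


def make_windows(series, window_size=1):
    """Return (X, y): each X row holds `window_size` consecutive values,
    and y holds the value that immediately follows that window."""
    series = np.asarray(series, dtype="float32")
    X, y = [], []
    # The last valid start index is len(series) - window_size - 1,
    # so range() stops at len(series) - window_size.
    for i in range(len(series) - window_size):
        X.append(series[i:i + window_size])
        y.append(series[i + window_size])
    return np.array(X), np.array(y)


toy = [10, 20, 30, 40, 50]
X, y = make_windows(toy, window_size=2)
print(X.shape)  # one row per window, window_size columns
print(X)
print(y)
```

Each row of `X` is what the network sees, and the matching entry of `y` is what it is trained to output, so the model only ever learns to predict one step ahead.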
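The input data has 209 rows, and I want to predict the next one.
This is what I tried:
Since the forecast(steps=n_steps) method does exactly this in Auto-ARIMA, I looked for something similar in Keras.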
From the Keras documentation:
predict(x, batch_size=None, verbose=0, steps=None)
Arguments:
x: The input data, as a Numpy array (or list of Numpy arrays if the model has multiple inputs).
steps: Total number of steps (batches of samples) before declaring the prediction round finished. Ignored with the default value of None.
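I tried changing steps, but it predicts absurd values around 100,000. The length of test_predict is also nowhere near the steps I passed, so I assume steps means something else here.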
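Questions
- Can Keras be used to forecast time series data out of sample at all?
- If yes, does it have a forecast method analogous to the predict method above?
- If not, can the existing predict method be used in some way to get out-of-sample forecasts?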
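For reference, the kind of out-of-sample use of predict I have in mind would look roughly like the sketch below: feed the last window of the series to the model, append the prediction, slide the window, and repeat. This is only a sketch under assumptions. `recursive_forecast` and `toy_predict` are hypothetical names, and `predict_fn` stands in for `model.predict`, assumed to accept an array of shape (1, 1, window_size) and return an array of shape (1, 1), as the model defined above should.

```python
import numpy as np


def recursive_forecast(predict_fn, history, n_steps, window_size=1):
    """Forecast n_steps values past the end of `history`.

    predict_fn is a hypothetical stand-in for model.predict: it takes an
    array of shape (1, 1, window_size) and returns an array of shape (1, 1).
    `history` must be on the same scale the model was trained on.
    """
    window = [float(v) for v in np.asarray(history)[-window_size:]]
    forecasts = []
    for _ in range(n_steps):
        x = np.array(window, dtype="float32").reshape(1, 1, window_size)
        next_value = float(predict_fn(x)[0, 0])
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new forecast.
        window = window[1:] + [next_value]
    return np.array(forecasts)


def toy_predict(x):
    # Toy stand-in for a trained model: predicts "last value in window + 1".
    return x[:, 0, -1:] + 1.0


out = recursive_forecast(toy_predict, [1.0, 2.0, 3.0], n_steps=3, window_size=2)
print(out)
```

With a real model, the history would be the scaled series and the result would still need `scaler.inverse_transform(out.reshape(-1, 1))` to return to the original units. Because each forecast is fed back in as input, errors can compound as n_steps grows.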
Sample data (cmi_total):
2014-05-25 272.459887
2014-06-01 272.446022
2014-06-08 330.301260
2014-06-15 656.838394
2014-06-22 670.575110
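The code above scales the series to [0, 1] before training, so any prediction has to be mapped back using the same minimum and maximum that were learned from the data. A minimal pure-Python sketch of that round trip on the five sample rows follows; `scale` and `unscale` are hypothetical helpers that mirror the formula `MinMaxScaler(feature_range=(0, 1))` applies to a single column.

```python
# The five sample rows shown above.
values = [272.459887, 272.446022, 330.301260, 656.838394, 670.575110]

# These two numbers are what a fitted min-max scaler remembers.
lo, hi = min(values), max(values)


def scale(x):
    """Map x from the original units into [0, 1]."""
    return (x - lo) / (hi - lo)


def unscale(z):
    """Map a scaled value back to the original units."""
    return z * (hi - lo) + lo


scaled = [scale(v) for v in values]
restored = [unscale(z) for z in scaled]

for v, z in zip(values, scaled):
    print(f"{v:12.6f} -> {z:.6f}")
```

A scaler that has not been fitted has no `lo` and `hi` to work with, which is why the inverse transform has to go through the same scaler object that produced the training data.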