I have a problem: I want to predict one time series from multiple time series. My input has shape (batch_size, time_steps, features), while my output should have shape (1, time_steps, features). I don't know how to average over N.

Here is a dummy example. First, dummy data where the output is a linear function of 2000 time series:
import numpy as np

time = 100
N = 2000

# Each series is a randomly scaled and shifted sine wave; the target y
# is a random linear combination of all N series at each time step.
dat = np.zeros((N, time))
for i in range(N):
    dat[i, :] = np.sin(list(range(time))) * np.random.normal(size=1) + np.random.normal(size=1)
y = dat.T @ np.random.normal(size=N)
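As a quick shape check (added here for clarity, not part of the original snippet): dat holds the N input series, and y is the single target series the network should produce.

print(dat.shape)  # (2000, 100): N series with `time` steps each
print(y.shape)    # (100,): one target value per time step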
Now I define a time series model (a dilated 1D convolutional network):
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda
from keras.optimizers import Adam
from keras import backend as K

n_filters = 2
filter_width = 3
dilation_rates = [2**i for i in range(5)]

inp = Input(shape=(None, 1))
x = inp
for dilation_rate in dilation_rates:
    x = Conv1D(filters=n_filters,
               kernel_size=filter_width,
               padding='causal',
               activation='relu',
               dilation_rate=dilation_rate)(x)
x = Dense(1)(x)

model = Model(inputs=inp, outputs=x)
model.compile(optimizer=Adam(), loss='mean_squared_error')

model.predict(dat.reshape(N, time, 1)).shape
Out[43]: (2000, 100, 1)
The output has the wrong shape! Next, I tried an averaging layer, but then I got this strange error:
def av_over_batches(x):
    return K.mean(x, axis=0)

x = Lambda(av_over_batches)(x)
model = Model(inputs=inp, outputs=x)
model.compile(optimizer=Adam(), loss='mean_squared_error')
model.predict(dat.reshape(N, time, 1)).shape
Traceback (most recent call last):
File "<ipython-input-3-d43ccd8afa69>", line 4, in <module>
model.predict(dat.reshape(N, time, 1)).shape
File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1169, in predict
steps=steps)
File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 302, in predict_loop
outs[i][batch_start:batch_end] = batch_out
ValueError: could not broadcast input array from shape (100,1) into shape (32,1)
Where does the 32 come from? (By the way, I get the same number with my real data, not only in the MWE.)

But the main question is: how do I build a network that averages over the input batch dimension?
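A minimal sketch of one possible fix, assuming the goal is a single averaged prediction over the whole set (this is not from the original question): the 32 is the default batch_size of model.predict, which splits the input into chunks of 32 and then cannot write the (100, 1) output of the batch-averaged graph back into a (32, 1) slot. Keeping the batch axis with keepdims=True and pushing all N series through in a single forward pass avoids both problems.

# assumes `inp` and the Dense(1) output `x` of the conv stack above, before the failing Lambda
avg = Lambda(lambda t: K.mean(t, axis=0, keepdims=True))(x)  # (batch, time, 1) -> (1, time, 1)
model = Model(inputs=inp, outputs=avg)
model.compile(optimizer=Adam(), loss='mean_squared_error')

# predict_on_batch runs one forward pass over all N series, so the mean
# really covers the full set instead of chunks of the default size 32
model.predict_on_batch(dat.reshape(N, time, 1)).shape  # (1, 100, 1)

Note that fit would still reject inputs and targets with different sample counts, so training such a model takes extra work; the answer below sidesteps the issue by moving the series into the feature dimension instead.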
Answer 0 (score: 0):
I would approach the problem in a different way.

Problem: you want to predict one time series from a set of time series. Say you have 3 time series TS1, TS2, TS3, each with 100 time steps, and you want to predict a 100-step target series y1, y2, y3, ….

My approach is as follows: group the values of all time series at each time step and feed those groups into an LSTM (the data format is illustrated at the end of this answer). If some time series have fewer time steps, you can pad them; likewise, if some sets contain fewer time series, pad those as well.

import numpy as np
np.random.seed(33)

time = 100
N = 5000
k = 5
magic = np.random.normal(size=k)  # the true mixing weights the model should learn

x = list()
y = list()
for i in range(N):
    # one sample: k sine-based series of `time` steps, plus their magic-weighted mix
    dat = np.zeros((k, time))
    for j in range(k):
        dat[j, :] = np.sin(list(range(time))) * np.random.normal(size=1) + np.random.normal(size=1)
    x.append(dat)
    y.append(dat.T @ magic)
So we want to predict a 100-step time series from a set of 5 time series, and we want the model to learn magic.
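A quick look at what one sample holds (added for illustration, not part of the original answer):

print(x[0].shape)  # (5, 100): k series with `time` steps each
print(y[0].shape)  # (100,): the target series
print(np.allclose(x[0].T @ magic, y[0]))  # True by construction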
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda, LSTM
from keras.optimizers import Adam
from keras import backend as K
import matplotlib.pyplot as plt

# one sample = the k series stacked as features: (time, k) in -> (time, 1) out
input = Input(shape=(time, k))
lstm = LSTM(32, return_sequences=True)(input)
output = Dense(1, activation='sigmoid')(lstm)

model = Model(inputs=input, outputs=output)
model.compile(optimizer=Adam(), loss='mean_squared_error')
data_x = np.zeros((N, 100, 5))
data_y = np.zeros((N, 100, 1))
for i in range(N):
    data_x[i] = x[i].T.reshape(100, 5)  # (k, time) -> (time, k)
    data_y[i] = y[i].reshape(100, 1)
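The same arrays can also be built without the Python loop (an equivalent alternative, added for illustration):

data_x = np.stack(x).transpose(0, 2, 1)  # (N, k, time) -> (N, time, k)
data_y = np.stack(y)[..., None]          # (N, time) -> (N, time, 1)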
from sklearn.preprocessing import StandardScaler

ss_x = StandardScaler()
ss_y = StandardScaler()
# flatten each sample to one row, standardise every position across the N samples,
# then restore the 3D shape
data_x = ss_x.fit_transform(data_x.reshape(N, -1)).reshape(N, 100, 5)
data_y = ss_y.fit_transform(data_y.reshape(N, -1)).reshape(N, 100, 1)

# Leave the last sample for testing; split the rest into train and validation
model.fit(data_x[:-1], data_y[:-1], batch_size=64, epochs=100, validation_split=.25)
The validation loss was still going down, but I stopped it there. Let's see how good our prediction is:
y_hat = model.predict(data_x[-1].reshape(-1,100,5))
plt.plot(data_y[-1], label='y')
plt.plot(y_hat.reshape(100), label='y_hat')
plt.legend(loc='upper left')
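Both curves are in standardised units; to plot them on the original scale, one could invert the scaler first (added for illustration; ss_y was fit on the flattened (N, 100) targets):

y_orig = ss_y.inverse_transform(data_y[-1].reshape(1, -1)).reshape(100)
y_hat_orig = ss_y.inverse_transform(y_hat.reshape(1, -1)).reshape(100)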
The result is encouraging. Training for more epochs, together with some hyperparameter tuning, should bring us even closer to magic. One could also try stacked LSTMs and bidirectional LSTMs, as sketched below.
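A minimal sketch of those two variants (my construction, not code from the answer; time and k as defined above):

from keras.layers import Bidirectional

inp2 = Input(shape=(time, k))
h = Bidirectional(LSTM(32, return_sequences=True))(inp2)  # read the sequence in both directions
h = LSTM(32, return_sequences=True)(h)                    # second, stacked LSTM layer
out2 = Dense(1, activation='sigmoid')(h)
model2 = Model(inputs=inp2, outputs=out2)
model2.compile(optimizer=Adam(), loss='mean_squared_error')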
I feel RNNs are better suited to time series data than CNNs.
Data format:

Suppose time steps = 3:

Time series 1 = [1, 2, 3]
Time series 2 = [4, 5, 6]
Time series 3 = [7, 8, 9]
Time series 4 = [10, 11, 12]
Y = [100, 200, 300]

With a batch size of 1:

[[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]] -> LSTM -> [100, 200, 300]
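In array form, that single batch looks like this (toy numbers from the illustration above, added for clarity):

import numpy as np

x = np.array([[[1, 4, 7, 10],
               [2, 5, 8, 11],
               [3, 6, 9, 12]]])        # (batch, time, series) = (1, 3, 4)
y = np.array([[[100], [200], [300]]])  # (1, 3, 1): one target value per time step
print(x.shape, y.shape)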