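I have a Keras LSTM for time-series data that I built using a one-off dataset. I put it aside to work on other parts of the system, but now I'm moving it into production and need to convert the approach so it works with a variable number of features. Users will start with a small number of features and add more gradually.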
```
[continous_value: 0.1, categoryA: 1]
[continous_value: 0.2, categoryA: 0, categoryB: 1]
[continous_value: 0.3, categoryA: 1, categoryB: 0]
...
```
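Most features will "drop out" because they don't recur, so they are easy to prune by moving the window forward over time, but some recur regularly. My LSTM is currently built around a single user's data within a window.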
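A minimal sketch of that pruning in plain Python, assuming each sample arrives as a dict like the rows above. `active_columns` and `to_matrix` are hypothetical helpers, not part of my code: the first keeps only the features that are non-zero at least once in the trailing window (zero is treated as "absent", which suits one-hot columns, so continuous features that can legitimately be 0 are passed through `keep`), and the second projects rows onto that column order.

```python
from typing import Dict, Iterable, List

Row = Dict[str, float]

def active_columns(rows: List[Row], window: int, keep: Iterable[str] = ()) -> List[str]:
    """Sorted feature names that are non-zero at least once in the last
    `window` rows (window >= 1), plus any names in `keep`.
    Everything else has "dropped out" of the window."""
    recent = rows[-window:]
    active = {name for row in recent for name, value in row.items() if value != 0}
    active.update(keep)
    return sorted(active)

def to_matrix(rows: List[Row], columns: List[str]) -> List[List[float]]:
    """Project each row onto a fixed column order; missing keys become 0.0."""
    return [[float(row.get(col, 0.0)) for col in columns] for row in rows]

rows = [
    {"continous_value": 0.1, "categoryA": 1},
    {"continous_value": 0.2, "categoryA": 0, "categoryB": 1},
    {"continous_value": 0.3, "categoryA": 1, "categoryB": 0},
]
cols = active_columns(rows, window=1)   # categoryB is 0 in the last row
print(cols)                             # ['categoryA', 'continous_value']
print(to_matrix(rows[-1:], cols))       # [[1.0, 0.3]]
```

Sorting the names keeps the column order stable between calls as long as the same features stay active.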
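Each row is a 15-minute sample, and my sample data happens to have 2 continuous features and 7 categorical features. I have 14-day seasonality (4 * 24 * 14 = 1344 timesteps), so I have been resampling to x: (1344, 14, 9) and y: (1344, 9).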
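A sketch of that resampling step in plain Python, so it runs without NumPy. `make_windows` is a hypothetical helper that mirrors the `X_train`/`y_train` loop in my code, assuming the series is a list of rows with a fixed feature count. It also makes explicit that 1344 is the number of windows, while each window covers only n = 14 rows (3.5 hours at 15 minutes per row).

```python
from typing import List, Sequence, Tuple

STEPS_PER_DAY = 4 * 24        # one row every 15 minutes
SEASON = STEPS_PER_DAY * 14   # 14-day seasonality -> 1344 rows

def make_windows(series: Sequence[Sequence[float]], n: int) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Turn a (time, features) series into supervised pairs:
    X[k] holds rows k .. k+n-1 and y[k] is row k+n, the step to predict."""
    X: List[List[List[float]]] = []
    y: List[List[float]] = []
    for end in range(n, len(series)):
        X.append([list(row) for row in series[end - n:end]])
        y.append(list(series[end]))
    return X, y

# 9 features per row (2 continuous + 7 categorical); n + SEASON rows
# yield SEASON windows, matching x: (1344, 14, 9) and y: (1344, 9).
n = 14
series = [[t * 0.01] + [0.0] * 8 for t in range(n + SEASON)]
X, y = make_windows(series, n)
print(len(X), len(X[0]), len(X[0][0]))  # 1344 14 9
print(len(y), len(y[0]))                # 1344 9
```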
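Now, to let the model work for different users, I started adding "padding columns", but this isn't ideal: I have to guess what the maximum number of features will be, and the larger that number, the less predictive the model becomes.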
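A minimal sketch of the padding-column approach, assuming a feature-name-to-slot mapping that grows as new features appear (`pad_row` and `slots` are hypothetical names). It shows the trade-off described above: `max_features` has to be guessed up front, and every unused slot is still fed to the model as a constant 0 input.

```python
from typing import Dict, List, Tuple

def pad_row(row: Dict[str, float], slots: Dict[str, int], max_features: int) -> Tuple[List[float], List[int]]:
    """Write each feature into a reserved slot of a fixed-width vector.
    New feature names claim the next free slot; unused slots stay 0.0
    and are flagged 0 in the returned mask."""
    values = [0.0] * max_features
    mask = [0] * max_features
    for name, value in row.items():
        if name not in slots:
            if len(slots) >= max_features:
                raise ValueError(f"feature {name!r} does not fit: max_features={max_features} was guessed too small")
            slots[name] = len(slots)
        values[slots[name]] = float(value)
        mask[slots[name]] = 1
    return values, mask

slots: Dict[str, int] = {}   # feature name -> column, shared across rows
print(pad_row({"continous_value": 0.1, "categoryA": 1}, slots, 4))
# ([0.1, 1.0, 0.0, 0.0], [1, 1, 0, 0])
print(pad_row({"continous_value": 0.2, "categoryA": 0, "categoryB": 1}, slots, 4))
# ([0.2, 0.0, 1.0, 0.0], [1, 1, 1, 0])
```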
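I believe a Keras LSTM can accept a variable length by setting timesteps = None (i.e. x: (b, None, 9)), but I can't see how to apply that to a variable number of features in multivariate time-series data.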
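Setting timesteps to None only frees the time axis: with `input_shape=(None, 9)` each batch may have a different number of timesteps, but the last axis stays 9 because it sizes the LSTM's input weight matrix. A hedged sketch of what that does allow, grouping sequences by length so each group forms one rectangular batch of shape (group size, timesteps, features); `bucket_by_length` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, List

Seq = List[List[float]]   # one sequence: timesteps x features

def bucket_by_length(sequences: List[Seq]) -> Dict[int, List[Seq]]:
    """Group sequences by timestep count so each group can become one batch.
    The feature width must be identical everywhere: only the timestep
    axis may vary between batches."""
    buckets: Dict[int, List[Seq]] = defaultdict(list)
    width = None
    for seq in sequences:
        for step in seq:
            if width is None:
                width = len(step)
            elif len(step) != width:
                raise ValueError(f"expected {width} features per timestep, got {len(step)}")
        buckets[len(seq)].append(seq)
    return dict(buckets)

seqs = [[[0.1] * 9] * 3, [[0.2] * 9] * 5, [[0.3] * 9] * 3]
batches = bucket_by_length(seqs)
print({steps: len(group) for steps, group in batches.items()})  # {3: 2, 5: 1}
```

The `ValueError` is the point: a row with a different feature count cannot join any batch, which is exactly the limitation a growing feature set runs into.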
How would I change this so the data is generated correctly?
```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint

# DAYS (rows per day, 4 * 24) and dataForTest() are defined elsewhere in my code

# the memory of RNN depends on the number of timesteps you select
# if timesteps = n then the output depends on the previous n inputs
n = 14 # well over the weekly periodicity of the data
# Create input set that consists of n dimensions
# hence, the output of the current step will be based on
# the values of the previous n steps
len_train = 49 * DAYS
len_test = 7 * DAYS #(4 * 24)
train, test, endog, exog = dataForTest(len_train=len_train, len_test=len_test, offset=3*DAYS)
#print(train.iloc[-1],'\n',test.iloc[-1])
print(endog, exog)
dim = len(endog) + len(exog)  # train has one column per endog/exog series
window = len_test # dim * 100 #using dim ensures it is reshapable dividing by dim
X_train = []
y_train = []
for i in range(n, n+(window)):
    X_train.append(train[i - n: i].values)  # the n rows before step i
    y_train.append(train[i:i+1].values)     # row i, the step to predict
X_train, y_train = np.array(X_train), np.array(y_train)
y_train = y_train.reshape((window, dim))  # (window, 1, dim) -> (window, dim)
print(X_train.shape, y_train.shape)
(batch_size, timesteps, dim) = X_train.shape
# Initialise Sequential model
regressor = Sequential()
# units is the output dimensionality
# return sequences will return the sequence
# which will be required to the next LSTM
# as a great big rule-o-thumb, layers should be less than 10, and perhaps 1 per endog plus 1 for all exog
# also see: https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw/1097#1097
alphaNh = dim if dim < 10 else 10 # 2-10, with 2 or 5 being common
# N_h = N_s / (alpha * (N_i + N_o)) with N_i = N_o = dim; keep at least 1 unit
nh = max(1, int(batch_size / (alphaNh * 2 * dim)))
dropout = 0.2
print('nh', nh)
# input shape will need only the last 2 dimensions
# of your input
################# 1st layer #######################
regressor.add(LSTM(units=nh, return_sequences=True, stateful=True, batch_size=batch_size,
                   input_shape=(timesteps, dim)))
# add Dropout to do regularization
# standard practice is to use 20%
# regressor.add(Dropout(dropout))
layers = (len(endog) + 1) if len(endog) > 1 else 2
print('layers', layers)
for i in range(1, layers):
    # After the first time, it's not required to
    # specify the input_shape
    ################# layer #######################
    # if i > 5:
    #     break
    if i < layers - 1:
        cell = LSTM(units=nh, return_sequences=True, stateful=True, batch_size=batch_size)
    else:
        # the last LSTM returns only its final output, for the Dense layer
        cell = LSTM(units=nh, stateful=True, batch_size=batch_size)
    regressor.add(cell)
################# Dropout layer #################
# After the LSTM layers we use some dropout.
# another option is to put this after each dim
# layer (above)
#
# standard practice is to use 20%
regressor.add(Dropout(dropout))
################# Last layer ####################
# Last layer would be the fully connected layer,
# or the Dense layer
#
# The last layer predicts the next row, one value per feature,
# hence units=dim
regressor.add(Dense(units=dim))
# Compiling the RNN
# The loss function for a classification problem is
# cross entropy; since this is a regression problem,
# the loss function will be mean squared error
regressor.compile(optimizer='adam', loss='mean_squared_error')
### src: https://keras.io/callbacks/
# saves the model after each epoch if the training loss
# (monitor='loss') decreased
###
checkpointer = ModelCheckpoint(filepath='weights.hdf5', verbose=1, monitor='loss', mode='min', save_best_only=True)
```
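The sizing heuristics in the code can be pulled out and checked on their own. This is a sketch using hypothetical helper names; it follows the rule from the linked stats.stackexchange answer, N_h = N_s / (alpha * (N_i + N_o)). In my code N_s is `batch_size`, which equals the number of windows because the whole training set is one stateful batch. Because `dim` appears in the denominator, every feature a user adds also changes the hidden size, not just the input and output shapes.

```python
def hidden_units(n_samples: int, n_features: int, alpha: int) -> int:
    """Rule of thumb N_h = N_s / (alpha * (N_i + N_o)).
    The model maps n_features inputs to n_features outputs, so
    N_i + N_o = 2 * n_features. Clamped to at least one unit."""
    return max(1, n_samples // (alpha * 2 * n_features))

def choose_alpha(n_features: int) -> int:
    """Same choice as alphaNh: the feature count, capped at 10."""
    return n_features if n_features < 10 else 10

def lstm_layer_count(n_endog: int) -> int:
    """Same heuristic as `layers`: one per endogenous series plus one, minimum 2."""
    return n_endog + 1 if n_endog > 1 else 2

# With the shapes from the question: 1344 windows, 9 features.
alpha = choose_alpha(9)                  # 9
print(hidden_units(1344, 9, alpha))      # 1344 // (9 * 18) = 8
print(lstm_layer_count(2))               # 3
```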