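I have trained an LSTM model (built with Keras and TF) on multiple batches of 7 samples with 3 features each, with a shape like the sample below (the numbers are just placeholders for the purpose of explanation). Each batch is labeled 0 or 1:
Data: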
[
[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
...
]
i.e.: a batch of m sequences, each of length 7, whose elements are 3-dimensional vectors (so the batch has shape (m, 7, 3))
Target:
[
[1]
[0]
[1]
...
]
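In my production environment the data is a stream of samples with 3 features ([1,2,3],[1,2,3]...). I would like to stream each sample to my model as it arrives and get the intermediate probability without waiting for the entire batch (7); see the animation below.
One of my thoughts was padding the batch with zeros for the missing samples,
[[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[1,2,3]]
but that seems inefficient.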
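For reference, the padding idea can be sketched as follows. This is only a minimal NumPy illustration; the helper name pad_partial_sequence and the choice of left-padding with zeros are assumptions made for the example, not part of the model code.

```python
import numpy as np

SEQ_LEN = 7       # timesteps the model was trained on
NUM_FEATURES = 3  # features per sample

def pad_partial_sequence(samples, seq_len=SEQ_LEN):
    """Left-pad the samples received so far with zeros up to seq_len timesteps.

    Returns an array of shape (1, seq_len, NUM_FEATURES), i.e. a batch
    holding a single, partially filled sequence.
    """
    samples = np.asarray(samples, dtype=np.float32).reshape(-1, NUM_FEATURES)
    if len(samples) > seq_len:
        raise ValueError("received more samples than the sequence length")
    padded = np.zeros((seq_len, NUM_FEATURES), dtype=np.float32)
    if len(samples) > 0:
        padded[seq_len - len(samples):] = samples  # newest samples go last
    return padded[np.newaxis, ...]

# only the first sample of a sequence has arrived so far
batch = pad_partial_sequence([[1, 2, 3]])
```

With this approach every new sample means running all 7 timesteps through the model again, which is where the inefficiency comes from.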
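I would appreciate any help that points me in the right direction for both persisting the LSTM intermediate state while waiting for the next sample, and predicting with partial data on a model trained with a specific batch size.
Update, including model code: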
opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=10e-8, decay=0.001)
model = Sequential()
num_features = data.shape[2]
num_samples = data.shape[1]
first_lstm = LSTM(32, batch_input_shape=(None, num_samples, num_features), return_sequences=True, activation='tanh')
model.add(
first_lstm)
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(LSTM(16, return_sequences=True, activation='tanh'))
model.add(Dropout(0.2))
model.add(LeakyReLU())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=opt,
metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])
Model summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 100, 32)           6272
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 100, 32)           0
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 32)           0
_________________________________________________________________
lstm_2 (LSTM)                (None, 100, 16)           3136
_________________________________________________________________
dropout_2 (Dropout)          (None, 100, 16)           0
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 100, 16)           0
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 1601
=================================================================
Total params: 11,009
Trainable params: 11,009
Non-trainable params: 0
_________________________________________________________________
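Answer 0: (score: 4)
If I understood correctly, you have batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so a batch has shape (m, 7, 3)).
In any Keras RNN you can set the return_sequences flag to True to get the intermediate states, i.e., for every batch, instead of the definitive prediction, you will get the corresponding 7 outputs, where output i represents the prediction at stage i given all the inputs from 0 to i.
But you would be getting all of them at once at the end. As far as I know, Keras doesn't provide a direct interface to retrieve the throughput while the batch is being processed. This may be even more constrained if you are using any of the CUDNN-optimized variants. What you can basically do is regard your batch as 7 successive batches of shape (m, 1, 3) and feed them progressively to the LSTM, recording the hidden state and prediction at each step. For that, you can either set return_state to True and do it manually, or you can simply set stateful to True and let the object keep track of it.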
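To make the "7 successive batches of shape (m, 1, 3)" idea concrete, here is a small NumPy sketch of the slicing alone. The value of m and the dummy data are arbitrary assumptions, and the LSTM call itself is left out.

```python
import numpy as np

m, seq_len, n_feats = 4, 7, 3
batch = np.arange(m * seq_len * n_feats, dtype=np.float32).reshape(m, seq_len, n_feats)

# split along the time axis into seq_len slices of shape (m, 1, n_feats)
steps = np.split(batch, seq_len, axis=1)

for t, step in enumerate(steps):
    # each `step` is what would be fed to the LSTM at time t,
    # together with the state kept from time t - 1
    assert step.shape == (m, 1, n_feats)

# gluing the slices back together along the time axis gives the original batch
restored = np.concatenate(steps, axis=1)
```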
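The following Python 2 + Keras example should represent what you want. It includes an example of both stateful=True (for the easiest training) and return_state=True (for the most precise inference), so you get a flavor of both approaches. It also assumes that you are given a model that has already been serialized and about which you don't know much. The structure is closely related to the one in Andrew Ng's course, who is definitely more authoritative than me on the topic. Since you don't specify how the model was trained, I assumed a many-to-one training setup, but this can be easily adapted.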
from __future__ import print_function
from keras.layers import Input, LSTM, Dense
from keras.models import Model, load_model
from keras.optimizers import Adam
import numpy as np
# globals
SEQ_LEN = 7
HID_DIMS = 32
OUTPUT_DIMS = 3  # each target is a 3-dimensional vector (see `targets` below)
##############################################################################
# define the model to be trained on a fixed batch size:
# assume many-to-one training setup (otherwise set return_sequences=True)
TRAIN_BATCH_SIZE = 20
x_in = Input(batch_shape=[TRAIN_BATCH_SIZE, SEQ_LEN, 3])
lstm = LSTM(HID_DIMS, activation="tanh", return_sequences=False, stateful=True)
dense = Dense(OUTPUT_DIMS, activation='linear')
m_train = Model(inputs=x_in, outputs=dense(lstm(x_in)))
m_train.summary()
# a dummy batch of training data of shape (TRAIN_BATCH_SIZE, SEQ_LEN, 3), with targets of shape (TRAIN_BATCH_SIZE, 3):
batch123 = np.repeat([[1, 2, 3]], SEQ_LEN, axis=0).reshape(1, SEQ_LEN, 3).repeat(TRAIN_BATCH_SIZE, axis=0)
targets = np.repeat([[123,234,345]], TRAIN_BATCH_SIZE, axis=0) # dummy [[1,2,3],,,]-> [123,234,345] mapping to be learned
# train the model on a fixed batch size and save it
print(">> INFERENCE BEFORE TRAINING MODEL:", m_train.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))
m_train.compile(optimizer=Adam(lr=0.5), loss='mean_squared_error', metrics=['mae'])
m_train.fit(batch123, targets, epochs=100, batch_size=TRAIN_BATCH_SIZE)
m_train.save("trained_lstm.h5")
print(">> INFERENCE AFTER TRAINING MODEL:", m_train.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))
##############################################################################
# Now, although we aren't training anymore, we want to do step-wise predictions
# that do alter the inner state of the model, and keep track of that.
m_trained = load_model("trained_lstm.h5")
print(">> INFERENCE AFTER RELOADING TRAINED MODEL:", m_trained.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))
# now define an analogous model that allows a flexible batch size for inference:
x_in = Input(shape=[SEQ_LEN, 3])
h_in = Input(shape=[HID_DIMS])
c_in = Input(shape=[HID_DIMS])
pred_lstm = LSTM(HID_DIMS, activation="tanh", return_sequences=False, return_state=True, name="lstm_infer")
h, cc, c = pred_lstm(x_in, initial_state=[h_in, c_in])
prediction = Dense(OUTPUT_DIMS, activation='linear', name="dense_infer")(h)
m_inference = Model(inputs=[x_in, h_in, c_in], outputs=[prediction, h,cc,c])
# Let's confirm that this model is able to load the trained parameters:
# first, check that the performance from scratch is not good:
print(">> INFERENCE BEFORE SWAPPING MODEL:")
predictions, hs, zs, cs = m_inference.predict([batch123,
np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)),
np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))],
batch_size=1)
print(predictions)
# import state from the trained model state and check that it works:
print(">> INFERENCE AFTER SWAPPING MODEL:")
for layer in m_trained.layers:
    if "lstm" in layer.name:
        m_inference.get_layer("lstm_infer").set_weights(layer.get_weights())
    elif "dense" in layer.name:
        m_inference.get_layer("dense_infer").set_weights(layer.get_weights())
predictions, _, _, _ = m_inference.predict([batch123,
np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)),
np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))],
batch_size=1)
print(predictions)
# finally perform granular predictions while keeping the recurrent activations.
# Starting the sequence with zeros is a common practice, but depending on how you
# trained, you might have an <END_OF_SEQUENCE> character that you might want to
# propagate instead. Each feed below holds one sequence, so the states start
# with a batch dimension of 1:
h, c = np.zeros((1, HID_DIMS)), np.zeros((1, HID_DIMS))
for i in range(len(batch123)):
    # about output shape: https://keras.io/layers/recurrent/#rnn
    # h, cc, c hold the network's throughput: h is the LSTM output, cc is the
    # hidden state (equal to h here, since return_sequences=False) and c is the cell state
    current_input = batch123[i:i+1]  # one sequence of shape (1, SEQ_LEN, 3)
    pred, h, cc, c = m_inference.predict([current_input, h, c])
    print("input:", current_input)
    print("output:", pred)
    print(h.shape, cc.shape, c.shape)
    raw_input("do something with your prediction and hidden state and press Enter to continue")
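Since we have two forms of state persistence:
1. The saved/trained parameters of the model, which are the same for every sequence
2. The a and c states (h and c in the code above), which evolve throughout the sequences and may be "restarted"
it is interesting to take a look at the guts of the LSTM object. In the Python example that I provided, the a and c states are handled explicitly, but the trained parameters aren't, and it may not be obvious how they are implemented internally or what they mean. They can be inspected as follows: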
for w in lstm.weights:
    print(w.name, w.shape)
In our case (32 hidden states) this returns the following:
lstm_1/kernel:0 (3, 128)
lstm_1/recurrent_kernel:0 (32, 128)
lstm_1/bias:0 (128,)
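We observe a dimensionality of 128. Why is that? this link describes the Keras LSTM implementation as follows:
g is the recurrent activation, p is the activation, Ws are the kernels, Us are the recurrent kernels, h is the hidden variable, which is also the output, and the notation * is element-wise multiplication.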
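The symbols described above correspond to the usual LSTM gate equations. As a rough NumPy sketch (not Keras' actual code), assuming that g is a sigmoid, p is tanh, and the four gates are concatenated along the last axis in the order input, forget, candidate, output, a single timestep could look like this. The function name lstm_step and the random weights are made up for the illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, kernel, recurrent_kernel, bias):
    """One LSTM timestep with the four gates concatenated along the last axis.

    Assumed gate order in the concatenation: input, forget, candidate, output.
    """
    units = h_prev.shape[-1]
    z = x_t @ kernel + h_prev @ recurrent_kernel + bias  # shape (batch, 4 * units)
    i = sigmoid(z[:, 0 * units:1 * units])         # input gate,  g(W_i x + U_i h + b_i)
    f = sigmoid(z[:, 1 * units:2 * units])         # forget gate, g(W_f x + U_f h + b_f)
    c_tilde = np.tanh(z[:, 2 * units:3 * units])   # candidate,   p(W_c x + U_c h + b_c)
    o = sigmoid(z[:, 3 * units:4 * units])         # output gate, g(W_o x + U_o h + b_o)
    c = f * c_prev + i * c_tilde                   # new cell state
    h = o * np.tanh(c)                             # new hidden state, also the output
    return h, c

n_feats, units = 3, 32
rng = np.random.default_rng(0)
kernel = rng.normal(size=(n_feats, 4 * units))          # shape (3, 128)
recurrent_kernel = rng.normal(size=(units, 4 * units))  # shape (32, 128)
bias = np.zeros(4 * units)                              # shape (128,)

h, c = np.zeros((1, units)), np.zeros((1, units))
h, c = lstm_step(np.array([[1.0, 2.0, 3.0]]), h, c, kernel, recurrent_kernel, bias)

# total number of trainable parameters of such a layer
n_params = kernel.size + recurrent_kernel.size + bias.size
```

Under these assumptions the three arrays hold 3*128 + 32*128 + 128 = 4608 parameters for a 32-unit layer with 3 input features.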
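The explanation is as follows: 128=32*4 are the parameters for the affine transformation happening inside each of the 4 gates, concatenated:
- The matrix of shape (3, 128) (named kernel) handles the input for a given sequence element.
- The matrix of shape (32, 128) (named recurrent_kernel) handles the input for the last recurrent state h.
- The vector of shape (128,) (named bias), as usual in any other NN setup.
Answer 1: (score: 4)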
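I think there might be an easier solution.
If your model doesn't have convolutional layers or any other layers that act upon the length/steps dimension, you can simply mark it as stateful=True.
Warning: your model does have a layer that acts on the length dimension. The Flatten layer transforms the length dimension into a feature dimension. This will completely prevent you from achieving your goal. If the Flatten layer is expecting 7 steps, you will always need 7 steps.
So, before applying my answer below, fix your model to not use the Flatten layer. Instead, it can just remove the return_sequences=True for the last LSTM layer.
The following code fixes that and also prepares a few things to be used with the answer below: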
def createModel(forTraining):
    #model for training, stateful=False, any batch size
    if forTraining == True:
        batchSize = None
        stateful = False
    #model for predicting, stateful=True, fixed batch size
    else:
        batchSize = 1
        stateful = True
    model = Sequential()
    #the steps dimension is None so the same layers accept 7 steps or 1 step
    first_lstm = LSTM(32,
        batch_input_shape=(batchSize, None, num_features),
        return_sequences=True, activation='tanh',
        stateful=stateful)
    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))
    #this is the last LSTM layer, use return_sequences=False
    model.add(LSTM(16, return_sequences=False, stateful=stateful, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(LeakyReLU())
    #don't add a Flatten!!!
    #model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    if forTraining == True:
        compileThisModel(model)
    #return the model so the callers below can use it
    return model
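With this, you will be able to train with 7 steps and predict with one step. Otherwise it will not be possible.
First, train this new model again, because it has no Flatten layer: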
trainingModel = createModel(forTraining=True)
trainThisModel(trainingModel)
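Now, with this trained model, you can simply create a new model exactly the same way you created the trained model, but marking stateful=True in all its LSTM layers. Then copy the weights from the trained model.
Since these new layers will need a fixed batch size (Keras' rules), I assumed it would be 1 (one single stream is coming, not m streams) and added it to the model creation above.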
predictingModel = createModel(forTraining=False)
predictingModel.set_weights(trainingModel.get_weights())
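Voilà. Just predict the outputs of the model with a single step:
# pseudo loop, run as samples arrive to your model:
for sample in incomingSamples:  # incomingSamples is a placeholder for your stream
    # where sample.shape == (1, 1, 3)
    prob = predictingModel.predict_on_batch(sample)
When you decide that you have reached the end of what you consider a continuous sequence, call predictingModel.reset_states() so you can safely start a new sequence without the model treating it as a continuation of the previous one.
To save and load states, just get and set them, saving with h5py: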
def saveStates(model, saveName):
    f = h5py.File(saveName,'w')
    for l, lay in enumerate(model.layers):
        #if you have nested models,
        #consider making this recurrent testing for layers in layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s),
                                 data=K.eval(stat),
                                 dtype=K.dtype(stat))
    f.close()
def loadStates(model, saveName):
    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())
    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)
        K.set_value(model.layers[layer].states[state], f.get(stateKey))
    f.close()
A complete working test for saving/loading states:
import h5py, numpy as np
from keras.layers import RNN, LSTM, Dense, Input
from keras.models import Model
import keras.backend as K
def createModel():
    inp = Input(batch_shape=(1,None,3))
    out = LSTM(5,return_sequences=True, stateful=True)(inp)
    out = LSTM(2, stateful=True)(out)
    out = Dense(1)(out)
    model = Model(inp,out)
    return model
def saveStates(model, saveName):
    f = h5py.File(saveName,'w')
    for l, lay in enumerate(model.layers):
        #if you have nested models, consider making this recurrent testing for layers in layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s), data=K.eval(stat), dtype=K.dtype(stat))
    f.close()
def loadStates(model, saveName):
    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())
    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)
        K.set_value(model.layers[layer].states[state], f.get(stateKey))
    f.close()
def printStates(model):
    for l in model.layers:
        #if you have nested models, consider making this recurrent testing for layers in layers
        if isinstance(l,RNN):
            for s in l.states:
                print(K.eval(s))
model1 = createModel()
model2 = createModel()
model1.predict_on_batch(np.ones((1,5,3))) #changes model 1 states
print('model1')
printStates(model1)
print('model2')
printStates(model2)
saveStates(model1,'testStates5')
loadStates(model2,'testStates5')
print('model1')
printStates(model1)
print('model2')
printStates(model2)
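A consideration about the nature of your data: your first model (if it is stateful=False) considers that each of the m sequences is individual and not connected to the others. It also considers that each batch contains unique sequences.
If this is not the case, you might want to train the stateful model instead (considering that each sequence is actually connected to the previous sequence). You would then need m batches of 1 sequence. -> m x (1, 7 or None, 3).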
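Answer 2: (score: 2)
Note: This answer assumes that your model in the training phase is not stateful. You must understand what a stateful RNN layer is and make sure that the training data has the corresponding statefulness properties. In short, it means there is a dependency between the sequences, i.e. one sequence is the follow-up to another sequence, and you want your model to take that into account. If your model and training data are stateful, then I think the other answers, which set stateful=True for the RNN layers from the beginning, are simpler.
Update: No matter whether the training model is stateful or not, you can always copy its weights to the inference model and enable statefulness. So I think the solutions based on setting stateful=True are shorter and better than mine. Their only drawback is that the batch size in these solutions must be fixed.
Note that the output of an LSTM layer over a single sequence is determined by its weight matrices, which are fixed, and its internal states, which depend on the previously processed timesteps. Now, to get the output of the LSTM layer for a single sequence of length m, one obvious way is to feed the entire sequence to the LSTM layer in one go. However, as I stated earlier, since its internal states depend on the previous timestep, we can exploit this fact and feed that single sequence chunk by chunk, getting the state of the LSTM layer at the end of processing a chunk and passing it to the LSTM layer for processing the next chunk. To make it more clear, suppose the sequence length is 7 (i.e. it has 7 timesteps of fixed-length feature vectors). As an example, it is possible to process this sequence like this:
1. Feed the first few timesteps to the LSTM layer; get the final state (call it C1).
2. Feed the next few timesteps, with state C1 as the initial state, to the LSTM layer; get the final state (call it C2).
3. Feed the remaining timesteps, with state C2 as the initial state, to the LSTM layer; get the final output.
That final output is equivalent to the output the LSTM layer would produce if we had fed it all 7 timesteps at once.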
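The equivalence claimed above can be checked with a toy LSTM written in plain NumPy. This is only a sketch under assumptions: random weights, sigmoid/tanh activations, and an arbitrary 2 + 3 + 2 split of the 7 timesteps. It is not the Keras layer itself, and run_lstm is a name invented for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_lstm(chunk, h, c, W, U, b):
    """Run a toy LSTM over chunk (timesteps, n_feats), starting from state (h, c)."""
    units = h.shape[0]
    for x_t in chunk:
        z = x_t @ W + h @ U + b
        # slices 0, 1 and 3 are the input, forget and output gates
        i, f, o = (sigmoid(z[k * units:(k + 1) * units]) for k in (0, 1, 3))
        c = f * c + i * np.tanh(z[2 * units:3 * units])  # slice 2 is the candidate
        h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(42)
n_feats, units = 3, 4
W = rng.normal(size=(n_feats, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = rng.normal(size=4 * units)
sequence = rng.normal(size=(7, n_feats))
zeros = np.zeros(units)

# the whole sequence in one go
h_full, _ = run_lstm(sequence, zeros, zeros, W, U, b)

# the same sequence in three chunks, carrying the state across chunks
h1, c1 = run_lstm(sequence[:2], zeros, zeros, W, U, b)  # state C1 = (h1, c1)
h2, c2 = run_lstm(sequence[2:5], h1, c1, W, U, b)       # state C2 = (h2, c2)
h_chunked, _ = run_lstm(sequence[5:], h2, c2, W, U, b)  # final output

same = np.allclose(h_full, h_chunked)
```

Note that the "state" passed between chunks is the pair of hidden and cell states, which is why the inference model further down takes and returns both.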
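So, to implement this in Keras, you can set the return_state argument of the LSTM layer to True so that you can get the intermediate state. Further, don't specify a fixed timestep length when defining the input layer. Instead use None, so the model can be fed sequences of arbitrary length, which lets us process each sequence progressively (it's fine if your input data at training time are sequences of fixed length).
Since you need this chunk processing capability at inference time, we need to define a new model which shares the LSTM layer used in the training model, takes the initial states as input, and also gives the resulting states as output. The following is a general sketch of how it could be done (note that the returned state of the LSTM layer is not used when training the model; we only need it at test time):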
# define training model
train_input = Input(shape=(None, n_feats)) # note that the number of timesteps is None
lstm_layer = LSTM(n_units, return_state=True)
lstm_output, _, _ = lstm_layer(train_input) # note that we ignore the returned states
classifier = Dense(1, activation='sigmoid')
train_output = classifier(lstm_output)
train_model = Model(train_input, train_output)
# compile and fit the model on training data ...
# ==================================================
# define inference model
inf_input = Input(shape=(None, n_feats))
state_h_input = Input(shape=(n_units,))
state_c_input = Input(shape=(n_units,))
# we use the layers of previous model
lstm_output, state_h, state_c = lstm_layer(inf_input,
initial_state=[state_h_input, state_c_input])
output = classifier(lstm_output)
inf_model = Model([inf_input, state_h_input, state_c_input],
[output, state_h, state_c]) # note that we return the states as output
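Now you can feed inf_model as many timesteps of a sequence as are available right now. However, note that initially you must feed the states with vectors of all zeros (which is the default initial value of the states). For example, if the sequence length is 7, a sketch of what happens when a new data stream is available is as follows: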
state_h = np.zeros((1, n_units))
state_c = np.zeros((1, n_units))
# three new timesteps are available; `timesteps` has shape (1, 3, n_feats)
outputs = inf_model.predict([timesteps, state_h, state_c])
# predict returns the list [output, state_h, state_c]
out = outputs[0][0, 0]  # you may ignore this output since the entire sequence has not been processed yet
state_h = outputs[1]
state_c = outputs[2]
# after some time another four new timesteps are available
outputs = inf_model.predict([timesteps, state_h, state_c])
# we have processed 7 timesteps, so the output is valid
out = outputs[0][0, 0]  # store it, pass it to another thread or do whatever you want to do with it
# reinitialize the states to make them ready for the next sequence chunk
state_h = np.zeros((1, n_units))
state_c = np.zeros((1, n_units))
# to be continued...
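Of course, you need to do this in some kind of loop or implement a control flow structure to process the data stream, but I think you get the general idea.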
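One possible shape for such a loop is sketched below. It is not tied to Keras: stream_predictions and fake_predict are hypothetical names, fake_predict only stands in for inf_model.predict so that the control flow can be run on its own, and the sketch assumes that chunks never straddle a sequence boundary.

```python
import numpy as np

SEQ_LEN = 7

def stream_predictions(chunks, predict_fn, n_units, seq_len=SEQ_LEN):
    """Yield (timesteps_seen, output) after every chunk of a sample stream.

    chunks:     iterable of arrays shaped (1, k, n_feats), with k >= 1
    predict_fn: callable taking [chunk, state_h, state_c] and returning
                [output, state_h, state_c], like inf_model.predict
    """
    state_h = np.zeros((1, n_units))
    state_c = np.zeros((1, n_units))
    seen = 0
    for chunk in chunks:
        output, state_h, state_c = predict_fn([chunk, state_h, state_c])
        seen += chunk.shape[1]
        yield seen, output
        if seen >= seq_len:
            # sequence complete: start the next one from zero states
            state_h = np.zeros((1, n_units))
            state_c = np.zeros((1, n_units))
            seen = 0

def fake_predict(inputs):
    """Stand-in for inf_model.predict: accumulates the inputs into state_h."""
    chunk, state_h, state_c = inputs
    state_h = state_h + chunk.sum(axis=(1, 2)).reshape(1, 1)
    return [state_h[:, :1], state_h, state_c]

# a stream delivering 3 timesteps, then 4, then a full sequence of 7
stream = [np.ones((1, 3, 3)), np.ones((1, 4, 3)), np.ones((1, 7, 3))]
results = [(seen, float(out[0, 0]))
           for seen, out in stream_predictions(stream, fake_predict, n_units=2)]
```

Outputs yielded before seen reaches the sequence length are the intermediate values; whether they are meaningful depends on how the model was trained.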
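Finally, although your specific example is not a sequence-to-sequence model, I highly recommend reading the official Keras seq2seq tutorial, which I think one can learn a lot of ideas from.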
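Answer 3: (score: 0)
As far as I know, because of the static graph in Tensorflow, there is no efficient way to feed inputs whose length differs from the training input length.
Padding is the official way to work around this, but it is less efficient and consumes more memory. I suggest you look into Pytorch, where solving your problem would be trivial.
There are a lot of great posts on building an lstm with Pytorch, and you will understand the benefit of a dynamic graph once you see them.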