Feeding CNN features into an LSTM

Date: 2017-04-28 13:00:16

Tags: tensorflow keras lstm

I want to build an end-to-end trainable model with the following properties:

  • A CNN extracts features from an image
  • The features are reshaped into a matrix
  • Each row of that matrix is then fed into LSTM1
  • Each column of that matrix is then fed into LSTM2
  • The outputs of LSTM1 and LSTM2 are concatenated to form the final output

(It is more or less like Figure 2 in this paper: https://arxiv.org/pdf/1611.07890.pdf)

My question now is: after the reshape, how do I feed the values of the feature matrix into the LSTMs using Keras or TensorFlow?

Here is my current code using the VGG16 network (also posted in a Keras issue):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Reshape, LSTM

# VGG16
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))

# block 2
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))

# block 3
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))

# block 4
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))

# block 5
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))

# block 6
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))

# reshape the 4096-dim feature vector into a 64 x 64 matrix
model.add(Reshape((64, 64)))

# How to feed each row of this to LSTM?
# This is my first solution but it doesn’t look correct: 
# model.add(LSTM(256, input_shape=(64, 1)))  # 256 hidden units, sequence length = 64, feature dim = 1
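One possible way to wire the row-wise and column-wise LSTMs after the Reshape is sketched below using the Keras functional API; the LSTM sizes, the Permute trick for the columns, and the final Dense head are assumptions rather than something taken from the paper:

# Sketch (assumptions: 256-unit LSTMs, 10-class softmax head) of feeding the
# 64 x 64 feature matrix row-wise and column-wise into two LSTMs.
from keras.layers import Input, Dense, Reshape, Permute, LSTM, concatenate
from keras.models import Model

features = Input(shape=(4096,))             # e.g. the output of the Dense(4096) layer
matrix = Reshape((64, 64))(features)        # 64 rows of 64 values each

row_lstm = LSTM(256)(matrix)                # LSTM1 reads the matrix row by row

columns = Permute((2, 1))(matrix)           # transpose so the columns become timesteps
col_lstm = LSTM(256)(columns)               # LSTM2 reads the matrix column by column

merged = concatenate([row_lstm, col_lstm])  # concatenated final representation
output = Dense(10, activation='softmax')(merged)

row_col_model = Model(inputs=features, outputs=output)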

1 Answer:

Answer 0 (score: 0)

Consider building your CNN model with Conv2D and MaxPooling2D layers up to and including the Flatten layer, because the vectorized output of the Flatten layer is what you will feed into the LSTM part of your architecture.

So, build your CNN model like this:

model_cnn = Sequential()
model_cnn.add(Conv2D...)
model_cnn.add(MaxPooling2D...)
...
model_cnn.add(Flatten())

Now, here is the interesting part: the current version of Keras has some incompatibilities with certain TensorFlow structures that prevent you from stacking the whole thing in a single Sequential object.

So it's time to use the Keras Model object to complete your neural network with a trick:

from keras.layers import Input, Dense, LSTM, Lambda, TimeDistributed
from keras.models import Model

input_lay = Input(shape=(None, ?, ?, ?))  # dimensions of your data, e.g. (timesteps, height, width, channels)
time_distribute = TimeDistributed(Lambda(lambda x: model_cnn(x)))(input_lay)  # keras.layers.Lambda is essential to make our trick work :)
lstm_lay = LSTM(?)(time_distribute)
output_lay = Dense(?, activation='?')(lstm_lay)

And finally, it's time to put our two separate models together:

model = Model(inputs=[input_lay], outputs=[output_lay])
model.compile(...)
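As a usage sketch, once the question marks above have been filled in (for instance Input(shape=(None, 224, 224, 3)), LSTM(256) and Dense(10, activation='softmax'), which are assumptions), compiling and training on dummy data could look like this:

import numpy as np

# Assumed shapes: 4 samples of 16 frames each, 224 x 224 RGB, 10 classes.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

dummy_frames = np.zeros((4, 16, 224, 224, 3))  # (samples, timesteps, height, width, channels)
dummy_labels = np.zeros((4, 10))               # one-hot targets
model.fit(dummy_frames, dummy_labels, epochs=1, batch_size=2)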

OBS: note that since the vectorized output of the VGG Flatten layer will become the input to the LSTM model, you can replace my model_cnn example with VGG without including the top layers.
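For example, a minimal sketch of that substitution using keras.applications (the frame dimensions, the pooling choice, the LSTM size and the output layer are assumptions):

from keras.applications.vgg16 import VGG16
from keras.layers import Input, Lambda, TimeDistributed, LSTM, Dense
from keras.models import Model

# VGG16 without its fully connected top; global average pooling yields a
# 512-dim vector per frame, playing the role of the flattened CNN output.
vgg = VGG16(weights='imagenet', include_top=False, pooling='avg',
            input_shape=(224, 224, 3))

frames = Input(shape=(None, 224, 224, 3))                # (timesteps, height, width, channels)
per_frame = TimeDistributed(Lambda(lambda x: vgg(x)))(frames)
sequence = LSTM(256)(per_frame)                          # 256 units is an assumption
prediction = Dense(10, activation='softmax')(sequence)   # 10 classes is an assumption

video_model = Model(inputs=frames, outputs=prediction)
video_model.compile(optimizer='adam', loss='categorical_crossentropy')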