Question

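I am trying to use a CNN-LSTM network with Keras to analyze videos. I read up on the topic and came across the TimeDistributed wrapper and some examples.

I tried the network shown below, which consists of convolutional and pooling layers followed by recurrent and dense layers.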

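from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, LSTM, MaxPooling2D, TimeDistributed

# IMG_SIZE (frame height/width in pixels) is defined elsewhere.
model = Sequential()
# Apply the same small CNN to every frame; None allows a variable number of frames.
model.add(TimeDistributed(Conv2D(2, (2, 2), activation='relu'), input_shape=(None, IMG_SIZE, IMG_SIZE, 3)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
# Summarize the per-frame feature sequence, then classify into 50 classes.
model.add(LSTM(50))
model.add(Dense(50, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])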

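Because the dataset is too small, I have not tested the model properly. However, during training the network reached an accuracy of 0.98 within 4-5 epochs (it may be overfitting, but that is not a problem for now, since I hope to get a suitable dataset later).

Then I read about using a pre-trained convolutional network (MobileNet, ResNet, or Inception) as a feature extractor for the LSTM network, so I used the following code: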

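from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Input, LSTM, TimeDistributed
from tensorflow.keras.models import Model

# frames (frames per clip) and IMG_SIZE are defined elsewhere.
inputs = Input(shape=(frames, IMG_SIZE, IMG_SIZE, 3))
cnn_base = InceptionV3(include_top=False, weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3))

# Pool the InceptionV3 feature maps into one feature vector per frame.
cnn_out = GlobalAveragePooling2D()(cnn_base.output)
cnn = Model(inputs=cnn_base.input, outputs=cnn_out)
# Run the CNN on every frame, then feed the feature sequence to the LSTM.
encoded_frames = TimeDistributed(cnn)(inputs)
encoded_sequence = LSTM(256)(encoded_frames)

hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(50, activation="softmax")(hidden_layer)
model = Model([inputs], outputs)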

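In this case, when I train the model, the accuracy always stays at ~0.02 (the 1/50 chance baseline).

Since the first model at least learned something, I wonder whether there is a mistake in how the network is built in the second case.

Has anyone run into this? Any suggestions?

Thanks.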

Answer 1

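The reason is that you have very little data and you are retraining the full set of Inception V3 weights. You either have to train the model with more data, or tune the hyperparameters and train for more epochs. You can find more information about hyperparameter tuning here.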

The ideal approach is to freeze the base model with base_model.trainable = False and train only the new layers added on top of the Inception V3 layers.
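A minimal sketch of this first option, reusing the architecture from the question. The helper name `build_frozen_cnn_lstm` and its parameters are illustrative and not part of the original code, and `base_model` in the sentence above corresponds to `cnn_base` here.

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import (Dense, GlobalAveragePooling2D, Input,
                                     LSTM, TimeDistributed)
from tensorflow.keras.models import Model


def build_frozen_cnn_lstm(frames, img_size, num_classes, weights="imagenet"):
    """Hypothetical helper: CNN-LSTM with a frozen InceptionV3 extractor."""
    cnn_base = InceptionV3(include_top=False, weights=weights,
                           input_shape=(img_size, img_size, 3))
    # Freeze every InceptionV3 layer so only the new layers are trained.
    cnn_base.trainable = False

    # One pooled feature vector per frame.
    cnn_out = GlobalAveragePooling2D()(cnn_base.output)
    cnn = Model(inputs=cnn_base.input, outputs=cnn_out)

    inputs = Input(shape=(frames, img_size, img_size, 3))
    encoded_frames = TimeDistributed(cnn)(inputs)
    encoded_sequence = LSTM(256)(encoded_frames)
    hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
    outputs = Dense(num_classes, activation="softmax")(hidden_layer)

    model = Model(inputs, outputs)
    # Compile after setting `trainable` so the frozen state is picked up.
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model
```

Calling, for example, `build_frozen_cnn_lstm(frames, IMG_SIZE, 50)` and then `model.summary()` should report only the LSTM and Dense parameters as trainable, which makes it easy to confirm that the freeze took effect.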

OR

Unfreeze the top layers of the base model (the Inception V3 layers) and set the bottom layers to be non-trainable. You can do it as follows:

# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))

# Fine-tune from this layer onwards
fine_tune_at = 100

# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
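Changes to `trainable` only take effect once the model is compiled again, so the loop above is normally followed by a new `compile` call. A hedged sketch of the whole step follows. Both helper names are hypothetical, and the learning rate of 1e-5 is an assumed starting point rather than a tuned value.

```python
import tensorflow as tf


def unfreeze_top_layers(base_model, fine_tune_at):
    """Freeze layers before index `fine_tune_at`; leave the rest trainable.

    Returns the number of layers that remain trainable.
    """
    # Start from a fully trainable base in case it was frozen earlier.
    base_model.trainable = True
    for layer in base_model.layers[:fine_tune_at]:
        layer.trainable = False
    return sum(1 for layer in base_model.layers if layer.trainable)


def recompile_for_fine_tuning(model, learning_rate=1e-5):
    """Recompile so the updated `trainable` flags are used during training."""
    model.compile(
        loss="categorical_crossentropy",
        # Assumed value: a small step size to avoid wrecking pre-trained weights.
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        metrics=["accuracy"],
    )
    return model
```

With the names from the question, this would be used as `unfreeze_top_layers(cnn_base, fine_tune_at)` followed by `recompile_for_fine_tuning(model)` on the full CNN-LSTM model before calling `fit` again.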

How to build a CNN-LSTM network with a pre-trained CNN using Keras
