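I want to train a neural network to classify simple videos. My approach is to use a CNN whose output feeds into an RNN (LSTM). I am having trouble connecting the two.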
X_train.shape
(2400, 256, 256, 3)
Y_train.shape
(2400, 6)
This is the network I defined:
model = Sequential()
model.add(Conv2D(32 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu' , input_shape = (256,256,3)))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(64 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(128 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(256 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Flatten())
model.add(layers.LSTM(64, return_sequences=True, input_shape=(1,256)))
model.add(layers.LSTM(32, return_sequences=True))
model.add(layers.LSTM(32))
model.add(layers.Dense(6, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I get the following error:
ValueError: Input 0 of layer lstm_7 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 65536]
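I think it has to do with the input shape of the RNN. The goal is for the CNN to learn the features of each frame, and for the RNN to learn the higher-level differences between frames. Would it be better to do this with two completely separate networks? If so, how would I implement that? Also, since the dataset is large, both networks could be trained on a large amount of data.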
Answer 0 (score: 1)
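You are right. In TensorFlow, an LSTM expects input of shape (batch_size, time_steps, embedding_size); see the example for more details. In your case, try using model.add(Reshape((16, 16*256))) in place of model.add(Flatten()). The four pooling layers reduce each 256x256 frame to a 16x16 map with 256 channels, so this reshape gives the LSTM 16 time steps of 4096 features each. It is not the prettiest solution, but it will let you test things.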
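The numbers in that Reshape call can be checked with plain arithmetic. The sketch below is not Keras code: it is a small pure-Python trace, with hypothetical helper names (same_pool_out, trace_cnn), of how the spatial size changes through the four Conv2D/MaxPool2D blocks in the question. It assumes, as in the question, that every convolution uses stride 1 with padding='same' and every pooling layer uses stride 2 with padding='same'.

```python
import math


def same_pool_out(size, stride=2):
    """Spatial size after a pooling layer with padding='same'."""
    return math.ceil(size / stride)


def trace_cnn(height, width, filters, stride=2):
    """Return (H, W, C) after a stack of Conv2D + MaxPool2D blocks.

    Assumes each Conv2D uses stride 1 and padding='same', so it only
    changes the channel count, and each MaxPool2D uses padding='same'.
    """
    channels = None
    for n_filters in filters:
        channels = n_filters                    # Conv2D sets the channel count
        height = same_pool_out(height, stride)  # MaxPool2D halves the height
        width = same_pool_out(width, stride)    # MaxPool2D halves the width
    return height, width, channels


h, w, c = trace_cnn(256, 256, filters=[32, 64, 128, 256])
flattened = h * w * c   # what Flatten() produces per sample
reshaped = (h, w * c)   # what Reshape((16, 16*256)) produces per sample

print((h, w, c))    # (16, 16, 256)
print(flattened)    # 65536, the size reported in the ValueError
print(reshaped)     # (16, 4096): 16 time steps, 4096 features each
```

The flattened size matches the [None, 65536] shape in the error message, which is a 2D tensor, while the reshaped version keeps a separate time axis so that the tensor has the three dimensions the LSTM expects.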
Answer 1 (score: 1)
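The problem is the data passed to the LSTM, and it can be solved inside your network. The LSTM expects 3D input, and Flatten destroys that structure. There are two options: 1) reshape to (batch_size, H, W*channel); 2) reshape to (batch_size, W, H*channel). Either way, the LSTM receives 3D data. An example follows: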
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPool2D, Reshape, Permute,
                                     Lambda, LSTM, Dense)

model = Sequential()
model.add(Conv2D(32, (3, 3), strides=1, padding='same',
                 activation='relu', input_shape=(256, 256, 3)))
model.add(MaxPool2D((2, 2), strides=2, padding='same'))
model.add(Conv2D(64, (3, 3), strides=1, padding='same',
                 activation='relu'))
model.add(MaxPool2D((2, 2), strides=2, padding='same'))
model.add(Conv2D(128, (3, 3), strides=1, padding='same',
                 activation='relu'))
model.add(MaxPool2D((2, 2), strides=2, padding='same'))
model.add(Conv2D(256, (3, 3), strides=1, padding='same',
                 activation='relu'))
model.add(MaxPool2D((2, 2), strides=2, padding='same'))

def ReshapeLayer(x):
    # x has shape (batch_size, H, W, channel)
    shape = x.shape
    # Option 1: (H, W*channel), each row of the feature map is a time step
    reshape = Reshape((shape[1], shape[2] * shape[3]))(x)
    # Option 2: (W, H*channel), each column of the feature map is a time step
    # transpose = Permute((2, 1, 3))(x)
    # reshape = Reshape((shape[2], shape[1] * shape[3]))(transpose)
    return reshape

model.add(Lambda(ReshapeLayer))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(6, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.summary()
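To make the difference between the two reshape options concrete, here is a minimal pure-Python sketch on a tiny 2x3 feature map with 2 channels. It is not part of the Keras model: the helper names (rows_as_steps, cols_as_steps) and the toy values are made up for illustration, and the nested lists stand in for a single (H, W, channel) sample. It assumes the row-major element order that a reshape of this kind uses.

```python
def rows_as_steps(fmap):
    """Option 1: (H, W, C) -> (H, W*C). Each row becomes one time step."""
    return [[value for pixel in row for value in pixel] for row in fmap]


def cols_as_steps(fmap):
    """Option 2: (H, W, C) -> (W, H*C). Each column becomes one time step."""
    # Swap the H and W axes first, then flatten exactly as in option 1.
    transposed = [list(column) for column in zip(*fmap)]
    return rows_as_steps(transposed)


# Toy feature map with H=2, W=3, C=2; each value encodes its own position
# as h*100 + w*10 + c so it is easy to see where it ends up.
fmap = [[[h * 100 + w * 10 + c for c in range(2)] for w in range(3)]
        for h in range(2)]

by_rows = rows_as_steps(fmap)
by_cols = cols_as_steps(fmap)

print(by_rows)  # [[0, 1, 10, 11, 20, 21], [100, 101, 110, 111, 120, 121]]
print(by_cols)  # [[0, 1, 100, 101], [10, 11, 110, 111], [20, 21, 120, 121]]
```

With option 1 the LSTM steps through the feature map from top to bottom, and with option 2 from left to right. In both cases the "time" axis is a spatial axis of a single frame, so which option works better is something to test on the data.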