I am trying to build a convolutional neural network in Keras with a convolutional LSTM layer (ConvLSTM2D) in the middle, to process sequences of greyscale images taken from videos. Each frame has shape (61, 61, 1), and whole sequences are passed in together, so the total input shape is (num_movies, num_frames, frame_height, frame_width, 1). The goal of the network is to predict future frames of a video given the current ones (i.e., pass a sequence through and shift it forward by n frames). I pretrained the convolutional part (an autoencoder; see the code below) to autoencode individual frames, so all that remains is to train the recurrent ConvLSTM2D layer for sequence prediction. The network works fine when the ConvLSTM2D layer has a single filter. Here is the network summary (every layer other than the LSTM layer is wrapped in TimeDistributed):
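For concreteness, this is roughly how I build the inputs and targets by shifting each sequence forward by n frames (a minimal sketch with dummy data; the 40 frames and n = 6 are placeholders, chosen so the shifted length matches the 34 time steps in the summary below):

import numpy as np

# dummy stand-in for my data: (num_movies, num_frames, height, width, channels)
movies = np.random.rand(10, 40, 61, 61, 1).astype('float32')
n = 6  # how many frames ahead to predict

X = movies[:, :-n]  # frames 0 .. T-n-1, shape (10, 34, 61, 61, 1)
Y = movies[:, n:]   # frames n .. T-1, i.e. X shifted forward by n frames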
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_1 (ZeroPad) (None, 34, 64, 64, 1) 0
_________________________________________________________________
time_distributed_2 (Conv2D) (None, 34, 64, 64, 16) 160
_________________________________________________________________
time_distributed_3 (MaxPool) (None, 34, 32, 32, 16) 0
_________________________________________________________________
time_distributed_4 (Conv2D) (None, 34, 32, 32, 8) 1160
_________________________________________________________________
time_distributed_5 (MaxPool) (None, 34, 16, 16, 8) 0
_________________________________________________________________
time_distributed_6 (Conv2D) (None, 34, 16, 16, 8) 584
_________________________________________________________________
time_distributed_7 (MaxPool) (None, 34, 8, 8, 8) 0
_________________________________________________________________
time_distributed_8 (Conv2D) (None, 34, 8, 8, 8) 584
_________________________________________________________________
time_distributed_9 (MaxPool) (None, 34, 4, 4, 8) 0
_________________________________________________________________
time_distributed_10 (Conv2D) (None, 34, 4, 4, 1) 73
_________________________________________________________________
rnn (ConvLSTM2D) (None, 34, 4, 4, 1) 36
_________________________________________________________________
time_distributed_11 (Conv2D) (None, 34, 4, 4, 4) 40
_________________________________________________________________
time_distributed_12 (UpSample) (None, 34, 8, 8, 4) 0
_________________________________________________________________
time_distributed_13 (Conv2D) (None, 34, 8, 8, 8) 296
_________________________________________________________________
time_distributed_14 (UpSample) (None, 34, 16, 16, 8) 0
_________________________________________________________________
time_distributed_15 (Conv2D) (None, 34, 16, 16, 8) 584
_________________________________________________________________
time_distributed_16 (UpSample) (None, 34, 32, 32, 8) 0
_________________________________________________________________
time_distributed_17 (Conv2D) (None, 34, 32, 32, 16) 1168
_________________________________________________________________
time_distributed_18 (UpSample) (None, 34, 64, 64, 16) 0
_________________________________________________________________
time_distributed_19 (Conv2D) (None, 34, 64, 64, 1) 145
_________________________________________________________________
time_distributed_20 (Cropping2D) (None, 34, 61, 61, 1) 0
=================================================================
Total params: 4,830
Trainable params: 36
Non-trainable params: 4,794
_________________________________________________________________
Everything runs when the number of filters in the ConvLSTM2D layer is 1. However, as soon as I change the number of filters in the ConvLSTM2D layer, I get the error:

ValueError: number of input channels does not match corresponding dimension of filter, 7 != 1

where 7 is the number of filters I want to use.
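For reference, here is a condensed standalone version of my setup that (I believe) hits the same error; the layer sizes are simplified placeholders. A Conv2D layer that was already built for a 1-channel input gets reused after a ConvLSTM2D that outputs 7 channels:

from keras.models import Sequential
from keras.layers import Conv2D, ConvLSTM2D, TimeDistributed

# "pretrained" decoder layer, built for 1 input channel, so its kernel
# is created with shape (3, 3, 1, 4)
decoder_conv = Conv2D(4, (3, 3), padding='same')
pretrained = Sequential([decoder_conv])
pretrained.build((None, 4, 4, 1))

model = Sequential()
model.add(ConvLSTM2D(filters=7, kernel_size=(2, 2), padding='same',
                     return_sequences=True,
                     input_shape=(None, 4, 4, 1)))  # outputs (..., 4, 4, 7)
model.add(TimeDistributed(decoder_conv))  # reuses the kernel built for 1 channel
# ValueError: number of input channels does not match corresponding
# dimension of filter, 7 != 1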
Here is how I build the autoencoder:
from keras.models import Sequential
from keras.layers import ZeroPadding2D, Conv2D, MaxPooling2D, UpSampling2D, Cropping2D

image_shape = (61, 61, 1)  # a single greyscale frame

autoencoder = Sequential()
# encoder
autoencoder.add(ZeroPadding2D(((2,1),(2,1)), input_shape=image_shape))  # (61,61,1) --> (64,64,1)
autoencoder.add(Conv2D(16, (3,3), activation='relu', padding='same'))   # (64,64,1) --> (64,64,16)
autoencoder.add(MaxPooling2D((2,2), padding='same'))                    # (64,64,16) --> (32,32,16)
autoencoder.add(Conv2D(8, (3,3), activation='relu', padding='same'))    # (32,32,16) --> (32,32,8)
autoencoder.add(MaxPooling2D((2,2), padding='same'))                    # (32,32,8) --> (16,16,8)
autoencoder.add(Conv2D(8, (3,3), activation='relu', padding='same'))    # (16,16,8) --> (16,16,8)
autoencoder.add(MaxPooling2D((2,2), padding='same'))                    # (16,16,8) --> (8,8,8)
autoencoder.add(Conv2D(8, (3,3), activation='relu', padding='same'))    # (8,8,8) --> (8,8,8)
autoencoder.add(MaxPooling2D((2,2), padding='same'))                    # (8,8,8) --> (4,4,8)
autoencoder.add(Conv2D(1, (3,3), activation='relu', padding='same'))    # (4,4,8) --> (4,4,1)
# decoder: map the low-dimensional representation back to an image
autoencoder.add(Conv2D(4, (3,3), activation='relu', padding='same'))    # (4,4,1) --> (4,4,4)
autoencoder.add(UpSampling2D((2,2)))                                    # (4,4,4) --> (8,8,4)
autoencoder.add(Conv2D(8, (3,3), activation='relu', padding='same'))    # (8,8,4) --> (8,8,8)
autoencoder.add(UpSampling2D((2,2)))                                    # (8,8,8) --> (16,16,8)
autoencoder.add(Conv2D(8, (3,3), activation='relu', padding='same'))    # (16,16,8) --> (16,16,8)
autoencoder.add(UpSampling2D((2,2)))                                    # (16,16,8) --> (32,32,8)
autoencoder.add(Conv2D(16, (3,3), activation='relu', padding='same'))   # (32,32,8) --> (32,32,16)
autoencoder.add(UpSampling2D((2,2)))                                    # (32,32,16) --> (64,64,16)
# sigmoid as the final activation keeps output pixel values in [0, 1]
autoencoder.add(Conv2D(1, (3,3), activation='sigmoid', padding='same')) # (64,64,16) --> (64,64,1)
autoencoder.add(Cropping2D(((2,1),(2,1))))                              # (64,64,1) --> (61,61,1)
Once the autoencoder is built, I train it on the autoencoding task and then add the LSTM layer.
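The pretraining step itself looks roughly like this (a sketch; the optimizer, loss, and the frames array of individual frames are placeholders, since that code is not shown here):

import numpy as np

frames = np.random.rand(400, 61, 61, 1).astype('float32')  # placeholder: all individual frames
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(frames, frames, epochs=50, batch_size=32)

With the convolutional layers pretrained, I splice the ConvLSTM2D in between the encoder and decoder halves: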
from keras.layers import TimeDistributed, ConvLSTM2D, Flatten, SimpleRNN, Dense, Reshape

num_layers = len(autoencoder.layers)

model = Sequential()
# encoder half of the pretrained autoencoder, applied frame by frame
for i in range(num_layers // 2):
    model.add(TimeDistributed(autoencoder.layers[i]))
out_shape = autoencoder.layers[num_layers//2 - 1].output_shape

# Convolutional LSTM
num_filters = 7
kernel_shape = (2,2)
model.add(ConvLSTM2D(filters=num_filters,
                     kernel_size=kernel_shape,
                     activation='tanh',
                     padding='same',
                     return_sequences=True,
                     name='rnn'))

''' # SimpleRNN alternative
model.add(TimeDistributed(Flatten()))
model.add(SimpleRNN(rnn_size,
                    return_sequences=True,
                    activation='tanh',
                    name='rnn'))
# NOTE: since the RNN changes the size of the output of the final Conv2D layer
# in the encoding section, we somehow have to map the dimension back down.
# This is what the Dense layer below does.
model.add(TimeDistributed(Dense(out_shape[1] * out_shape[2], activation='relu', name='ff')))
model.add(TimeDistributed(Reshape((out_shape[1], out_shape[2], 1))))
'''

# decoder half of the pretrained autoencoder
for i in range(num_layers//2, num_layers):
    model.add(TimeDistributed(autoencoder.layers[i]))

# Set the non-recurrent layers to untrainable; they are already trained as an
# autoencoder, so the RNN just has to learn how to move the object in the
# low-dimensional space.
for layer in model.layers:
    if not (layer.name == 'rnn' or layer.name == 'ff'):
        layer.trainable = False
As soon as I change the number of filters to anything other than one, I immediately get this error:

ValueError: number of input channels does not match corresponding dimension of filter, 7 != 1

I don't understand why the number of filters has to be tied to the number of input channels. Can't we have multiple filters over the same input, each with a different kernel?
I have tried some common fixes, such as setting data_format='channels_last', without success.
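What adds to my confusion: a standalone ConvLSTM2D with 7 filters on a 1-channel input builds without complaint (a minimal sketch below, with placeholder spatial dimensions), so having more filters than input channels is clearly allowed in isolation:

from keras.models import Sequential
from keras.layers import ConvLSTM2D

m = Sequential()
m.add(ConvLSTM2D(filters=7, kernel_size=(2, 2), padding='same',
                 return_sequences=True,
                 input_shape=(None, 4, 4, 1)))  # 1 input channel, 7 filters
m.summary()  # builds fine; output shape (None, None, 4, 4, 7)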