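I have 950 training video samples and 50 test video samples. Each video sample has 10 frames, and each frame has shape (n_row = 28, n_col = 28, n_channels = 1). My input (x) and output (y) have the same shape.
x_train shape: (950, 10, 28, 28, 1),
y_train shape: (950, 10, 28, 28, 1),
x_test shape: (50, 10, 28, 28, 1),
y_test shape: (50, 10, 28, 28, 1).
I want to feed the input video samples (x) to the model in order to predict the output video samples (y).
My model so far is: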
from keras.layers import Dense, Dropout, Activation, LSTM
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Reshape
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
import numpy as np
########################################################################################
model = Sequential()
model.add(TimeDistributed(Convolution2D(16, (3, 3), padding='same'), input_shape=(None, 28, 28, 1)))
model.add(Activation('sigmoid'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.2))
model.add(TimeDistributed(Convolution2D(32, (3, 3), padding='same')))
model.add(Activation('sigmoid'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.2))
model.add(TimeDistributed(Convolution2D(64, (3, 3), padding='same')))
model.add(Activation('sigmoid'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(64, return_sequences=True, stateful=False))
model.add(LSTM(64, return_sequences=True, stateful=False))
model.add(Activation('sigmoid'))
model.add(Dense(784, activation='sigmoid'))
model.add(Reshape((-1, 28,28,1)))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
print(model.summary())
The model summary is:
Layer (type) Output Shape Param #
=================================================================
time_distributed_1 (TimeDist (None, None, 28, 28, 16) 160
_________________________________________________________________
activation_1 (Activation) (None, None, 28, 28, 16) 0
_________________________________________________________________
time_distributed_2 (TimeDist (None, None, 14, 14, 16) 0
_________________________________________________________________
dropout_1 (Dropout) (None, None, 14, 14, 16) 0
_________________________________________________________________
time_distributed_3 (TimeDist (None, None, 14, 14, 32) 4640
_________________________________________________________________
activation_2 (Activation) (None, None, 14, 14, 32) 0
_________________________________________________________________
time_distributed_4 (TimeDist (None, None, 7, 7, 32) 0
_________________________________________________________________
dropout_2 (Dropout) (None, None, 7, 7, 32) 0
_________________________________________________________________
time_distributed_5 (TimeDist (None, None, 7, 7, 64) 18496
_________________________________________________________________
activation_3 (Activation) (None, None, 7, 7, 64) 0
_________________________________________________________________
time_distributed_6 (TimeDist (None, None, 3, 3, 64) 0
_________________________________________________________________
time_distributed_7 (TimeDist (None, None, 576) 0
_________________________________________________________________
lstm_1 (LSTM) (None, None, 64) 164096
_________________________________________________________________
lstm_2 (LSTM) (None, None, 64) 33024
_________________________________________________________________
activation_4 (Activation) (None, None, 64) 0
_________________________________________________________________
dense_1 (Dense) (None, None, 784) 50960
_________________________________________________________________
reshape_1 (Reshape) (None, None, 28, 28, 1) 0
=================================================================
Total params: 271,376
Trainable params: 271,376
Non-trainable params: 0
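As a sanity check on the summary above, the parameter counts can be reproduced by hand. The following is a minimal sketch using the standard formulas for convolution, LSTM, and dense layers; the helper names (`conv2d_params`, `lstm_params`, `dense_params`) are made up for this illustration and are not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel per (input channel, filter) pair, plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units


counts = {
    "time_distributed_1": conv2d_params(3, 3, 1, 16),
    "time_distributed_3": conv2d_params(3, 3, 16, 32),
    "time_distributed_5": conv2d_params(3, 3, 32, 64),
    "lstm_1": lstm_params(3 * 3 * 64, 64),  # flattened 3x3x64 feature map = 576
    "lstm_2": lstm_params(64, 64),
    "dense_1": dense_params(64, 784),
}
total = sum(counts.values())

for name, n in counts.items():
    print(name, n)
print("total", total)  # total 271376
```

The numbers match the summary, which suggests the layers are wired as intended and that most of the capacity sits in the first LSTM, fed by only 576 features per frame.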
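I know there is something wrong with my model, but I don't know how to fix it.
I suspect that model.add(Reshape((-1, 28, 28, 1))) may not be working properly. To be honest, I don't know how to handle the output of model.add(Dense(784, activation='sigmoid')), so I added a Reshape layer to get the right shape.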
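To check whether the reshape itself is a likely culprit, the two reshape variants can be mimicked in NumPy. This is only an illustrative sketch: it assumes the Keras Reshape layer uses the same row-major ordering as `numpy.reshape`, and the array `dense_out` is dummy data standing in for the output of the Dense(784) layer.

```python
import numpy as np

batch, n_frames = 4, 10

# Dummy stand-in for the Dense(784) output: one 784-vector per frame.
dense_out = np.arange(batch * n_frames * 784, dtype="float32")
dense_out = dense_out.reshape(batch, n_frames, 784)

# Analogue of Reshape((-1, 28, 28, 1)): the frame axis is inferred.
whole = dense_out.reshape(batch, -1, 28, 28, 1)

# Analogue of TimeDistributed(Reshape((28, 28, 1))): reshape frame by frame.
per_frame = np.stack(
    [dense_out[:, t].reshape(batch, 28, 28, 1) for t in range(n_frames)],
    axis=1,
)

print(whole.shape)                       # (4, 10, 28, 28, 1)
print(np.array_equal(whole, per_frame))  # True
```

Under that assumption both variants arrange the pixels identically, so the reshape alone would not explain poor predictions.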
Alternatively, because of my current design, the LSTM layers may not be capturing the temporal correlations correctly.
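Edit 1:
I changed all Convolution2D activations from sigmoid to relu.
Here is the prediction of the modified model. As the image shows, it currently cannot make reasonable predictions.
[Image: prediction results of the modified model]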
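Edit 2:
I changed model.add(Reshape((-1, 28, 28, 1))) to model.add(TimeDistributed(Reshape((28, 28, 1)))), increased the LSTM units to 512, and used two LSTM layers. I also added BatchNormalization and changed input_shape to (10, 28, 28, 1). With this input shape I can build a many-to-many model.
However, the predictions did not change much. I think I am missing something fundamental. Here is the new model: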
from keras.layers import Dense, Dropout, Activation, LSTM
from keras.layers.normalization import BatchNormalization
from keras.layers import Lambda, Convolution2D, MaxPooling2D, Flatten, Reshape, Conv2D
from keras.layers.convolutional import Conv3D
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
from keras.layers.pooling import GlobalAveragePooling1D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import Model
import keras.backend as K
import numpy as np
import pylab as plt
model = Sequential()
model.add(TimeDistributed(Convolution2D(16, (3, 3), activation='relu', kernel_initializer='glorot_uniform', padding='same'), input_shape=(10, 28, 28, 1)))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(Dropout(0.3))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(Dropout(0.3))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(Dropout(0.3))
# extract features and dropout
model.add(TimeDistributed(Flatten()))
model.add(Dropout(0.3))
model.add(Dense(784, activation='linear'))
model.add(TimeDistributed(BatchNormalization()))
# input to LSTM
model.add(LSTM(units=512, activation='tanh', recurrent_activation='hard_sigmoid', kernel_initializer='glorot_uniform', unit_forget_bias=True, dropout=0.3, recurrent_dropout=0.3, return_sequences=True))
model.add(LSTM(units=512, activation='tanh', recurrent_activation='hard_sigmoid', kernel_initializer='glorot_uniform', unit_forget_bias=True, dropout=0.3, recurrent_dropout=0.3, return_sequences=True))
# per-frame output layer with linear activation (784 = 28 * 28 pixels)
model.add(Dense(784, activation='linear'))
# model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Reshape((28,28,1))))
model.compile(loss='mae', optimizer='rmsprop')
print(model.summary())
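Edit 3: Because ConvLSTM2D does exactly what I want, and the purpose of this question was to understand ConvLSTM2D, I changed the title of the question to better reflect my problem.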
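For reference, a ConvLSTM2D-based many-to-many model for data of this shape could look roughly like the sketch below. It is not a tuned solution: the filter counts, the final Conv3D output layer, the optimizer, and the function name `build_convlstm_model` are all assumptions made for illustration, and the sigmoid output assumes pixel values scaled to [0, 1].

```python
import numpy as np
import keras
from keras import layers


def build_convlstm_model(n_frames=10, n_row=28, n_col=28, n_channels=1):
    """Frame-to-frame model: (n_frames, 28, 28, 1) in, same shape out."""
    model = keras.Sequential([
        keras.Input(shape=(n_frames, n_row, n_col, n_channels)),
        # padding="same" keeps the 28x28 spatial size;
        # return_sequences=True keeps one output per input frame.
        layers.ConvLSTM2D(16, (3, 3), padding="same", return_sequences=True),
        layers.BatchNormalization(),
        layers.ConvLSTM2D(16, (3, 3), padding="same", return_sequences=True),
        layers.BatchNormalization(),
        # Map the 16 feature channels back to a single-channel frame.
        layers.Conv3D(n_channels, (3, 3, 3), padding="same",
                      activation="sigmoid"),
    ])
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model


model = build_convlstm_model()

# Dummy batch of 2 videos, only to confirm the output shape.
x_dummy = np.random.rand(2, 10, 28, 28, 1).astype("float32")
y_pred = model.predict(x_dummy, verbose=0)
print(y_pred.shape)  # (2, 10, 28, 28, 1)
```

Unlike the TimeDistributed CNN followed by Flatten and LSTM, this design keeps the spatial layout of each frame inside the recurrent state, so no Dense or Reshape layer is needed to rebuild the 28x28 frames.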