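I am trying to feed the KTH action dataset to a CNN, and I am having trouble reshaping the data. I created an array of shape (99, 75, 120, 160) with dtype uint8, i.e. 99 videos belonging to one class, each video with 75 frames, each frame 120x160.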
model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'),
input_shape=()))
###need to reshape data in input_shape
Should I specify a dense layer first?
Here is my code:
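import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D, Flatten,
                          LSTM, Reshape, UpSampling2D)

model = Sequential()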
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'),
input_shape=(75,120,160)))
###need to reshape data in input_shape
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=64, return_sequences=True))
model.add(TimeDistributed(Reshape((8, 8, 1))))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(16, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(32, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(64, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(1, (3,3), padding='same')))
model.compile(optimizer='adam', loss='mse')
data = np.load(r"C:\Users\shj_k\Desktop\Project\handclapping.npy")
print (data.shape)
(x_train,x_test) = train_test_split(data)
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
print (x_train.shape)
print (x_test.shape)
model.fit(x_train, x_train,
epochs=100,
batch_size=1,
shuffle=False,
validation_data=(x_test, x_test))
The variables are x_test (25, 75, 120, 160), dtype float32, and x_train (74, 75, 120, 160), dtype float32.
The full error, as requested in the comments, is:
runfile('C:/Users/shj_k/Desktop/Project/cnn_lstm.py', wdir='C:/Users/shj_k/Desktop/Project')
(99, 75, 120, 160)
(74, 75, 120, 160)
(25, 75, 120, 160)
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/shj_k/Desktop/Project/cnn_lstm.py', wdir='C:/Users/shj_k/Desktop/Project')
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/shj_k/Desktop/Project/cnn_lstm.py", line 63, in <module>
    validation_data=(x_test, x_test))
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 952, in fit
    batch_size=batch_size)
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected time_distributed_403_input to have 5 dimensions, but got array with shape (74, 75, 120, 160)
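Thank you for your replies.
Answer 0: (score: 0)
A few things:
The TimeDistributed layer in Keras needs a time dimension. For video processing, that is the number of frames, here 75.
It also expects each image to have shape (height, width, channels), e.g. (120, 160, 3). The input_shape of the TimeDistributed layer should therefore be (75, 120, 160, 3), where 3 stands for the RGB channels. If you have grayscale images, the last dimension should be 1.
input_shape always omits the sample ("rows") dimension, which is 99 in your case.
To check the output shape produced by each layer of the model, call model.summary() after compiling.
See: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed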
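For the grayscale array in the question, the missing fifth dimension can be added with plain NumPy. The following is a minimal sketch, not the asker's actual pipeline: it uses a small zero-filled stand-in (4 videos instead of 99) in place of the loaded handclapping.npy file, and the name data_5d is made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the loaded array: 4 grayscale videos instead of 99,
# each with 75 frames of 120x160 pixels.
data = np.zeros((4, 75, 120, 160), dtype=np.uint8)

# Append a trailing channel axis so every frame becomes (120, 160, 1).
data_5d = np.expand_dims(data, axis=-1)

# Scale to [0, 1] as in the question.
data_5d = data_5d.astype("float32") / 255.0

# input_shape leaves out the sample axis, so it is everything after axis 0.
input_shape = data_5d.shape[1:]

print(data_5d.shape)
print(input_shape)
```

The resulting input_shape is what would be passed to the first TimeDistributed layer, and an array shaped like data_5d is what model.fit() would receive.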
You can use keras.preprocessing.image to convert an image into a numpy array of shape (X, Y, 3).
from keras.preprocessing import image
# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 160))
# convert PIL.Image.Image type to 3D tensor with shape (120, 160, 3)
x = image.img_to_array(img)
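Update: It seems the reason you had to make all images square (128, 128, 1) is that in model.fit() the training examples (x_train) and the labels (usually y_train) are the same array. If you look at the model summary below, everything after the Flatten layer becomes square, starting with the Reshape to (8, 8, 1), so the model expects square labels. That makes sense: predicting with this model turns a (120, 160, 1) image into an image of shape (128, 128, 1). Changing the training call to the following should therefore work: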
from numpy import random  # NumPy's random module, not the standard library one
x_train = random.random((90, 5, 120, 160, 1)) # training data
y_train = random.random((90, 5, 128, 128, 1)) # labels
model.fit(x_train, y_train)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_1 (TimeDist (None, 5, 120, 160, 64) 320
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 60, 80, 64) 0
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 60, 80, 32) 18464
_________________________________________________________________
time_distributed_4 (TimeDist (None, 5, 30, 40, 32) 0
_________________________________________________________________
time_distributed_5 (TimeDist (None, 5, 30, 40, 16) 4624
_________________________________________________________________
time_distributed_6 (TimeDist (None, 5, 15, 20, 16) 0
_________________________________________________________________
time_distributed_7 (TimeDist (None, 5, 4800) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 5, 64) 1245440
_________________________________________________________________
time_distributed_8 (TimeDist (None, 5, 8, 8, 1) 0
_________________________________________________________________
time_distributed_9 (TimeDist (None, 5, 16, 16, 1) 0
_________________________________________________________________
time_distributed_10 (TimeDis (None, 5, 16, 16, 16) 160
_________________________________________________________________
time_distributed_11 (TimeDis (None, 5, 32, 32, 16) 0
_________________________________________________________________
time_distributed_12 (TimeDis (None, 5, 32, 32, 32) 4640
_________________________________________________________________
time_distributed_13 (TimeDis (None, 5, 64, 64, 32) 0
_________________________________________________________________
time_distributed_14 (TimeDis (None, 5, 64, 64, 64) 18496
_________________________________________________________________
time_distributed_15 (TimeDis (None, 5, 128, 128, 64) 0
_________________________________________________________________
time_distributed_16 (TimeDis (None, 5, 128, 128, 1) 577
=================================================================
Total params: 1,292,721
Trainable params: 1,292,721
Non-trainable params: 0
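The shapes in this summary can be reproduced by hand. The sketch below is only an illustration of the arithmetic, with a made-up helper name (spatial_size); it assumes every Conv2D uses padding='same' (so it leaves height and width unchanged) and that pooling and upsampling both use a factor of 2, as in the model above.

```python
def spatial_size(height, width, n_pool=0, n_up=0, factor=2):
    """Track (height, width) through pooling layers, then upsampling layers.

    Assumes Conv2D layers use padding='same', so they do not change the size.
    """
    for _ in range(n_pool):
        height, width = height // factor, width // factor
    for _ in range(n_up):
        height, width = height * factor, width * factor
    return height, width


# Encoder: three MaxPooling2D layers applied to a 120x160 frame.
encoder_out = spatial_size(120, 160, n_pool=3)

# Flatten sees height * width * 16 filters from the last encoder Conv2D.
flat_features = encoder_out[0] * encoder_out[1] * 16

# Decoder: Reshape((8, 8, 1)) followed by four UpSampling2D layers.
decoder_out = spatial_size(8, 8, n_up=4)

print(encoder_out, flat_features)
print(decoder_out)
```

The decoder starts from the 8x8 Reshape rather than from the encoder output, which is why the model emits 128x128 frames regardless of the 120x160 input.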
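Update 2: To make the model work with non-square images without changing y, set LSTM(300) and Reshape((15, 20, 1)), then remove one of the Conv2D + UpSampling2D pairs. The autoencoder then accepts and produces images of shape (120, 160). The trick is to look at the model summary and make sure you start from the right shape after the LSTM, so that after all the remaining layers the final output has shape (120, 160).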
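from numpy import random
from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D, Flatten,
                          LSTM, Reshape, UpSampling2D)

model = Sequential()
model.add(
    TimeDistributed(Conv2D(64, (2, 2), activation="relu", padding="same"), input_shape=(5, 120, 160, 1)))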
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences=True))
model.add(TimeDistributed(Reshape((15, 20, 1))))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(1, (3, 3), padding='same')))
model.compile(optimizer='adam', loss='mse')
model.summary()
x_train = random.random((90, 5, 120, 160, 1))
y_train = random.random((90, 5, 120, 160, 1))
model.fit(x_train, y_train)
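The choice of LSTM(300) and Reshape((15, 20, 1)) follows from dividing the target frame size by the total upsampling factor. Here is a small sketch of that calculation; decoder_start is a hypothetical helper written for this explanation, not part of Keras, and it assumes 2x2 upsampling and a single-channel Reshape.

```python
def decoder_start(target_h, target_w, n_up, factor=2, channels=1):
    """Return the Reshape target and the LSTM units needed to reach the target size.

    n_up is the number of UpSampling2D layers in the decoder.
    """
    scale = factor ** n_up
    if target_h % scale or target_w % scale:
        raise ValueError(
            f"{target_h}x{target_w} is not divisible by {scale}; "
            "use fewer upsampling layers or a different frame size"
        )
    h, w = target_h // scale, target_w // scale
    # The LSTM output is reshaped per frame, so it needs h * w * channels units.
    return (h, w, channels), h * w * channels


print(decoder_start(120, 160, n_up=3))   # the non-square model above
print(decoder_start(128, 128, n_up=4))   # the original square model
```

With four upsampling layers, 120 is not divisible by 16, which is why one Conv2D + UpSampling2D pair has to be removed for 120x160 frames.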
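Answer 1: (score: 0)
Thanks to Mr. Kai Aeberli for the help. After resizing the images to 128x128, I was able to run the model. Without a GPU, the size of the dataset may crash the system, so reduce it as needed. If in doubt, read through the whole comment section. You can find the code on GitHub here.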
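To get a feel for how large the data is before training, the memory footprint can be estimated from the array shape alone. This is a rough sketch under my own assumptions: float32_bytes is a made-up helper, and keeping every 15th frame is just one possible way to shrink the data, not something taken from the linked code.

```python
import numpy as np


def float32_bytes(shape):
    """Bytes needed to hold a float32 array of the given shape."""
    return int(np.prod(shape, dtype=np.int64)) * 4


# Training split from the question, after conversion to float32.
train_bytes = float32_bytes((74, 75, 120, 160))
print(train_bytes / 1e6, "MB")

# One way to shrink the data: keep every 15th frame (75 frames -> 5 frames).
# A tiny stand-in array is used here so the example stays cheap to run.
videos = np.zeros((2, 75, 12, 16), dtype=np.float32)
subsampled = videos[:, ::15]
print(subsampled.shape)
```

The intermediate activations of the 64-filter layers are many times larger than the input itself, so the input size is only a lower bound on what training needs.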