Autoencoder with 3D convolutions and convolutional LSTM

Time: 2019-07-23 16:51:30

Tags: keras conv-neural-network lstm keras-layer autoencoder

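I have implemented a variational autoencoder with CNN layers in the encoder and decoder. The code is shown below. My training data (train_X) consists of 40,000 images of size 64 x 80 x 1, and my validation data (valid_X) consists of 4,500 images of size 64 x 80 x 1.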

I would like to adapt my network in the following two ways:

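  1. Instead of 2D convolutions (Conv2D and Conv2DTranspose), I would like to use 3D convolutions so that time is taken into account (as the third dimension). For this I want to use slices of 10 images, i.e. I would get inputs of size 64 x 80 x 1 x 10. Can I simply use Conv3D and Conv3DTranspose, or are other changes necessary?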

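  2. I would like to try convolutional LSTMs (ConvLSTM2D) in the encoder and decoder instead of plain 2D convolutions. Again, the input size would be 64 x 80 x 1 x 10 (i.e. a time series of 10 images). How can I adapt my network to ConvLSTM2D?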

import keras
from keras import backend as K
from keras.layers import (Dense, Input, Flatten)
from keras.layers import Lambda, Conv2D
from keras.models import Model
from keras.layers import Reshape, Conv2DTranspose
from keras.losses import mse

def sampling(args):
    z_mean, z_log_var = args
    batch = K.shape(z_mean)[0]
    dim = K.int_shape(z_mean)[1]
    epsilon = K.random_normal(shape=(batch, dim))
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

inner_dim = 16
latent_dim = 6

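image_size = (64, 80, 1)  # 64 x 80 x 1 as stated above; a width of 78 would come back as 80 after two stride-2 layers and their transposes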
inputs = Input(shape=image_size, name='encoder_input')
x = inputs

x = Conv2D(32, 3, strides=2, activation='relu', padding='same')(x)
x = Conv2D(64, 3, strides=2, activation='relu', padding='same')(x)

# shape info needed to build decoder model
shape = K.int_shape(x)

# generate latent vector Q(z|X)
x = Flatten()(x)
x = Dense(inner_dim, activation='relu')(x)
z_mean = Dense(latent_dim, name='z_mean')(x)
z_log_var = Dense(latent_dim, name='z_log_var')(x)

z = Lambda(sampling, output_shape=(latent_dim,), name='z')([z_mean, z_log_var])

# instantiate encoder model
encoder = Model(inputs, [z_mean, z_log_var, z], name='encoder')

# build decoder model
latent_inputs = Input(shape=(latent_dim,), name='z_sampling')
x = Dense(inner_dim, activation='relu')(latent_inputs)
x = Dense(shape[1] * shape[2] * shape[3], activation='relu')(x)
x = Reshape((shape[1], shape[2], shape[3]))(x)

x = Conv2DTranspose(64, 3, strides=2, activation='relu', padding='same')(x)
x = Conv2DTranspose(32, 3, strides=2, activation='relu', padding='same')(x)

outputs = Conv2DTranspose(filters=1, kernel_size=3, activation='sigmoid', padding='same', name='decoder_output')(x)

# instantiate decoder model
decoder = Model(latent_inputs, outputs, name='decoder')

# instantiate VAE model
outputs = decoder(encoder(inputs)[2])
vae = Model(inputs, outputs, name='vae')

def vae_loss(x, x_decoded_mean):
    reconstruction_loss = mse(K.flatten(x), K.flatten(x_decoded_mean))
    reconstruction_loss *= image_size[0] * image_size[1]
    kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
    kl_loss = K.sum(kl_loss, axis=-1)
    kl_loss *= -0.5
    vae_loss = K.mean(reconstruction_loss + kl_loss)
    return vae_loss

optimizer = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.000)
vae.compile(loss=vae_loss, optimizer=optimizer)
vae.fit(train_X, train_X,
        epochs=500,
        batch_size=128,
        verbose=1,
        shuffle=True,
        validation_data=(valid_X, valid_X))

Thank you very much for your help. I really appreciate it.

1 answer:

Answer 0 (score: 1)

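Set the input shape to (10, 64, 80, 1) and simply replace the layers. Note the order: Keras expects the frames first and the channels last, not 64 x 80 x 1 x 10.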
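For question 1, a minimal sketch of the Conv3D version, assuming tf.keras (TensorFlow 2) instead of the standalone keras package used in the question. The strides=(1, 2, 2) choice is an assumption: it keeps all 10 frames and only downsamples height and width, so the transposed layers can restore exactly 10 x 64 x 80. The sampling Lambda and vae_loss are left out; they carry over unchanged.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

frames, height, width, channels = 10, 64, 80, 1
inner_dim, latent_dim = 16, 6

# Encoder: Conv3D replaces Conv2D. strides=(1, 2, 2) keeps all 10 frames
# and halves height and width, as strides=2 did in the 2D version.
inputs = keras.Input(shape=(frames, height, width, channels), name="encoder_input")
x = layers.Conv3D(32, 3, strides=(1, 2, 2), activation="relu", padding="same")(inputs)
x = layers.Conv3D(64, 3, strides=(1, 2, 2), activation="relu", padding="same")(x)
conv_shape = tuple(x.shape[1:])  # (10, 16, 20, 64): needed to build the decoder
x = layers.Flatten()(x)
x = layers.Dense(inner_dim, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
encoder = keras.Model(inputs, [z_mean, z_log_var], name="encoder")

# Decoder: Reshape now takes four dimensions, and Conv3DTranspose
# with the same strides undoes the downsampling.
latent_inputs = keras.Input(shape=(latent_dim,), name="z_sampling")
x = layers.Dense(inner_dim, activation="relu")(latent_inputs)
x = layers.Dense(int(np.prod(conv_shape)), activation="relu")(x)
x = layers.Reshape(conv_shape)(x)
x = layers.Conv3DTranspose(64, 3, strides=(1, 2, 2), activation="relu", padding="same")(x)
x = layers.Conv3DTranspose(32, 3, strides=(1, 2, 2), activation="relu", padding="same")(x)
outputs = layers.Conv3DTranspose(channels, 3, activation="sigmoid", padding="same",
                                 name="decoder_output")(x)
decoder = keras.Model(latent_inputs, outputs, name="decoder")
```

If you keep the question's vae_loss, the reconstruction term would presumably be scaled by frames * height * width rather than image_size[0] * image_size[1].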

The tedious part is organizing the input data: deciding whether you want to use sliding windows or simply reshape (images, 64, 80, 1) into (images//10, 10, 64, 80, 1).
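Both ways of organizing the data can be sketched with NumPy, assuming the images are stored in temporal order along the first axis. The function names are hypothetical helpers, not part of Keras.

```python
import numpy as np

def non_overlapping_clips(images, frames=10):
    """(n, 64, 80, 1) -> (n // frames, frames, 64, 80, 1).

    Trailing images that do not fill a whole clip are dropped.
    """
    n_clips = len(images) // frames
    return images[: n_clips * frames].reshape((n_clips, frames) + images.shape[1:])

def sliding_window_clips(images, frames=10, step=1):
    """Overlapping clips: clip i holds images[i * step : i * step + frames].

    step=frames gives the same result as non_overlapping_clips.
    """
    starts = range(0, len(images) - frames + 1, step)
    return np.stack([images[s : s + frames] for s in starts])
```

With the question's data, non_overlapping_clips(train_X) would give 4,000 clips and non_overlapping_clips(valid_X) 450. A sliding window with step=1 produces almost one clip per image, so it needs roughly 10 times as much memory.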

Sliding windows (overlapping) or not?

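1 - Fine. If you want the model to understand pieces of 10 images, you can either overlap them or not; it's your choice. Overlapping may give better performance, but not necessarily.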

In this case there is really no order between the clips, as long as the 10 frames inside each clip are in order.

Both Conv3D and LSTM with stateful=False support this.
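For question 2, a minimal sketch of a ConvLSTM2D autoencoder in the stateful=False case, assuming tf.keras. It is a plain autoencoder: the Dense bottleneck, sampling layer and KL term from the question are left out for brevity. Keras has no transposed ConvLSTM layer, so this decoder upsamples with UpSampling3D(size=(1, 2, 2)) and then applies ConvLSTM2D; that is one possible design, and the filter counts are illustrative.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

frames, height, width, channels = 10, 64, 80, 1

inputs = keras.Input(shape=(frames, height, width, channels))
# Encoder: return_sequences=True keeps the time axis; strides=2 halves H and W.
x = layers.ConvLSTM2D(32, 3, strides=2, padding="same", return_sequences=True)(inputs)  # (10, 32, 40, 32)
x = layers.ConvLSTM2D(64, 3, strides=2, padding="same", return_sequences=True)(x)       # (10, 16, 20, 64)
# Decoder: upsample only H and W, then let ConvLSTM2D refine each step.
x = layers.UpSampling3D(size=(1, 2, 2))(x)                                             # (10, 32, 40, 64)
x = layers.ConvLSTM2D(64, 3, padding="same", return_sequences=True)(x)
x = layers.UpSampling3D(size=(1, 2, 2))(x)                                             # (10, 64, 80, 64)
x = layers.ConvLSTM2D(32, 3, padding="same", return_sequences=True)(x)
# One-channel reconstruction of every frame (kernel depth 1 = per frame).
outputs = layers.Conv3D(channels, (1, 3, 3), activation="sigmoid", padding="same")(x)

model = keras.Model(inputs, outputs, name="convlstm_autoencoder")
model.compile(optimizer="adam", loss="mse")
```

Because each clip is treated independently here, the clips can be shuffled during training.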

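2 - However, if you want the model to understand the whole sequence, and you only split the sequences for memory reasons, then only an LSTM with stateful=True can support this.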

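(A Conv3D with kernel size (frames, w, h) will work, but it is limited to frames and will never understand sequences longer than frames. It can still detect the presence of punctual events, but not relationships across long sequences.)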
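To illustrate that limitation, a small sketch assuming tf.keras: a Conv3D whose kernel depth equals the clip length sees all 10 frames at once, and with padding="valid" the time axis of its output shrinks to length 1. The layer only ever looks at one clip, so nothing beyond those 10 frames can influence it. The 8 filters are arbitrary.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

frames = 10
inputs = keras.Input(shape=(frames, 64, 80, 1))
# Kernel depth == number of frames: the whole clip is covered by one kernel position
# along time, so the output time axis has length 10 - 10 + 1 = 1.
outputs = layers.Conv3D(8, kernel_size=(frames, 3, 3), padding="valid")(inputs)
model = keras.Model(inputs, outputs)
```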

In this case, for the LSTM you need to:

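  • Set shuffle=False in training
  • Use a fixed batch size of sequences
  • Not overlap the images
  • Write a manual training loop that calls model.reset_states() every time you feed a "new sequence" for training or prediction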

The loop structure would be:

for epoch in range(epochs):
    for group_of_sequences in range(groups):
        model.reset_states()  # new, independent sequences start here: clear the LSTM state

        sequences = getAGroupOfCompleteSequences()  # shape (sequences, total_length, ....)

        for step in range(slide_divisions):
            # consecutive, non-overlapping 10-frame chunks of the same sequences
            batch = sequences[:, 10 * step : 10 * (step + 1)]

            model.train_on_batch(batch, ....)
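A runnable sketch of the loop structure above, under these assumptions: tf.keras, a tiny stateful model standing in for the real autoencoder, a hypothetical callable get_group in place of getAGroupOfCompleteSequences, and the input chunk used as its own target (autoencoder). States are reset by iterating over the layers, which avoids depending on whether your Keras version provides Model.reset_states. The batch_size given to Input must equal the number of sequences per group.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_stateful_model(batch_size, frames=10, height=64, width=80):
    # stateful=True needs a fixed batch size, hence batch_size on the Input.
    inputs = keras.Input(shape=(frames, height, width, 1), batch_size=batch_size)
    x = layers.ConvLSTM2D(16, 3, padding="same", return_sequences=True, stateful=True)(inputs)
    outputs = layers.Conv3D(1, (1, 3, 3), activation="sigmoid", padding="same")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

def reset_rnn_states(model):
    # Clear the hidden state carried over by every stateful layer.
    for layer in model.layers:
        if getattr(layer, "stateful", False) and hasattr(layer, "reset_states"):
            layer.reset_states()

def train_stateful(model, get_group, epochs, groups, frames=10):
    losses = []
    for epoch in range(epochs):
        for group in range(groups):
            reset_rnn_states(model)  # a new set of independent sequences starts here
            sequences = get_group()  # (batch_size, total_length, height, width, 1)
            for step in range(sequences.shape[1] // frames):
                # Consecutive, non-overlapping chunks, fed in order (no shuffling).
                chunk = sequences[:, frames * step : frames * (step + 1)]
                loss = model.train_on_batch(chunk, chunk)  # loss for this chunk
                losses.append(float(np.mean(loss)))
    return losses
```

For example, with hypothetical random data: model = build_stateful_model(batch_size=2, height=8, width=8) and get_group returning arrays of shape (2, 30, 8, 8, 1) gives three chunks per group.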