I'm fairly new to convolutional LSTM networks, but I'm currently working on a problem that involves predicting sequences of future frames, which is why I decided to look into ConvLSTM networks.
To understand how the model works and how it can be extended, I ran some initial tests on the Moving MNIST dataset: http://www.cs.toronto.edu/~nitish/unsupervised_video/mnist_test_seq.npy
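As a quick sanity check (assuming the file sits in the working directory), the raw array can be inspected like this; the file should contain 10,000 sequences of 20 frames of 64x64 grayscale pixels, stored frames-first:
import numpy as np
# Inspect the raw Moving MNIST array before any preprocessing.
# Expected layout: (20, 10000, 64, 64) -> (frames, sequences, height, width)
raw = np.load('mnist_test_seq.npy')
print(raw.shape)  # should print (20, 10000, 64, 64)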
However, after training and inference, I expected the predictions to be much more coherent, especially when comparing them to results from others who have applied a similar approach to the Moving MNIST dataset. It looks as if the original trajectory of the digits is being "preserved" in the output.
Is this a common limitation, or is my network architecture poorly designed for the task at hand?
I have read and applied the approach from the following paper: https://arxiv.org/abs/1506.04214
They also have a GitHub page, from which I mainly used the Keras example for the ConvLSTM cells: https://github.com/wqxu/ConvLSTM
I have reduced the sample size to 100 so that you can reproduce the results, but I also trained the model for 100 epochs (roughly one hour on a K40 GPU) just to rule out that the problem was simply a lack of training rather than the model itself.
My code is as follows (it assumes you have downloaded the Moving MNIST dataset from the link above into the directory stored in the 'path' variable):
from keras.models import Sequential
from keras.layers.convolutional import Conv3D
from keras.layers.convolutional_recurrent import ConvLSTM2D
from keras.layers.normalization import BatchNormalization
import numpy as np
import matplotlib.pyplot as plt
path = "./"
data = np.load(path + 'mnist_test_seq.npy')
# Define image dimensions and frames to be used for LSTM memory
sequence_length = 15
image_height = data.shape[2]
image_width = data.shape[3]
# swap frames and observations so [obs, frames, height, width, channels]
data = data.swapaxes(0, 1)
# only select first 100 observations to reduce memory- and compute requirements
sub = data[:100, :, :, :]
# add channel dimension (grayscale)
sub = np.expand_dims(sub, 4)
# normalize to 0, 1
# sub = sub / 255
sub[sub < 128] = 0
sub[sub >= 128] = 1
# Define network
seq = Sequential()
seq.add(ConvLSTM2D(filters=64, kernel_size=(1, 1),
                   input_shape=(None, image_height, image_width, 1),  # will need to change channels to 3 for real images
                   padding='same', return_sequences=True,
                   activation='relu'))
seq.add(BatchNormalization())
seq.add(ConvLSTM2D(filters=64, kernel_size=(2, 2),
                   padding='same', return_sequences=True,
                   activation='relu'))
seq.add(BatchNormalization())
seq.add(ConvLSTM2D(filters=64, kernel_size=(1, 1),
                   padding='same', return_sequences=True,
                   activation='relu'))
seq.add(BatchNormalization())
seq.add(ConvLSTM2D(filters=64, kernel_size=(2, 2),
                   padding='same', return_sequences=True,
                   activation='relu'))
seq.add(BatchNormalization())
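# The final 1x1x1 convolution collapses the 64 ConvLSTM feature maps into a
# single sigmoid output channel, so the prediction keeps the same
# (frames, height, width) shape as the binary targets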
seq.add(Conv3D(filters=1, kernel_size=(1, 1, 1),
               activation='sigmoid',
               padding='same', data_format='channels_last'))
seq.compile(loss='binary_crossentropy', optimizer='adam')
# Add helper function for shifting input and output, so previous frame (X_t-1) is used as input to predict next frame (y_t)
def shift_data(data, n_frames=15):
    X = data[:, 0:n_frames, :, :, :]
    y = data[:, 1:(n_frames + 1), :, :, :]
    return X, y
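# With sub of shape (100, 20, 64, 64, 1) and n_frames=15, X covers frames
# 0-14 and y covers frames 1-15, i.e. each target frame is the corresponding
# input frame shifted one step into the future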
# Run script
# prepare X, y
X, y = shift_data(sub, sequence_length)
# fit the model
seq.fit(X, y, batch_size=16, epochs=100, validation_split=0.05)
# select a single observation and generate predictions for it
test_set = np.expand_dims(X[5, :, :, :, :], 0)
prediction = seq.predict(test_set)
# compare to ground truth and visualize
for i in range(0, 13):
    # create plot
    fig = plt.figure(figsize=(10, 5))
    # truth
    ax = fig.add_subplot(122)
    ax.text(1, -3, ('ground truth at time: ' + str(i)), fontsize=20, color='b')
    toplot_true = test_set[0, i, ::, ::, 0]
    plt.imshow(toplot_true)
    # predictions
    ax = fig.add_subplot(121)
    ax.text(1, -3, ('predicted frame at time: ' + str(i)), fontsize=20, color='b')
    toplot_pred = prediction[0, i + 1, ::, ::, 0]
    plt.imshow(toplot_pred)
    plt.savefig(path + '/%i_image.png' % (i + 1))
The results I get look like this:
The first image looks fine: Frame 1
However, Frame 6 and Frame 13 clearly show the entire trajectory from the previous steps.
If you visualize all the images at once, it also becomes clear that the digits' trajectories are never "removed" from the images.
I am not sure whether this is a known limitation of the model or whether the model simply has not converged. What worries me is that, given the relative simplicity of the dataset, these results are not very satisfying, and a more complex task would be completely infeasible for this model. Any feedback would be greatly appreciated!