Question

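I have a dataset of three images. When I build an autoencoder and train it on these three images, the output I get is exactly the same for every image, and it looks like a blend of all three.

My results look like this:

Input image 1:

Output image 1:

Input image 2:

Output image 2:

Input image 3:

Output image 3:

So you can see that the output is exactly the same for every input, and while it matches each input reasonably well, it is not perfect.

This is a dataset of three images; the reconstruction should be perfect (or at least different for each image).

I'm worried about this three-image dataset because when I work with my 500-image dataset, all I get back is a blank white screen, since that is the best average of all the images.

I'm using Keras, and the code is really simple.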

from keras.models import Sequential
from keras.layers import Dense, Flatten, Reshape
import numpy as np

# returns a numpy array with shape (3, 24, 32, 1)
# there are 3 images that are each 24x32 and are black and white (1 color channel)
x_train = get_data()

# this is the size of our encoded representations
# encode down to two numbers (I have tested using 3; I still have the same issue)
encoding_dim = 2
# the shape without the batch amount
input_shape = x_train.shape[1:]
# how many output neurons we need to create an image
input_dim = np.prod(input_shape)

# simple feedforward network
# I've also tried convolutional layers; same issue
autoencoder = Sequential([
              Flatten(), # flatten
              Dense(encoding_dim), # encode
              Dense(input_dim), # decode
              Reshape(input_shape) # reshape decoding
])

# adadelta optimizer works better than adam, same issue with both
autoencoder.compile(optimizer='adadelta', loss='mse')

# train it to output the same thing it gets as input
# I've tried epochs up to 30000 with no improvement;
# still predicts the same image for all three inputs
autoencoder.fit(x_train, x_train,
            epochs=10,
            batch_size=1,
            verbose=1)

out = autoencoder.predict(x_train)

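Then I take the outputs (out[0], out[1], out[2]) and convert them back into images. You can see the output images above.

I'm worried because this suggests the autoencoder isn't preserving any information about the input image, which is not how an encoder should behave.

How can I get the encoder to show differences in the output based on the input image?

Edit:

A coworker suggested not using an autoencoder at all, but a 1-layer feedforward neural network instead. I tried that and the same thing happened, until I set the batch size to 1 and trained for 1400 epochs; then it worked perfectly. This makes me think that more epochs would solve the problem, but I'm not sure yet.

Edit:

Training for 10,000 epochs (with batch size 3) made the second image look different from the first and third on the autoencoder. That is exactly what happened with the non-encoder version after about 400 epochs (also with batch size 3), which is further evidence that training for more epochs may be the solution.

I'm going to test with batch size 1 to see if that helps further, and then try training for a lot of epochs to see if that fully solves the problem.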

Answer 1

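My encoding dimension was far too small. Trying to encode a 24x32 image into 2 (or 3) numbers is simply too much to ask of the autoencoder.

By raising encoding_dim to 32, the problem was pretty much solved. I was able to use the default learning rate with the Adadelta optimizer. My data didn't even need any normalization beyond dividing all pixels by 255.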
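For reference, the "divide all pixels by 255" step can be sketched like this. It is a minimal illustration, not the exact code I ran: scale_pixels is a hypothetical helper name, and the random array is only a stand-in for whatever get_data() returns, assuming that is 8-bit grayscale data.

```python
import numpy as np

def scale_pixels(images):
    """Map 8-bit pixel values (0..255) to float32 values in [0, 1]."""
    return np.asarray(images, dtype="float32") / 255.0

# Stand-in for the real data (assumption): 3 grayscale images, 24x32 pixels each.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(3, 24, 32, 1), dtype=np.uint8)

x_train = scale_pixels(raw)
print(x_train.shape, x_train.dtype)
```

Keeping the inputs in [0, 1] matters mostly if the decoder's output is also restricted to that range.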

The "binary_crossentropy" loss function seems to work faster/better than "mse", although "mse" (mean squared error) works fine too.

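In the first few hundred epochs, it really does look like it's blending the images. However, as it trains longer, the outputs start to separate more.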
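If you'd rather measure the blending than eyeball it, one option is to compare the reconstructions pairwise. This is just a sketch: pairwise_mean_abs_diff is a hypothetical helper, and the two arrays below are made-up stand-ins for what autoencoder.predict(x_train) might return early and late in training.

```python
import numpy as np

def pairwise_mean_abs_diff(outputs):
    """Return an (n, n) matrix of mean absolute differences between outputs."""
    flat = np.asarray(outputs, dtype="float64").reshape(len(outputs), -1)
    # Broadcast so every reconstruction is compared with every other one.
    return np.abs(flat[:, None, :] - flat[None, :, :]).mean(axis=-1)

# Made-up stand-ins for autoencoder.predict(x_train) at two stages of training.
blended = np.full((3, 24, 32, 1), 0.5)  # the same image for every input
separated = np.stack([np.full((24, 32, 1), v) for v in (0.1, 0.5, 0.9)])

# Identical outputs give an all-zero matrix; distinct outputs do not.
print(pairwise_mean_abs_diff(blended))
print(pairwise_mean_abs_diff(separated).round(2))
```

Off-diagonal values that stay near zero mean the network is still producing the same image for every input.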

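I also made the encoding layer's output activation relu, while the decoding layer's activation is sigmoid. I'm not sure how much of an effect that has on the output; I haven't tested it.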
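Putting the changes together, the model would look roughly like the sketch below. Treat it as an approximation rather than my exact code: build_autoencoder is a hypothetical helper, the relu encoder / sigmoid decoder pairing is my reading of the setup, the random array stands in for the real images, and I'm assuming a recent Keras version where keras.Input can be the first entry of a Sequential model.

```python
import numpy as np
import keras
from keras import layers

def build_autoencoder(input_shape, encoding_dim=32):
    """Dense autoencoder: flatten -> encode -> decode -> reshape."""
    input_dim = int(np.prod(input_shape))
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Flatten(),
        layers.Dense(encoding_dim, activation="relu"),  # encoder
        layers.Dense(input_dim, activation="sigmoid"),  # decoder, outputs in [0, 1]
        layers.Reshape(input_shape),
    ])
    model.compile(optimizer="adadelta", loss="binary_crossentropy")
    return model

# Stand-in data in [0, 1] (assumption); use the real images divided by 255.
x_train = np.random.default_rng(0).random((3, 24, 32, 1)).astype("float32")

autoencoder = build_autoencoder(x_train.shape[1:], encoding_dim=32)
# Only a few epochs here so the sketch runs quickly; real training needs far more.
autoencoder.fit(x_train, x_train, epochs=5, batch_size=1, verbose=0)
out = autoencoder.predict(x_train, verbose=0)
print(out.shape)
```

The sigmoid on the last layer keeps every output pixel in [0, 1], which is what "binary_crossentropy" expects when the targets are pixels scaled to that range.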

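This page helped a lot in understanding what I was doing wrong. I just copy/pasted the code and found that it worked on my dataset, so the rest was figuring out what I was doing incorrectly.

Here are some pictures of their simple autoencoder architecture running on my dataset (this is what I was after in the first place):

500 epochs:

2000 epochs: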

How to make an autoencoder work on a small image dataset