Unable to train a VAE with deconvolution layers

Time: 2018-09-03 09:11:22

Tags: python tensorflow neural-network artificial-intelligence conv-neural-network

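I am experimenting with a VAE implementation in TensorFlow on the MNIST dataset. First, I trained a VAE with an MLP encoder and decoder. It trained fine: the loss decreased and it generated plausible-looking digits. Here is the decoder code for this MLP-based VAE: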

x = sampled_z
x = tf.layers.dense(x, 200, tf.nn.relu)
x = tf.layers.dense(x, 200, tf.nn.relu)
x = tf.layers.dense(x, np.prod(data_shape))
img = tf.reshape(x, [-1] + data_shape)

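As a next step, I decided to add convolutional layers. Changing only the encoder worked fine, but when I use deconvolutions in the decoder (instead of fc layers), the model does not train at all. The loss never decreases and the output is always black. Here is the code of the deconvolutional decoder: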

x = tf.layers.dense(sampled_z, 24, tf.nn.relu)
x = tf.layers.dense(x, 7 * 7 * 64, tf.nn.relu)
x = tf.reshape(x, [-1, 7, 7, 64])
x = tf.layers.conv2d_transpose(x, 64, 3, 2, 'SAME', activation=tf.nn.relu)
x = tf.layers.conv2d_transpose(x, 32, 3, 2, 'SAME', activation=tf.nn.relu)
x = tf.layers.conv2d_transpose(x, 1, 3, 1, 'SAME', activation=tf.nn.sigmoid)
img = tf.reshape(x, [-1, 28, 28])
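As a sanity check on the shapes in the decoder above, here is a minimal pure-Python sketch of how the spatial size grows through the three transposed convolutions. The helper names `transpose_conv_out_size` and `decoder_shapes` are hypothetical and not part of TensorFlow; the sketch assumes the usual rule that 'SAME' padding gives an output size of input size times stride.

```python
def transpose_conv_out_size(in_size, stride, padding="SAME", kernel=3):
    """Spatial output size of a transposed convolution along one axis."""
    if padding == "SAME":
        # With 'SAME' padding the output is simply the input scaled by the stride.
        return in_size * stride
    if padding == "VALID":
        return in_size * stride + max(kernel - stride, 0)
    raise ValueError("unknown padding: %r" % (padding,))


def decoder_shapes(start=(7, 7, 64), layers=((64, 3, 2), (32, 3, 2), (1, 3, 1))):
    """Trace (height, width, channels) through (filters, kernel, stride) layers."""
    h, w, c = start
    shapes = [(h, w, c)]
    for filters, kernel, stride in layers:
        h = transpose_conv_out_size(h, stride, "SAME", kernel)
        w = transpose_conv_out_size(w, stride, "SAME", kernel)
        shapes.append((h, w, filters))
    return shapes


print(decoder_shapes())
# [(7, 7, 64), (14, 14, 64), (28, 28, 32), (28, 28, 1)]
```

So the final reshape to [-1, 28, 28] is consistent with the layer sizes, which suggests the problem is not a shape mismatch.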

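This seems strange, because the code looks fine to me. I narrowed it down to the deconvolution layers in the decoder; something in there breaks it. For example, if I add a fully connected layer (even without a nonlinearity!) after the last deconvolution, it works again! Here is the code: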

x = tf.layers.dense(sampled_z, 24, tf.nn.relu)
x = tf.layers.dense(x, 7 * 7 * 64, tf.nn.relu)
x = tf.reshape(x, [-1, 7, 7, 64])
x = tf.layers.conv2d_transpose(x, 64, 3, 2, 'SAME', activation=tf.nn.relu)
x = tf.layers.conv2d_transpose(x, 32, 3, 2, 'SAME', activation=tf.nn.relu)
x = tf.layers.conv2d_transpose(x, 1, 3, 1, 'SAME', activation=tf.nn.sigmoid)
x = tf.contrib.layers.flatten(x)
x = tf.layers.dense(x, 28 * 28)
img = tf.reshape(x, [-1, 28, 28])

At this point I am really stuck. Does anyone know what might be going on here? I am using tf 1.8.0, the Adam optimizer, and a learning rate of 1e-4.

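Edit:

As @Agost pointed out, I should probably clarify my loss function and training procedure. I model the posterior as a Bernoulli distribution and use the negative ELBO as my loss (that is, I maximize the ELBO). Inspired by this post. Here is the full code for the encoder, decoder, and loss: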

def make_prior():
    mu = tf.zeros(N_LATENT)
    sigma = tf.ones(N_LATENT)
    return tf.contrib.distributions.MultivariateNormalDiag(mu, sigma)


def make_encoder(x_input):
    x_input = tf.reshape(x_input, shape=[-1, 28, 28, 1])
    x = conv(x_input, 32, 3, 2)
    x = conv(x, 64, 3, 2)
    x = conv(x, 128, 3, 2)
    x = tf.contrib.layers.flatten(x)
    mu = dense(x, N_LATENT)
    sigma = dense(x, N_LATENT, activation=tf.nn.softplus)  # softplus is log(exp(x) + 1)
    return tf.contrib.distributions.MultivariateNormalDiag(mu, sigma)


def make_decoder(sampled_z):
    x = tf.layers.dense(sampled_z, 24, tf.nn.relu)
    x = tf.layers.dense(x, 7 * 7 * 64, tf.nn.relu)
    x = tf.reshape(x, [-1, 7, 7, 64])

    x = tf.layers.conv2d_transpose(x, 64, 3, 2, 'SAME', activation=tf.nn.relu)
    x = tf.layers.conv2d_transpose(x, 32, 3, 2, 'SAME', activation=tf.nn.relu)
    x = tf.layers.conv2d_transpose(x, 1, 3, 1, 'SAME')

    img = tf.reshape(x, [-1, 28, 28])

    img_distribution = tf.contrib.distributions.Bernoulli(img)  # first positional argument is interpreted as logits
    img = img_distribution.probs
    img_distribution = tf.contrib.distributions.Independent(img_distribution, 2)
    return img, img_distribution


def main():
    mnist = input_data.read_data_sets(os.path.join(experiment_dir(EXPERIMENT), 'MNIST_data'))

    tf.reset_default_graph()

    batch_size = 128

    x_input = tf.placeholder(dtype=tf.float32, shape=[None, 28, 28], name='X')

    prior = make_prior()
    posterior = make_encoder(x_input)

    mu, sigma = posterior.mean(), posterior.stddev()

    z = posterior.sample()
    generated_img, output_distribution = make_decoder(z)

    likelihood = output_distribution.log_prob(x_input)
    divergence = tf.distributions.kl_divergence(posterior, prior)
    elbo = tf.reduce_mean(likelihood - divergence)
    loss = -elbo

    global_step = tf.train.get_or_create_global_step()
    optimizer = tf.train.AdamOptimizer(1e-3).minimize(loss, global_step=global_step)
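
For reference, the loss built in main() can be written out by hand. The following NumPy sketch is not the code from the question; it assumes the decoder output is treated as Bernoulli logits and that the posterior is a diagonal Gaussian compared against a standard normal prior. The function names and the latent size of 2 are hypothetical, since N_LATENT is not shown in the question.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def bernoulli_log_prob_from_logits(x, logits):
    # log p(x | logits) = x * logits - softplus(logits), summed over the
    # two pixel axes (the same reduction Independent(..., 2) performs).
    return np.sum(x * logits - softplus(logits), axis=(-2, -1))


def kl_diag_normal_to_standard(mu, sigma):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) per example.
    return 0.5 * np.sum(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * np.log(sigma), axis=-1)


def negative_elbo(x, logits, mu, sigma):
    likelihood = bernoulli_log_prob_from_logits(x, logits)
    divergence = kl_diag_normal_to_standard(mu, sigma)
    return -np.mean(likelihood - divergence)


# Toy batch: 4 binary "images", all-zero logits, posterior equal to the prior.
rng = np.random.default_rng(0)
x = (rng.random((4, 28, 28)) > 0.5).astype(np.float64)
loss = negative_elbo(x, np.zeros((4, 28, 28)), np.zeros((4, 2)), np.ones((4, 2)))
# Every pixel has probability 0.5 and the KL term is zero,
# so the loss equals 784 * log(2).
print(loss, 784 * np.log(2.0))
```

Writing the terms out this way makes it easier to check each one separately when the loss refuses to decrease.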

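1 Answer:

Answer 0 (score: 2)

Could it be the sigmoid in the final deconv layer? It restricts the output to the range 0-1, whereas you do not do that in the MLP-based autoencoder, nor when you add a fully connected layer after the deconv. So this may be a data range problem.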
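
One way to see how a range problem could arise is sketched below in NumPy. It assumes the sigmoid-activated output of the first deconvolutional decoder was passed to Bernoulli positionally, as in the edited make_decoder; TensorFlow's Bernoulli treats that argument as logits. This is an assumption about how the earlier version was wired, since the question shows the Bernoulli call only for the version without the sigmoid.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical pre-activation values covering a wide range.
pre_activation = np.linspace(-20.0, 20.0, 9)

# Unbounded logits (no activation on the last layer): probabilities span (0, 1).
probs_from_raw = sigmoid(pre_activation)

# Sigmoid output reinterpreted as logits: the sigmoid is applied twice,
# so probabilities are confined to (0.5, sigmoid(1)).
probs_from_squashed = sigmoid(sigmoid(pre_activation))

print(probs_from_raw.min(), probs_from_raw.max())
print(probs_from_squashed.min(), probs_from_squashed.max())
```

Under that assumption, no pixel could ever be assigned a probability below 0.5, so the mostly dark MNIST background could not be fit. A final layer without an activation, or a linear dense layer after it, removes the restriction.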