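I have been trying to implement a simple version of normalizing flows with Keras, as described in this paper: https://arxiv.org/pdf/1505.05770.pdf
My problem is that the loss is always infinite, and I cannot figure out what I did wrong. Can anyone help me?
Here is the procedure: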
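1. The encoder produces vectors of size latent_dim = 100. These are z_mean, z_log_var, u, b, w.
2. From z_mean and z_log_var, using the reparameterization trick, I can sample z_0 ~ N(z_mean, z_log_var).
3. Then I can compute log(abs(1+u.T.dot(psi(z_0)))).
4. Then I can compute z_1.
Here is the code for these four steps: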
def sampling(args):
    z_mean, z_log_var = args
    # sample epsilon according to N(0, I)
    epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0.,
                              std=epsilon_std)
    # generate z0 according to N(z_mean, z_log_var)
    z0 = z_mean + K.exp(z_log_var / 2) * epsilon
    print('z0', z0)
    return z0

def logdet_loss(args):
    z0, w, u, b = args
    b2 = K.squeeze(b, 1)
    beta = K.sum(tf.multiply(w, z0), 1)  # <w|z0>
    linear_trans = beta + b2  # <w|z0> + b
    # replace u by u2 so that the transformation z0 -> z1 is invertible
    alpha = K.sum(tf.multiply(w, u), 1)  # <w|u>
    diag1 = tf.diag(K.softplus(alpha) - 1 - alpha)
    u2 = u + K.dot(diag1, w) / K.sum(K.square(w)+1e-7)
    gamma = K.sum(tf.multiply(w, u2), 1)  # <w|u2>
    # log|1 + tanh'(<w|z0> + b) * <w|u2>|
    logdet = K.log(K.abs(1 + (1 - K.square(K.tanh(linear_trans)))*gamma) + 1e-6)
    return logdet

def transform_z0(args):
    z0, w, u, b = args
    b2 = K.squeeze(b, 1)
    beta = K.sum(tf.multiply(w, z0), 1)  # <w|z0>
    # replace u by u2 so that the transformation z0 -> z1 is invertible
    alpha = K.sum(tf.multiply(w, u), 1)  # <w|u>
    diag1 = tf.diag(K.softplus(alpha) - 1 - alpha)
    u2 = u + K.dot(diag1, w) / K.sum(K.square(w)+1e-7)
    diag2 = tf.diag(K.tanh(beta + b2))
    # generate z1 = z0 + u2 * tanh(<w|z0> + b)
    z1 = z0 + K.dot(diag2, u2)
    return z1
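
For comparison, here is a minimal single-sample sketch of one planar flow step in plain Python, following the invertibility correction from the linked paper. It is not the poster's code: `planar_flow`, `softplus` and `dot` are hypothetical helper names, and the sketch assumes the squared norm of `w` should be taken per sample. The posted code instead calls `K.sum(K.square(w)+1e-7)` with no axis, which sums over the whole batch, so that may be worth checking.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def planar_flow(z0, w, u, b):
    """Apply one planar flow to a single sample; return (z1, logdet)."""
    alpha = dot(w, u)
    # u_hat = u + (m(alpha) - alpha) * w / ||w||^2 with m(a) = softplus(a) - 1.
    # Assumption: the norm uses only this sample's w, not the whole batch.
    scale = (softplus(alpha) - 1.0 - alpha) / (dot(w, w) + 1e-7)
    u_hat = [ui + scale * wi for ui, wi in zip(u, w)]
    h = math.tanh(dot(w, z0) + b)
    z1 = [zi + ui * h for zi, ui in zip(z0, u_hat)]
    # psi(z0) = (1 - tanh^2) * w, hence <u_hat|psi> = (1 - h^2) * <w|u_hat>.
    logdet = math.log(abs(1.0 + (1.0 - h * h) * dot(w, u_hat)))
    return z1, logdet


z1, logdet = planar_flow([0.5, -1.0], [1.0, 2.0], [-3.0, 0.5], 0.1)
```

With the correction applied, `<w|u_hat>` stays above -1 (up to the small epsilon), so the argument of the logarithm stays positive and `logdet` stays finite.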
Then comes the loss (logdet is defined above):
def vae_loss(x, x_decoded_mean):
    xent_loss = K.mean(objectives.categorical_crossentropy(x, x_decoded_mean), -1)
    ln_q0z0 = K.sum(log_normal2(z0, z_mean, z_log_var, eps=1e-6), -1)
    ln_pz1 = K.sum(log_stdnormal(z1), -1)
    result = K.mean(logdet + ln_pz1 + xent_loss - ln_q0z0)
    return result
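
As a point of comparison for the signs, the free-energy bound in the linked paper (the quantity to minimize) is ln q0(z0) - ln p(x, z1) - logdet. Since the cross-entropy is already -ln p(x | z1), that gives xent + ln_q0z0 - ln_pz1 - logdet. The sketch below writes this out for one sample in plain Python. `log_normal2` and `log_stdnormal` are not defined in the post, so the versions here are assumed stand-ins (elementwise Gaussian log-densities), and `free_energy` is a hypothetical name.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def log_stdnormal(z):
    # Assumed stand-in: elementwise log N(z; 0, 1).
    return [-0.5 * LOG_2PI - 0.5 * zi * zi for zi in z]


def log_normal2(z, mean, log_var):
    # Assumed stand-in: elementwise log N(z; mean, exp(log_var)).
    return [-0.5 * LOG_2PI - 0.5 * lv - 0.5 * (zi - m) ** 2 / math.exp(lv)
            for zi, m, lv in zip(z, mean, log_var)]


def free_energy(xent, z0, z_mean, z_log_var, z1, logdet):
    """Single-sample free energy; xent is -ln p(x | z1)."""
    ln_q0z0 = sum(log_normal2(z0, z_mean, z_log_var))
    ln_pz1 = sum(log_stdnormal(z1))
    # Prior and log-determinant are subtracted, the entropy term is added.
    return xent + ln_q0z0 - ln_pz1 - logdet


loss = free_energy(1.0, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], 0.0)
```

In the posted `vae_loss`, `ln_pz1` and `logdet` enter with a plus sign and `ln_q0z0` with a minus sign. Under that convention the optimizer can lower the loss without bound by pushing z1 away from the origin, which may be one reason the loss diverges.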
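Answer 0 (score: 2)
I adapted the Keras VAE tutorial here: https://github.com/sbaurdlp/keras-iaf-mnist
In case anyone is interested in having a look... Strangely, adding more layers does not improve performance, and I cannot see the error in the code.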
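Answer 1 (score: 0)
Since I could not get it to work, I tried to implement the normalizing flow described in this paper: Improved Variational Inference with Inverse Autoregressive Flow.
However, I still run into the same diverging loss (towards -inf), which makes no sense. There must be a problem with my implementation.
Here are the important parts: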
# the encoder
h = encoder_block(x) # a convnet taking proteins as input (matrices of size 400x22), I don't describe it since it isn't very important
z_log_var = Dense(latent_dim)(h)
z_mean = Dense(latent_dim)(h)
h_ = Dense(latent_dim)(h)
encoder = Model(x, [z_mean,z_log_var, h_])
# the latent variables (only one transformation to keep it simple)
latent_input = Input(shape=(latent_dim, 2), batch_shape=(batch_size, latent_dim, 2))
hl = Convolution1D(1, filter_length, activation="relu", border_mode="same")(latent_input)
hl = Reshape((latent_dim,))(hl)
mean_1 = Dense(latent_dim)(hl)
std_1 = Dense(latent_dim)(hl)
latent_model = Model(latent_input, [mean_1, std_1])
# the decoder
decoder_input = Input((latent_dim,), batch_shape=(batch_size, latent_dim))
decoder=decoder_block() # a convnet that I don't describe
x_decoded_mean = decoder(decoder_input)
generator = Model(decoder_input, x_decoded_mean)
# the VAE
z_mean, z_log_var, other = encoder(vae_input)
eps = Lambda(sample_eps, name='sample_eps')([z_mean, z_log_var, other])
z0 = Lambda(sample_z0, name='sample_z0')([z_mean, z_log_var, eps])
l = Lambda(sample_l, name='sample_l')([eps, z_log_var])
mean, std = latent_model(merge([Reshape((latent_dim,1))(z0), Reshape((latent_dim,1))(other)], mode="concat", concat_axis=-1))
z = Lambda(transform_z0)([z0, mean, std])
l = Lambda(transform_l)([l, std])
x_decoded_mean = generator(z)
vae = Model(vae_input, x_decoded_mean)
# and here is the loss
def vae_loss(x, x_decoded_mean):
    xent_loss = K.mean(objectives.categorical_crossentropy(x, x_decoded_mean), -1)
    ln_q0z0 = K.sum(log_normal2(z0, z_mean, z_log_var), -1)
    ln_pz1 = K.sum(log_stdnormal(z), -1)
    result = K.mean(l + ln_pz1 + xent_loss - ln_q0z0)
    return result
Here are the utility functions I use in the Lambda layers:
def sample_eps(args):
    # sample epsilon according to N(0, I)
    epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0.,
                              std=epsilon_std)
    return epsilon

def sample_z0(args):
    z_mean, z_log_var, epsilon = args
    # generate z0 according to N(z_mean, z_log_var)
    z0 = z_mean + K.exp(z_log_var / 2) * epsilon
    return z0

def sample_l(args):
    epsilon, z_log_var = args
    # log-density of z0 under q0
    l = -0.5*K.sum(z_log_var + epsilon**2 + K.log(2*math.pi), -1)
    return l

def transform_z0(args):
    z0, mean, std = args
    z = z0
    sig_std = K.sigmoid(std)
    z *= sig_std
    z += (1-sig_std)*mean
    return z

def transform_l(args):
    l, std = args
    sig_std = K.sigmoid(std)
    # subtract the log-determinant of the gated update
    l -= K.sum(K.log(sig_std+1e-8), -1)
    return l
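
To make the bookkeeping in these helpers easier to check, here is a minimal single-sample sketch in plain Python of one gated IAF step and the resulting loss. The names `initial_log_q`, `iaf_step` and `neg_elbo` are hypothetical, and `m` and `s` are treated as given numbers. One assumption to state loudly: subtracting sum(log(sigmoid(s))) is only the exact log-determinant when `m` and `s` depend autoregressively on z, which plain Dense layers do not guarantee.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def initial_log_q(eps, z_log_var):
    # ln q(z0 | x) for z0 = mean + exp(log_var / 2) * eps.
    return -0.5 * sum(lv + e * e + LOG_2PI for e, lv in zip(eps, z_log_var))


def iaf_step(z, log_q, m, s):
    """Gated update z <- g * z + (1 - g) * m; returns (z_new, log_q_new)."""
    gate = [sigmoid(si) for si in s]
    z_new = [g * zi + (1.0 - g) * mi for zi, g, mi in zip(z, gate, m)]
    # Assumes a triangular Jacobian with diagonal entries equal to gate.
    log_q_new = log_q - sum(math.log(g) for g in gate)
    return z_new, log_q_new


def neg_elbo(xent, z, log_q):
    # xent is -ln p(x | z); log_q already contains ln q0(z0).
    ln_pz = sum(-0.5 * LOG_2PI - 0.5 * zi * zi for zi in z)
    return xent + log_q - ln_pz


log_q0 = initial_log_q([0.0, 0.0], [0.0, 0.0])
z, log_q = iaf_step([1.0, 1.0], log_q0, [0.0, 0.0], [0.0, 0.0])
loss = neg_elbo(1.0, z, log_q)
```

Under this convention the running log-density `l` already includes ln q0(z0), so the posted `l + ln_pz1 + xent_loss - ln_q0z0` would cancel that term and add the prior with the opposite sign. That is a possible explanation for the divergence, not a confirmed diagnosis.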