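I am trying to implement a sequence-to-sequence model with a variational autoencoder (VAE). When I train with the VAE loss, the loss becomes extremely large and the accuracy stays very low. I suspect the problem is the embedding layer, because none of the examples I have found use one.
Here is my code: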
#==========================******* Encoder ********* =================================================================
encoder_inputs = Input(shape=(None,))
encoder_emb = Embedding(input_dim=vocab_in_size, output_dim=embedding_dim)
encoder_lstm =LSTM(units=units,return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_emb(encoder_inputs))
################## VAE ##################################
latent_dim =256
# output layer for mean and log variance
z_mu = Dense(2)(state_h)  # replace h
z_log_var = Dense(2)(state_h)
def sampling(args):
    # reparameterization trick: z = mean + sigma * epsilon
    z_mean, z_log_sigma = args
    batch = K.shape(z_mean)[0]
    dim = K.int_shape(z_mean)[1]
    print("kokooo", batch, dim)
    epsilon = K.random_normal(shape=(batch, dim), mean=0., stddev=1.)
    return z_mean + K.exp(z_log_sigma / 2) * epsilon
# note that "output_shape" isn't necessary with the TensorFlow backend
# so you could write `Lambda(sampling)([z_mean, z_log_sigma])`
z = Lambda(sampling, output_shape=(2,))([z_mu, z_log_var])
################## Decoder #############################################
decoder_inputs = Input(shape=(None,))
decoder_emb = Embedding(input_dim=vocab_out_size, output_dim=embedding_dim)(decoder_inputs)
state_1 = Dense(units)(z)
state_2 = Dense(units)(z)
decoder_lstm = LSTM(units=units, return_sequences=True)
decoder_out = decoder_lstm(decoder_emb, initial_state=[state_1, state_2])
def vae_loss(x, x_decoded_mean):
    # reconstruction term, summed over every element of the batch
    xent_loss = K.sum(K.binary_crossentropy(x, x_decoded_mean))
    print('xxx', xent_loss.shape)
    # softmax_loss_function=softmax_loss_f), axis=-1)  #, uncomment for sampled softmax
    # KL divergence between the approximate posterior and a standard normal
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mu) - K.exp(z_log_var), axis=-1)
    print("sososososoosososos", kl_loss.shape)
    return xent_loss + kl_loss
print("eeennn",encoder_inputs.shape)
print("deccc",decoder_out.shape)
decoder_d2 = Dense(vocab_out_size, activation="softmax")
decoder_out = decoder_d2(decoder_out)
model = Model([encoder_inputs, decoder_inputs], decoder_out)
# We'll use sparse_categorical_crossentropy so we don't have to expand decoder_out into a massive one-hot array.
# Adam is used because it's, well, the best.
#"sparse_categorical_crossentropy"
model.compile(optimizer="rmsprop", loss=vae_loss, metrics=['sparse_categorical_accuracy'])
epochs = 10
history = model.fit([input_data, teacher_data], target_data,
                    batch_size=BATCH_SIZE,
                    epochs=epochs,
                    validation_split=0.2)
This is the output during training:
Epoch 1/10
86874/86874 [==============================] - 496s 6ms/sample - loss: 48096715.1200 - sparse_categorical_accuracy: 0.0068 - val_loss: 47983217.0389 - val_sparse_categorical_accuracy: 0.0021
Epoch 2/10
68800/86874 [======================>.......] - ETA: 1:30 - loss: 48084318.8577 - sparse_categorical_accuracy: 0.0071
Where is the problem, and how can I fix it?
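One thing that may be worth checking is the reconstruction term of the loss. The sketch below is plain Python, not Keras, and it assumes that `target_data` holds integer token ids while the model outputs a softmax over the vocabulary. Under that assumption, an element-wise binary cross-entropy treats the raw id as if it were a 0/1 label, so the value grows with the id and with the vocabulary size, whereas a sparse categorical cross-entropy only looks at the probability assigned to the true token. The function names here are illustrative and are not part of any library.

```python
import math


def sparse_categorical_crossentropy(token_id, probs):
    """Cross-entropy for one time step: -log of the probability of the true token."""
    return -math.log(probs[token_id])


def binary_crossentropy_on_raw_id(token_id, probs, eps=1e-7):
    """Element-wise binary cross-entropy with the integer id used as the label.

    This mimics what happens if an integer target is compared against every
    entry of a softmax output (an assumption about the shapes involved).
    """
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(token_id * math.log(p) + (1 - token_id) * math.log(1.0 - p))
    return total


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, exp(log_var)) and N(0, 1), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


# Toy example: vocabulary of 4 tokens, uniform prediction, true token id 3.
probs = [0.25, 0.25, 0.25, 0.25]
sparse = sparse_categorical_crossentropy(3, probs)
mismatched = binary_crossentropy_on_raw_id(3, probs)
kl = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
print(sparse, mismatched, kl)
```

If this is what happens in the model above, the reconstruction term would be expected to dwarf the KL term. Note also that the code sums the reconstruction term over the whole batch while the KL term keeps one value per sample, so the two are on different scales before they are added.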