I am implementing an autoencoder setup in which both the encoder and the decoder are recurrent neural networks (RNNs). The encoder and decoder models are initialized as follows:
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn

def enc(message, weights, biases):
    # Split the message into a list of 4 time-step tensors (along axis 1)
    message = tf.unstack(message, 4, 1)
    fw_cell = rnn.LSTMCell(num_hidden_enc)
    with tf.variable_scope("encoder"):
        outputs, _ = rnn.static_rnn(fw_cell, message, dtype=tf.float32)
    # Project the last hidden state to the codeword
    return tf.matmul(outputs[-1], weights) + biases
def dec(codeword, weights, biases, time_steps):
    # Add a feature dimension, then split the length-7 codeword into time steps
    codeword = tf.expand_dims(codeword, axis=2)
    codeword = tf.unstack(codeword, 7, 1)
    fw_cell = rnn.LSTMCell(num_hidden_dec)
    with tf.variable_scope("decoder"):
        outputs, _ = rnn.static_rnn(fw_cell, codeword, dtype=tf.float32)
    a = tf.matmul(outputs[-1], weights) + biases
    # Final fully connected layer with small Gaussian initial weights
    weight_fc = np.random.normal(loc=0.0, scale=0.01, size=[4, 4])
    init = tf.constant_initializer(weight_fc)
    return tf.layers.dense(a, units=4, activation=tf.nn.sigmoid, kernel_initializer=init)
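For context, here is roughly how the two functions are wired together. The shapes follow the unstack calls above; the hidden sizes and variable names (num_hidden_enc, weights_enc, and so on) are placeholders for my actual setup:

# Sketch of the graph construction; hidden sizes are illustrative
num_hidden_enc = 50
num_hidden_dec = 50

input_bits = tf.placeholder(tf.float32, [None, 4, 1])   # 4-bit messages
weights_enc = tf.Variable(tf.random_normal([num_hidden_enc, 7]))
biases_enc = tf.Variable(tf.zeros([7]))
weights_dec = tf.Variable(tf.random_normal([num_hidden_dec, 4]))
biases_dec = tf.Variable(tf.zeros([4]))

codeword = enc(input_bits, weights_enc, biases_enc)      # shape [batch, 7]
message_hat = dec(codeword, weights_dec, biases_dec, 7)  # shape [batch, 4]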
I am using a mean squared error loss and the Adam optimizer:

# message_hat is the output of the decoder network; input_bits is the input to the encoder
loss = tf.reduce_sum(0.5 * (tf.squeeze(input_bits) - message_hat) ** 2) / float(batch_size)
opt = tf.train.AdamOptimizer().minimize(loss)
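The training loop is the usual pattern, sketched below with placeholder names (num_epochs, gen_batch) standing in for my actual loop and data pipeline:

# Minimal training loop sketch; gen_batch() stands in for my data pipeline
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        batch = gen_batch(batch_size)  # random binary messages, shape [batch_size, 4, 1]
        _, l, out = sess.run([opt, loss, message_hat],
                             feed_dict={input_bits: batch})
        print("Epoch:", epoch)
        print("Decoding loss:", l)
        print("Decoder output:", out)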
When I print the output of the decoder network at the end of epoch 0, I get sensible float values, but after one more epoch every value becomes NaN. The same happens for both the encoder and the decoder. So I tried gradient clipping with the following code:
opt = tf.train.AdamOptimizer()
gvs = opt.compute_gradients(loss)
# Clip each gradient elementwise to [-1, 1] before applying the update
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
train_op = opt.apply_gradients(capped_gvs)
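In case it matters: tf.clip_by_value fails if compute_gradients returns None for a variable that does not influence the loss, so a guarded variant of the same clipping (just a sketch) looks like this:

# Same elementwise clipping, but skip (grad, var) pairs with no gradient
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var)
              for grad, var in gvs if grad is not None]
train_op = opt.apply_gradients(capped_gvs)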
Still the same problem. More precisely, this is the output from epoch 1 onward:
Epoch: 1
Decoding loss: nan
Decoder output: [[nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan]]
Is there something inherently wrong with the code? Any suggestions are welcome. Thanks in advance.