I am training a model to predict a time series using an RNN. The model trains without any problem. Here is the original code:
import tensorflow as tf  # TensorFlow 1.x

tf.reset_default_graph()

# Hyperparameters
num_inputs = 1
num_neurons = 100
num_outputs = 1
learning_rate = 0.0001
num_train_iterations = 2000
batch_size = 1

# Placeholders for input sequences and targets
# (time_steps, training_data, and next_batch are defined elsewhere in the script)
X = tf.placeholder(tf.float32, [None, time_steps - 1, num_inputs])
y = tf.placeholder(tf.float32, [None, time_steps - 1, num_outputs])

# RNN cell with a linear projection down to one output per time step
cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.contrib.rnn.BasicRNNCell(num_units=num_neurons, activation=tf.nn.relu),
    output_size=num_outputs)
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

# MSE loss and Adam optimizer
loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train = optimizer.minimize(loss)

init = tf.global_variables_initializer()
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.75)

with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    sess.run(init)
    for iteration in range(num_train_iterations):
        elx, ely = next_batch(training_data, time_steps)
        sess.run(train, feed_dict={X: elx, y: ely})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: elx, y: ely})
            print(iteration, "\tMSE:", mse)
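For context, a rough sketch of the kind of batch helper assumed here; my actual next_batch may differ, but it returns arrays shaped to match the placeholders above:

import numpy as np

def next_batch(training_data, steps):
    # Hypothetical helper, shown only for context: pick a random window of
    # length `steps` and return (inputs, targets), where the targets are the
    # same window shifted one step ahead.
    start = np.random.randint(0, len(training_data) - steps)
    window = np.asarray(training_data[start:start + steps])
    elx = window[:-1].reshape(-1, steps - 1, 1)  # shape (1, steps-1, 1)
    ely = window[1:].reshape(-1, steps - 1, 1)   # shifted one step ahead
    return elx, ely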
The problem appears when I change tf.contrib.rnn.BasicRNNCell to tf.contrib.rnn.BasicLSTMCell: training slows down dramatically and the loss (the mse variable) becomes NaN. My best guess is that MSE is the wrong loss function and that I should try cross-entropy instead. I searched for similar code and found that tf.nn.softmax_cross_entropy_with_logits() might be the solution, but I still don't understand how to apply it to my problem.
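Concretely, the failing version only swaps the cell; everything else is unchanged:

# Same graph as above, only the cell construction differs:
cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.contrib.rnn.BasicLSTMCell(num_units=num_neurons, activation=tf.nn.relu),
    output_size=num_outputs)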
Answer 0 (score: 0)
通常" NAN"当你的渐变爆炸时发生。 这是tf.softmax的一些代码。试一试。
# Output layer: logits from the last hidden layer H1
logit = tf.add(tf.matmul(H1, w2), b2)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logit, labels=Y)

# Cost
cost = tf.reduce_mean(cross_entropy)

# Optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Prediction
y_pred = tf.nn.softmax(logit)
pred = tf.argmax(y_pred, axis=1)
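Note that softmax cross-entropy is meant for classification targets; for a regression target like this time series, a more direct remedy for exploding gradients is to clip them before applying the update. A minimal sketch with the question's MSE loss and Adam optimizer (the clip norm of 5.0 is an arbitrary, tunable choice):

# Gradient clipping: compute gradients, clip their global norm,
# then apply them (replaces optimizer.minimize(loss) above).
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train = optimizer.apply_gradients(zip(clipped_grads, variables))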