I am learning to model an LSTM in TensorFlow from examples, following a two-year-old TF 1.0 example. I define the cost function as:

```python
cost = tf.reduce_mean(tf.losses.absolute_difference(predictions=predictions, labels=target))
```
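For context, here is a standalone version of that definition (the placeholder shapes are invented for illustration). If I read the TF 1.x docs correctly, `tf.losses.absolute_difference` already returns a batch-averaged scalar by default, so the outer `tf.reduce_mean` should be redundant but harmless:

```python
import tensorflow as tf  # TF 1.x

# Standalone sketch; the [None, 10] shapes are invented for illustration.
# In TF 1.x, tf.losses.absolute_difference averages over the batch by
# default (reduction=SUM_BY_NONZERO_WEIGHTS), so the outer tf.reduce_mean
# does not change the value.
predictions = tf.placeholder(tf.float32, [None, 10])
target = tf.placeholder(tf.float32, [None, 10])
cost = tf.reduce_mean(
    tf.losses.absolute_difference(predictions=predictions, labels=target))
```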
I then use it in the training loop:
```python
for _batch in range(b_per_epoch):
    batch_xs, batch_ys, leng = get_batch(_batch, batch_size_i, x_, y_)
    # Run one training step on the batch
    res = sess.run([optimizer, cost, grads, cost_summary],
                   feed_dict={input: batch_xs,
                              target: batch_ys,
                              lens: leng})
    # Running average of the cost over the batches seen so far this epoch
    cum_cost += res[1]
    train_cost = cum_cost / (_batch + 1)
```
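One thing worth double-checking, shown below as a sketch (the outer epoch loop is not in my snippet, so `n_epochs` is hypothetical): `cum_cost` has to be reset to zero at the start of every epoch, otherwise `train_cost` averages over all epochs seen so far rather than the current one:

```python
for epoch in range(n_epochs):  # n_epochs is hypothetical, not in the snippet above
    cum_cost = 0.0  # reset each epoch, otherwise the average mixes epochs
    for _batch in range(b_per_epoch):
        batch_xs, batch_ys, leng = get_batch(_batch, batch_size_i, x_, y_)
        _, c = sess.run([optimizer, cost],
                        feed_dict={input: batch_xs,
                                   target: batch_ys,
                                   lens: leng})
        cum_cost += c
    train_cost = cum_cost / b_per_epoch
```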
This works fine. Then, at the end of each epoch, I try to validate on data held out from the training set:
```python
# Test the validation sample in batches
for _batch_t in range(b_per_epoch_t):
    test_xs_t, test_ys_t, leng_t = get_batch(_batch_t, batch_size_i, xt_, yt_)
    # Evaluate the loss only; no optimizer in the fetch list
    resu = sess.run([cost, cost_val_summary],
                    feed_dict={input: test_xs_t,
                               target: test_ys_t,
                               lens: leng_t})
    cum_cost_t += resu[0]
    test_cost = cum_cost_t / (_batch_t + 1)
```
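For reference, this is the same pass with the accumulator reset made explicit (a sketch using the names defined above; since the fetch list contains only `cost` and its summary, no weights should be updated during validation):

```python
cum_cost_t = 0.0  # reset before every validation pass
for _batch_t in range(b_per_epoch_t):
    test_xs_t, test_ys_t, leng_t = get_batch(_batch_t, batch_size_i, xt_, yt_)
    # Only the cost and its summary are fetched: no optimizer op,
    # so the network weights stay frozen during validation.
    c_t, _ = sess.run([cost, cost_val_summary],
                      feed_dict={input: test_xs_t,
                                 target: test_ys_t,
                                 lens: leng_t})
    cum_cost_t += c_t
test_cost = cum_cost_t / b_per_epoch_t
```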
There are no errors, but I am seeing inconsistent results. `train_cost` starts at 0.65 at the end of epoch 0 and drops to 0.1 by epoch 30, so the network appears to be learning. Meanwhile, `test_cost` starts at 0.4 at the end of epoch 0 and stays in the 0.38–0.43 range throughout all the early epochs. I believe this is not overfitting but a coding error. Am I doing something wrong?