I am training an RNN model for a complex text classification task. Somehow the RNN weights only change at the very beginning of training and then move only in tiny steps (the gradients are on the order of 1e-7):

For those unfamiliar with TensorBoard: the screenshot shows the distributions of the bias values and the RNN weights (the x-axis is the value, the y-axis the number of values, and z the training iteration).
I built a toy example that makes no sense but reproduces the same behavior:
import numpy as np
import tensorflow as tf

tensorboard_save_path = "../RNN/tensorboard/supersimple/"

# Random toy data: 33 samples, 20 time steps, 5000 features, with random binary labels.
x = np.random.normal(size=(33, 20, 5000))
y = np.array([1 if i > 0.5 else 0 for i in np.random.random(33)])

##### NETWORK #########
with tf.name_scope("RNN"):
    # Single-unit vanilla RNN; the sigmoid of the last time step's output is the prediction.
    rnn_cell = tf.contrib.rnn.BasicRNNCell(1)
    outputs, states = tf.nn.dynamic_rnn(rnn_cell, x, dtype=tf.float64)
    rnn_weights, rnn_biases = rnn_cell.variables
    tf.summary.histogram("RNN weights", rnn_weights)
    tf.summary.histogram("RNN biases", rnn_biases)
    pred = tf.sigmoid(outputs[:, -1])

with tf.name_scope("cost"):
    cost = tf.losses.mean_squared_error(predictions=pred, labels=np.reshape(y, (33, 1)))

with tf.name_scope("train"):
    optimizer = tf.train.AdagradOptimizer(learning_rate=0.5).minimize(cost)

init = tf.global_variables_initializer()
merged_summary = tf.summary.merge_all()
writer = tf.summary.FileWriter(tensorboard_save_path)
print("\ttensorboard --logdir=" + tensorboard_save_path)

sess = tf.Session()
sess.run(init)

for i in range(1000):
    sess.run([optimizer])
    if i % 50 == 0:
        c, s = sess.run([cost, merged_summary])
        writer.add_summary(s, i)
        print("cost is %f" % c)
The expected behavior would be that the model overfits, since it has a huge number of variables compared to the handful of training samples. Any idea what is going wrong here?
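To back up the "many variables vs. 33 samples" point, here is a quick parameter count I would append after building the graph above (just a sanity check; BasicRNNCell(1) on 5000-dimensional inputs should give a kernel of shape (5001, 1) plus a single bias):

# Count trainable parameters: roughly 5002 for this graph, versus only 33 training samples.
n_params = sum(int(np.prod(v.get_shape().as_list())) for v in tf.trainable_variables())
print("trainable parameters: %d vs. training samples: %d" % (n_params, x.shape[0]))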