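I am trying to follow this RNN tutorial on medium, refactoring it as I go along. When I run my code it appears to work, but when I print the current state variable to see what is happening inside the neural network, I get all 1s. Is this expected behaviour? Is the state not being updated for some reason? As I understand it, the current state should hold the latest hidden-layer values for every sequence in the batch, so it definitely should not be all 1s. Any help would be greatly appreciated.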
def __train_minibatch__(self, batch_num, sess, current_state):
    """
    Trains one minibatch.
    :type batch_num: int
    :param batch_num: the current batch number.
    :type sess: tensorflow Session
    :param sess: the session during which training occurs.
    :type current_state: numpy matrix (array of arrays)
    :param current_state: the current hidden state
    :rtype: (float, numpy matrix, list)
    :return: (the calculated loss for this minibatch, the updated hidden state, the predictions)
    """
    # Select the window of time steps that belongs to this minibatch.
    start_index = batch_num * self.settings.truncate
    end_index = start_index + self.settings.truncate
    batch_x = self.x_train_batches[:, start_index:end_index]
    batch_y = self.y_train_batches[:, start_index:end_index]
    # Running the train op returns nothing useful, so its result is discarded.
    total_loss, _, current_state, predictions_series = sess.run(
        [self.total_loss_fun, self.train_step_fun, self.current_state, self.predictions_series],
        feed_dict={
            self.batch_x_placeholder: batch_x,
            self.batch_y_placeholder: batch_y,
            self.hidden_state: current_state
        })
    return total_loss, current_state, predictions_series
# End of __train_minibatch__()
def __train_epoch__(self, epoch_num, sess, current_state, loss_list):
    """
    Trains one full epoch.
    :type epoch_num: int
    :param epoch_num: the number of the current epoch.
    :type sess: tensorflow Session
    :param sess: the session during which training occurs.
    :type current_state: numpy matrix
    :param current_state: the current hidden state.
    :type loss_list: list of floats
    :param loss_list: holds the losses incurred during training.
    :rtype: (float, numpy matrix, list)
    :return: (the latest incurred loss, the latest hidden state, the latest predictions)
    """
    self.logger.info("Starting epoch: %d" % (epoch_num))
    for batch_num in range(self.num_batches):
        # Debug log outside of function to reduce number of arguments.
        # The logging module needs %-placeholders for its extra arguments.
        self.logger.debug("Training minibatch: %d | epoch: %d", batch_num, epoch_num + 1)
        total_loss, current_state, predictions_series = self.__train_minibatch__(batch_num, sess, current_state)
        loss_list.append(total_loss)
    # End of batch training
    self.logger.info("Finished epoch: %d | loss: %f" % (epoch_num, total_loss))
    return total_loss, current_state, predictions_series
# End of __train_epoch__()
def train(self):
    """
    Trains the given model on the given dataset, and saves the losses incurred
    at the end of each epoch to a plot image.
    """
    # Assumes module-level `import numpy as np` and `import tensorflow as tf` (TF 1.x API).
    self.logger.info("Started training the model.")
    self.__unstack_variables__()
    self.__create_functions__()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        loss_list = []
        # The hidden state starts at zero and is carried across minibatches and epochs.
        current_state = np.zeros((self.settings.batch_size, self.settings.hidden_size), dtype=float)
        for epoch_idx in range(1, self.settings.epochs + 1):
            total_loss, current_state, predictions_series = self.__train_epoch__(epoch_idx, sess, current_state, loss_list)
            print("Shape: ", current_state.shape, " | Current output: ", current_state)
        # End of epoch training
    self.logger.info("Finished training the model. Final loss: %f" % total_loss)
    self.__plot__(loss_list)
    self.generate_output()
# End of train()
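Update

After completing the second part of the tutorial and switching to the built-in RNN API, the problem went away. That means either there was something wrong with the way I was using the current_state variable, or a change to the tensorflow API caused something odd to happen (I am fairly sure it is the former, though). I am leaving the question open in case anyone has a definitive answer.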
Answer 0 (score: 0)
First, you should make sure that "it appears to work" is actually true, i.e. that your test error really is going down.
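A rough way to check that claim is to compare the average of the earliest recorded losses with the average of the latest ones. The sketch below is only an illustration: loss_trend is a hypothetical helper, the loss values are synthetic, and it looks at the training losses collected in loss_list rather than at a separate test error, which would need held-out data.

```python
def loss_trend(loss_list, window=100):
    """Return (mean of the first `window` losses, mean of the last `window` losses)."""
    if len(loss_list) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    first = sum(loss_list[:window]) / window
    last = sum(loss_list[-window:]) / window
    return first, last


# Synthetic, steadily decreasing losses standing in for a real loss_list.
losses = [1.0 / (1 + 0.01 * i) for i in range(1000)]
first, last = loss_trend(losses)
improved = last < first
print("first window: %.4f | last window: %.4f | improved: %s" % (first, last, improved))
```

If the last window is not clearly lower than the first, the network is probably not learning, and the state values are worth inspecting more closely.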
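My hypothesis is that the last mini-batch is corrupted with zeros at the end, because the length of each row of data, total_series_length / batch_size, is not a multiple of truncated_backprop_length. (I did not check whether this really happens or whether the batch really is padded with zeros. The code in the tutorial is too old to run on my tf version, and we don't have your full code.) A last mini-batch that ends in nothing but zeros could cause the final current_state to converge to all ones. On any other mini-batch, current_state would not be all ones.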
You could try printing current_state every time you call sess.run in __train_minibatch__. Or maybe print it only on every 1000th mini-batch.
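The periodic printing suggested here could look roughly like the sketch below. It runs without TensorFlow: fake_train_minibatch is a hypothetical stand-in for the sess.run call in __train_minibatch__, and summarize_state and log_every are illustrative names that do not appear in the original code.

```python
import numpy as np


def summarize_state(state):
    """Condense a hidden-state matrix into a few numbers that are easy to scan in a log."""
    state = np.asarray(state, dtype=float)
    return {
        "min": float(state.min()),
        "max": float(state.max()),
        "mean": float(state.mean()),
        "all_ones": bool(np.allclose(state, 1.0)),
    }


def fake_train_minibatch(batch_num, current_state):
    """Hypothetical stand-in for sess.run(...): returns an updated hidden state."""
    rng = np.random.default_rng(batch_num)
    return np.tanh(current_state + rng.normal(size=current_state.shape))


log_every = 1000  # print on every 1000th mini-batch
current_state = np.zeros((5, 4))
logged_steps = []
for batch_num in range(3000):
    current_state = fake_train_minibatch(batch_num, current_state)
    if batch_num % log_every == 0:
        print(batch_num, summarize_state(current_state))
        logged_steps.append(batch_num)
```

Printing a summary rather than the full matrix keeps the log readable, and the all_ones flag makes it easy to see on which mini-batch the state first saturates.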