Adam Optimizer runs into assert not np.isnan(loss_value), 'Model diverged with loss = NaN' error

Time: 2019-04-16 18:22:04

Tags: python-3.x tensorflow

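I am using the VGG16 architecture with an adaptive learning rate. My code runs fine when I use the adagrad and adadelta optimizers, but it fails when I use the adam optimizer: after 30 or 40 epochs it raises assert not np.isnan(loss_value), 'Model diverged with loss = NaN'. I tried lowering the learning rate (although that should not be necessary), reducing the batch size, increasing the epsilon value, and so on, but this only increased the number of epochs before the failure; it did not solve the problem. I also checked previous posts on StackOverflow and tried changing the loss function by adding a small value to the log term, but that did not work either. I am using this {{ 3}} structure.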
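Since the paragraph above mentions both the learning rate and epsilon, here is a minimal pure-Python sketch of one textbook Adam update for a single scalar parameter, showing where each of them enters. The helper name adam_step is hypothetical, the default hyperparameters are assumed to match the documented defaults of tf.train.AdamOptimizer (learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08), and TensorFlow's own implementation applies epsilon in a slightly different place, so this is not its exact code.

```python
import math
from typing import Tuple


def adam_step(theta: float, grad: float, m: float, v: float, t: int,
              lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One textbook Adam update (hypothetical helper, not TensorFlow's code).

    Returns the updated (theta, m, v); t is the 1-based step count.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections for the
    v_hat = v / (1 - beta2 ** t)                # zero-initialized moments
    # eps only guards the denominator; the step is roughly lr * m_hat / sqrt(v_hat).
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# On the first step the update is about lr * sign(grad), whatever the gradient's size.
theta, m, v = adam_step(1.0, 5.0, 0.0, 0.0, t=1)
print(theta)
```

In this form eps only changes the step noticeably when sqrt(v_hat) is itself close to eps in size; for ordinary gradient magnitudes it has almost no effect, and a NaN gradient passes straight through into the parameter.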


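import tensorflow as tf  # TensorFlow 1.x API (tf.log, tf.add_to_collection)


def loss(logits, labels):
  # Per-sample hazard ratios; in float32, tf.exp overflows to inf once a logit exceeds ~88.7.
  hazard_ratio = tf.exp(logits)
  # Running sum of the hazard ratios over the batch.
  cumsum = tf.cumsum(hazard_ratio)
  # Log of the cumulative sum at the label positions, plus the batch's max logit.
  likelihood = tf.log(tf.gather(cumsum, tf.reshape(labels, [-1]))) + tf.reduce_max(logits)
  diff = tf.subtract(logits, likelihood)
  # Weight each term by its label before summing.
  num = tf.reshape(diff, [-1]) * tf.cast(labels, tf.float32)
  reduce = - (tf.reduce_sum(num))
  #reduce = - (tf.reduce_mean(num))
  tf.add_to_collection('losses', reduce)
  return reduce, tf.add_n(tf.get_collection('losses'))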
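The loss() function above exponentiates the logits and then takes a log of their cumulative sum. A common way to keep such a log(cumsum(exp(x))) term finite is the log-sum-exp trick, sketched below in pure Python. This is a swapped-in technique for illustration, not the author's code: the names log_cumsum_exp and neg_cox_partial_log_likelihood are hypothetical, the sketch assumes the batch is sorted by descending survival time (so the risk set of sample i is samples 0..i) with events[i] = 1 for an observed event and 0 for censoring, and it does not reproduce the gather-by-label indexing used in the question.

```python
import math
from typing import List


def log_cumsum_exp(logits: List[float]) -> List[float]:
    """Running log(sum(exp(logits[:i + 1]))) without overflow (hypothetical helper).

    Uses log(exp(a) + exp(b)) = m + log(exp(a - m) + exp(b - m)) with m = max(a, b),
    so math.exp only ever sees non-positive arguments.
    """
    out = []
    acc = -math.inf  # log of an empty sum
    for x in logits:
        m = max(acc, x)
        acc = m + math.log(math.exp(acc - m) + math.exp(x - m))
        out.append(acc)
    return out


def neg_cox_partial_log_likelihood(logits: List[float], events: List[int]) -> float:
    """Negative Cox partial log-likelihood (hypothetical helper).

    Assumes samples are sorted by descending survival time, so the risk set of
    sample i is samples 0..i; events[i] is 1 for an observed event, 0 if censored.
    """
    log_risk = log_cumsum_exp(logits)
    return -sum(e * (x - lse) for x, lse, e in zip(logits, log_risk, events))


# A naive math.exp(1000.0) raises OverflowError (float32 tf.exp returns inf instead),
# while the stable running sum stays finite.
print(log_cumsum_exp([1000.0, 1000.0]))
print(neg_cox_partial_log_likelihood([0.0, 0.0], [1, 1]))
```

Because every term stays finite, a censored sample (weight 0) contributes exactly 0 here rather than 0 times an infinity.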


def _add_loss_summaries(total_loss):
  """Add summaries for losses in CIFAR-10 model.

  Generates moving average for all losses and associated summaries for
  visualizing the performance of the network.

  Args:
    total_loss: Total loss from loss().
  Returns:
    loss_averages_op: op for generating moving averages of losses.
  """
  # Compute the moving average of all individual losses and the total loss.
  loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
  losses = tf.get_collection('losses')
  loss_averages_op = loss_averages.apply(losses + [total_loss])

  # Attach a scalar summary to all individual losses and the total loss; do the
  # same for the averaged version of the losses.
  for l in losses + [total_loss]:
    # Name each loss as '(raw)' and name the moving average version of the loss
    # as the original loss name.
    tf.summary.scalar(l.op.name + ' (raw)', l)
    tf.summary.scalar(l.op.name, loss_averages.average(l))

  return loss_averages_op
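_add_loss_summaries() tracks the losses with tf.train.ExponentialMovingAverage(0.9). The update rule documented for that class is shadow = decay * shadow + (1 - decay) * value; the pure-Python sketch below applies it to a list of loss values. The helper name ema is hypothetical, and starting the shadow value at the first observation is a simplifying assumption (TensorFlow initializes its shadow values differently). The sketch also shows that once a NaN loss enters the average, every later averaged value is NaN as well.

```python
import math
from typing import List


def ema(values: List[float], decay: float = 0.9) -> List[float]:
    """Exponential moving average of a sequence (hypothetical helper).

    Assumes the shadow value starts at the first observation.
    """
    out = []
    shadow = None
    for value in values:
        # shadow = decay * shadow + (1 - decay) * value
        shadow = value if shadow is None else decay * shadow + (1 - decay) * value
        out.append(shadow)
    return out


print(ema([1.0, 2.0, 3.0]))
print(ema([1.0, float("nan"), 1.0]))  # NaN persists in every value after it appears
```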


  File "", line 726, in <module>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/", line 125, in run
  File "", line 722, in main
  File "", line 626, in train
    assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
AssertionError: Model diverged with loss = NaN
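The traceback comes from the guard assert not np.isnan(loss_value). A non-finite loss can show up as inf before it turns into NaN; the pure-Python sketch below walks through one such path with the same shape as the question's loss() (an overflowed exp, then log(inf), then a -inf term multiplied by a label weight that happens to be 0) and a guard that stops on either. The helper name check_loss and the use of math.isfinite instead of np.isnan are assumptions for illustration, not part of the original training script.

```python
import math


def check_loss(loss_value: float, step: int) -> None:
    """Hypothetical guard: fail on NaN and on +/-inf, reporting the step."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(
            f"Model diverged at step {step} with loss = {loss_value}")


# One way a NaN can arise, using plain Python floats for illustration.
hazard = math.inf              # stands in for an exp() that overflowed (float32 inf)
log_term = math.log(hazard)    # log(inf) is inf
diff = 2.5 - log_term          # finite minus inf is -inf
weighted = diff * 0.0          # -inf times a zero label weight is nan

check_loss(1.234, step=1)      # finite loss: passes silently
try:
    check_loss(diff, step=2)   # -inf is caught before it becomes nan
except FloatingPointError as err:
    print(err)
```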

0 answers:
