TensorFlow loss does not decrease and the weight gradients are close to zero

Date: 2017-06-01 13:26:39

Tags: tensorflow deep-learning gradient loss face

Recently I have been working on face alignment (facial landmark detection) and wanted to do some further work on top of the open-source Mnemonic Descent Method (MDM). Starting from that code, I made some modifications to how the samples are imported, but a few other problems have been confusing me for a while.

First, the model is as follows:

  patches = _extract_patches_module.extract_patches(images, tf.constant(patch_shape), inits+dx)
  patches = tf.stop_gradient(patches)
  patches = tf.reshape(patches, (batch_size, num_patches * patch_shape[0], patch_shape[1], num_channels))
  endpoints['patches'] = patches

  with tf.variable_scope('convnet', reuse=step>0):
      net = conv_model(patches)
      ims = net['concat']

  ims = tf.reshape(ims, (batch_size, -1))

  with tf.variable_scope('rnn', reuse=step>0) as scope:
      hidden_state = slim.ops.fc(tf.concat(1, [ims, hidden_state]), 512, activation=tf.tanh)
      prediction = slim.ops.fc(hidden_state, num_patches * 2, scope='pred', activation=None)
      endpoints['prediction'] = prediction
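
One detail worth noting here: tf.stop_gradient(patches) blocks gradients from flowing back through the patch extraction into inits+dx, but it does not affect the convnet weights downstream, which receive the patches only as input. A tiny standalone illustration of these semantics (not from the original code):

    import tensorflow as tf

    x = tf.constant(3.0)
    w = tf.Variable(2.0)
    y = tf.stop_gradient(x * x) * w   # treat x*x as a constant w.r.t. x
    dy_dx, dy_dw = tf.gradients(y, [x, w])
    # dy_dx is None (flow to x is blocked); dy_dw evaluates to x*x = 9.0,
    # so parameters downstream of stop_gradient still train normally.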

And conv_model is:

  with tf.op_scope([inputs], scope, 'mdm_conv'):
    with scopes.arg_scope([ops.conv2d, ops.fc], is_training=is_training):
      with scopes.arg_scope([ops.conv2d], activation=tf.nn.relu, padding='VALID'):
        net['conv_1'] = ops.conv2d(inputs, 32, [3, 3], scope='conv_1')
        net['pool_1'] = ops.max_pool(net['conv_1'], [2, 2])
        net['conv_2'] = ops.conv2d(net['pool_1'], 32, [3, 3], scope='conv_2')
        net['pool_2'] = ops.max_pool(net['conv_2'], [2, 2])

        crop_size = net['pool_2'].get_shape().as_list()[1:3]
        net['conv_2_cropped'] = utils.get_central_crop(net['conv_2'], box=crop_size)
        net['concat'] = tf.concat(3, [net['conv_2_cropped'], net['pool_2']])
        return net
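
For intuition about the shapes, here is a hypothetical trace of conv_model rewritten with plain tf.nn ops. The 30x30 single-channel patch is an assumption, since patch_shape is not shown in the question:

    # Assumed input: a 30x30x1 patch per landmark (patch_shape is not shown).
    x  = tf.placeholder(tf.float32, [None, 30, 30, 1])
    w1 = tf.Variable(tf.truncated_normal([3, 3, 1, 32], stddev=0.1))
    c1 = tf.nn.relu(tf.nn.conv2d(x, w1, [1, 1, 1, 1], 'VALID'))    # 28x28x32
    p1 = tf.nn.max_pool(c1, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')   # 14x14x32
    w2 = tf.Variable(tf.truncated_normal([3, 3, 32, 32], stddev=0.1))
    c2 = tf.nn.relu(tf.nn.conv2d(p1, w2, [1, 1, 1, 1], 'VALID'))   # 12x12x32
    p2 = tf.nn.max_pool(c2, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')   # 6x6x32
    # conv_2 (12x12) is centrally cropped to pool_2's 6x6 and concatenated
    # along channels, so net['concat'] would be 6x6x64.

The crop-and-concatenate step acts as a short skip connection: high-resolution conv_2 features from the patch center are stacked with the pooled features before being flattened for the recurrent part.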

The initial learning rate is set to 1e-3 and the batch size is 60.
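
For reference, this is roughly how those hyperparameters would be wired up in TensorFlow of that era; the Adam optimizer and the decay schedule below are assumptions, as the question does not say which optimizer or schedule is used (total_loss stands for the scalar training loss):

    batch_size = 60
    global_step = tf.Variable(0, trainable=False, name='global_step')
    # "Initial" learning rate suggests some decay schedule; this one is assumed.
    learning_rate = tf.train.exponential_decay(
        1e-3, global_step, decay_steps=10000, decay_rate=0.97, staircase=True)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    train_op = optimizer.minimize(total_loss, global_step=global_step)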

The first problem is that during training the loss hardly changes, i.e., it generally does not decrease even after more than 10,000 steps. For example:

  2017-06-01 19:46:01.120850: step 3060, loss = 0.8852 (15.7 examples/sec; 3.830 sec/batch)
  2017-06-01 19:46:37.776494: step 3070, loss = 0.7375 (18.2 examples/sec; 3.291 sec/batch)
  2017-06-01 19:47:09.242257: step 3080, loss = 0.8160 (16.5 examples/sec; 3.635 sec/batch)
  2017-06-01 19:47:46.441860: step 3090, loss = 0.7973 (17.1 examples/sec; 3.501 sec/batch)
  2017-06-01 19:48:19.793012: step 3100, loss = 0.7228 (18.2 examples/sec; 3.292 sec/batch)
  2017-06-01 19:48:56.614480: step 3110, loss = 0.8687 (21.8 examples/sec; 2.750 sec/batch)
  2017-06-01 19:49:29.904451: step 3120, loss = 0.8662 (19.8 examples/sec; 3.024 sec/batch)
  2017-06-01 19:50:06.186441: step 3130, loss = 0.7927 (22.7 examples/sec; 2.648 sec/batch)
  2017-06-01 19:50:40.794964: step 3140, loss = 0.7585 (16.2 examples/sec; 3.711 sec/batch)
  2017-06-01 19:51:18.612637: step 3150, loss = 0.8264 (17.9 examples/sec; 3.348 sec/batch)
  2017-06-01 19:51:52.905742: step 3160, loss = 0.7504 (17.2 examples/sec; 3.498 sec/batch)
  2017-06-01 19:52:29.895365: step 3170, loss = 0.7569 (16.6 examples/sec; 3.615 sec/batch)
  2017-06-01 19:53:03.509374: step 3180, loss = 0.6869 (16.3 examples/sec; 3.692 sec/batch)
  2017-06-01 19:53:40.798535: step 3190, loss = 0.7592 (18.9 examples/sec; 3.180 sec/batch)
  2017-06-01 19:54:14.063566: step 3200, loss = 0.7689 (19.1 examples/sec; 3.136 sec/batch)
  2017-06-01 19:54:50.741630: step 3210, loss = 0.7345 (19.7 examples/sec; 3.040 sec/batch)

The loss function is:

def normalized_rmse(pred, gt_truth):
    # Normalization term: the inter-ocular distance. In the 68-point
    # annotation, landmarks 36 and 45 are the outer eye corners; the 1e-12
    # keeps the sqrt numerically stable at zero.
    norm = tf.sqrt(1e-12 + tf.reduce_sum(((gt_truth[:, 36, :] - gt_truth[:, 45, :])**2), 1))

    # Per-sample sum of per-landmark Euclidean errors, divided by the
    # inter-ocular distance times the number of landmarks (68).
    return tf.reduce_sum(tf.sqrt(1e-12 + tf.reduce_sum(tf.square(pred - gt_truth), 2)), 1) / (norm * 68)
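
In other words, normalized_rmse returns, for each sample, the mean point-to-point Euclidean error normalized by the inter-ocular distance, the standard metric on 68-point landmark benchmarks. A toy NumPy check of the arithmetic (all numbers below are made up for illustration):

    import numpy as np

    # One sample, 68 landmarks; only the two eye corners are non-zero.
    gt = np.zeros((1, 68, 2), dtype=np.float32)
    gt[0, 36] = [30.0, 50.0]   # outer corner of one eye
    gt[0, 45] = [70.0, 50.0]   # outer corner of the other eye
    pred = gt + 2.0            # every predicted point is off by (2, 2)

    norm = np.sqrt(np.sum((gt[:, 36, :] - gt[:, 45, :]) ** 2, axis=1))  # 40.0
    err = np.sum(np.sqrt(np.sum((pred - gt) ** 2, axis=2)), axis=1) / (norm * 68)
    print(err)  # ~[0.0707]: per-point error 2*sqrt(2) over inter-ocular 40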

In fact, there are more than 3,000 images in the training set. After augmentation, the images derived from the same source image each differ in some respect, so the samples in every batch are different.

However, when I train the model on a single image, it converges after about 1,000 steps, i.e., the loss decreases noticeably and approaches zero. This really confuses me...

Then I used TensorBoard to visualize the results, shown below:

[Image: the loss variation]

[Image: the gradients of the weights and biases]

The results confirm that the loss generally does not decrease. They also reveal the second problem: in the conv model, the gradients of the biases change noticeably during training, but the gradients of the weights stay almost constant! Even for the model trained on a single image, which converges after 1,000 steps, the corresponding weight gradients in the conv model still remain unchanged...
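
Before concluding that the weight gradients are exactly zero, it may be worth ruling out a display artifact: TensorBoard scales each histogram plot independently, so small but non-zero weight gradients can look perfectly flat next to the bias gradients. A sketch that logs an explicit L2 norm per gradient, so weights and biases land on comparable scalar charts (optimizer and total_loss are the same assumed names as in the sketch above, and the TF 1.x summary API is assumed):

    grads_and_vars = optimizer.compute_gradients(total_loss)
    for grad, var in grads_and_vars:
        if grad is None:
            print(var.op.name, 'receives no gradient at all')
        else:
            # One scalar per variable makes weights vs. biases easy to compare.
            tf.summary.scalar(var.op.name + '/grad_l2',
                              tf.sqrt(tf.reduce_sum(tf.square(grad))))
    summary_op = tf.summary.merge_all()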

I am new to TensorFlow and have tried my best to solve these problems, but failed in the end... So I sincerely hope you all can help me. Thank you very much!

0 Answers