TensorFlow's apply_gradients() function does not update the weight and bias variables

Asked: 2017-12-08 05:39:27

Tags: python machine-learning tensorflow neural-network deep-learning

I am using TensorFlow's compute_gradients() and apply_gradients() functions for backpropagation. By printing the gradient values, I can see that the gradients are being computed, but after calling the apply_gradients() function I see no change in the weights. I also don't see the value of the global_step variable change.

Am I doing something wrong?

I run the following code in a session, and I do see the gradient values returned by the compute_gradients() function printed out. But when I pass the list of (gradient, weight variable) tuples to the apply_gradients() function, the weight values do not change and the global_step value is not updated.

global_step = tf.Variable(0, trainable=False, dtype=tf.int32)
images = tf.placeholder(dtype=tf.float32, shape=[batch_size, None, None, 3])
out_locs = tf.placeholder(dtype=tf.float32, shape=[None, 2])
org_gt_coords = tf.placeholder(dtype=tf.float32, shape=[batch_size, 2])   

res_aux = inference(images,out_locs,org_gt_coords)

ret_dict = train(res_aux, global_step)

init = tf.global_variables_initializer()
with tf.Session() as sess:
  writer = tf.summary.FileWriter('./graphs', sess.graph)
  sess.run(init)

  for epoch in xrange(max_steps):
    start_time = time.time()
    anno_file_batch_rows = getImageMetaRecords() 
    print('epoch: ', epoch)

    for batch in xrange(len(anno_file_batch_rows)/batch_size):
      distorted_images, meta = cdhd_input.distorted_inputs(stats_dict, batch_size, \
              anno_file_batch_rows[batch * batch_size : (batch * batch_size) + batch_size])

      out_dict = sess.run(ret_dict, feed_dict=
                            {images: distorted_images, 
                            out_locs: meta['out_locs'],
                            org_gt_coords: meta['org_gt_coords']})

def inference(images,out_locs,org_gt_coords):
  # conv1
  with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[3, 3, 3, 32],
                                         stddev=1,  #check if this is right
                                         wd=0.0)
    kernel = tf.multiply(kernel, 0.2722)        #line 321-325 in warpTrainCNNCDHDCentroidChainGridPredSharedRevFastExp3
    conv = tf.nn.conv2d(images, kernel, [1, 2, 2, 1], padding='VALID')
    biases = _variable_on_cpu('biases', [32], tf.constant_initializer(1.0))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(pre_activation, name=scope.name)

  # conv2
  with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[3, 3, 32, 64],
                                         stddev=1,
                                         wd=0.0)
    kernel = tf.multiply(kernel, 0.0833)        #line 321-325 in warpTrainCNNCDHDCentroidChainGridPredSharedRevFastExp3
    conv = tf.nn.conv2d(conv1, kernel, [1, 2, 2, 1], padding='VALID')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(1.0))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(pre_activation, name=scope.name)

    ...
    ...
    more layers
    ...
    ...

    return res_aux

def train(res_aux, global_step):
    ...
    ...
    code here to process res_aux and calculate loss
    ...
    ...

    opt = tf.train.GradientDescentOptimizer(learning_rate=0.01) 
    grads_and_vars = opt.compute_gradients(loss, tf.get_collection('weights'))
    #printing shows real valued gradient and weight values
    apply_gradients(grads_and_vars, global_step=global_step)
    #printing same weight values shows no change in weight values. Gradients are not applied to the weights

1 Answer:

Answer 0 (score: 0)

This line only defines the op that applies the gradients:

a_optimizer_col_2.apply_gradients(grad_var_2, global_step=global_step)

To actually apply the gradients, you should run this op in the session, like this:

...
train_step = a_optimizer_col_2.apply_gradients(grad_var_2, global_step=global_step)
...
with tf.Session() as sess:
  sess.run(train_step, feed_dict={...})
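The underlying point is that in TensorFlow 1.x, apply_gradients() only builds an op in the graph; no variable is mutated until the op is executed with sess.run(). A minimal pure-Python sketch of this deferred-execution pattern (hypothetical names, no TensorFlow required) shows why printing the weights right after the apply_gradients() call shows no change:

```python
# Sketch of TF1-style deferred execution (hypothetical toy classes,
# NOT the real TensorFlow API): apply_gradients() merely returns an
# operation; the variables change only when that op is explicitly run.

class ToyOptimizer:
    def __init__(self, learning_rate):
        self.lr = learning_rate

    def apply_gradients(self, grads_and_vars):
        # Build and return the update op -- no variable is touched here.
        def train_op():
            for grad, var in grads_and_vars:
                var["value"] -= self.lr * grad
        return train_op

weights = {"value": 1.0}
opt = ToyOptimizer(learning_rate=0.1)
train_step = opt.apply_gradients([(0.5, weights)])

print(weights["value"])  # still 1.0 -- the op was only defined
train_step()             # analogous to sess.run(train_step)
print(weights["value"])  # now 0.95 -- the update actually ran
```

In the question's train() function, the same reasoning applies: the result of apply_gradients() must be returned (for example as part of ret_dict) and passed to sess.run() in the training loop, otherwise the update op is never executed.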