I am using TensorFlow's compute_gradients() and apply_gradients() functions for backpropagation. Printing the gradient values shows that the gradients are indeed being computed, but after calling apply_gradients() I don't see any change in the weights. I also don't see the value of the global_step variable change. What am I doing wrong?
I run the following code in a session, and I do see the gradient values returned by compute_gradients() being printed. But when I pass the list of (gradient, weight variable) tuples to apply_gradients(), the weight values do not change and global_step is not updated.
global_step = tf.Variable(0, trainable=False, dtype=tf.int32)
images = tf.placeholder(dtype=tf.float32, shape=[batch_size, None, None, 3])
out_locs = tf.placeholder(dtype=tf.float32, shape=[None, 2])
org_gt_coords = tf.placeholder(dtype=tf.float32, shape=[batch_size, 2])

res_aux = inference(images, out_locs, org_gt_coords)
ret_dict = train(res_aux, global_step)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    sess.run(init)
    for epoch in xrange(max_steps):
        start_time = time.time()
        anno_file_batch_rows = getImageMetaRecords()
        print('epoch: ', epoch)
        for batch in xrange(len(anno_file_batch_rows) / batch_size):
            distorted_images, meta = cdhd_input.distorted_inputs(
                stats_dict, batch_size,
                anno_file_batch_rows[batch * batch_size : (batch * batch_size) + batch_size])
            out_dict = sess.run(ret_dict, feed_dict={
                images: distorted_images,
                out_locs: meta['out_locs'],
                org_gt_coords: meta['org_gt_coords']})
def inference(images, out_locs, org_gt_coords):
    # conv1
    with tf.variable_scope('conv1') as scope:
        kernel = _variable_with_weight_decay('weights',
                                             shape=[3, 3, 3, 32],
                                             stddev=1,  # check if this is right
                                             wd=0.0)
        kernel = tf.multiply(kernel, 0.2722)  # line 321-325 in warpTrainCNNCDHDCentroidChainGridPredSharedRevFastExp3
        conv = tf.nn.conv2d(images, kernel, [1, 2, 2, 1], padding='VALID')
        biases = _variable_on_cpu('biases', [32], tf.constant_initializer(1.0))
        pre_activation = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(pre_activation, name=scope.name)

    # conv2
    with tf.variable_scope('conv2') as scope:
        kernel = _variable_with_weight_decay('weights',
                                             shape=[3, 3, 32, 64],
                                             stddev=1,
                                             wd=0.0)
        kernel = tf.multiply(kernel, 0.0833)  # line 321-325 in warpTrainCNNCDHDCentroidChainGridPredSharedRevFastExp3
        conv = tf.nn.conv2d(conv1, kernel, [1, 2, 2, 1], padding='VALID')
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(1.0))
        pre_activation = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(pre_activation, name=scope.name)

    ...
    # more layers
    ...

    return res_aux
def train(res_aux, global_step):
    ...
    # code here to process res_aux and calculate the loss
    ...
    opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
    grads_and_vars = opt.compute_gradients(loss, tf.get_collection('weights'))
    # printing shows real-valued gradient and weight values
    apply_gradients(grads_and_vars, global_step=global_step)
    # printing the same weight values shows no change in the weights;
    # the gradients are not applied
Answer 0 (score: 0):
This line only defines the op that applies the gradients:

a_optimizer_col_2.apply_gradients(grad_var_2, global_step=global_step)
To actually apply them, you need to run this op in a session, like so:
...
train_step = a_optimizer_col_2.apply_gradients(grad_var_2, global_step=global_step)
...
with tf.Session() as sess:
    sess.run(train_step, feed_dict={...})
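
To make this concrete, here is a minimal, self-contained sketch (TF 1.x API, using a hypothetical toy model rather than the asker's network) showing that apply_gradients() merely builds an update op in the graph, and that the weights and global_step only change once that op is passed to sess.run():

import tensorflow as tf

# Toy problem: fit a single weight w so that w * x approximates y.
x = tf.placeholder(dtype=tf.float32, shape=[None])
y = tf.placeholder(dtype=tf.float32, shape=[None])
w = tf.Variable(0.0, name='w')
global_step = tf.Variable(0, trainable=False, dtype=tf.int32)

loss = tf.reduce_mean(tf.square(w * x - y))
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss, var_list=[w])

# This only builds the update op in the graph; nothing runs yet.
train_step = opt.apply_gradients(grads_and_vars, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        # Running the op is what actually mutates w and increments global_step.
        sess.run(train_step, feed_dict={x: [1.0, 2.0], y: [2.0, 4.0]})
        w_val, step_val = sess.run([w, global_step])
        print('w =', w_val, 'global_step =', step_val)

If the op returned by apply_gradients() is never fetched (for example, if it is not among the tensors in the ret_dict passed to sess.run() above, which appears to be the case), TensorFlow never executes it. That matches the symptoms described: real gradient values are printed, but the weights stay unchanged and global_step never advances.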