I have a tensor in my computation graph, and I'd like to append a row to it after every train step. How can I do that?
More detail: I'm grabbing the gradients returned by optimizer.compute_gradients, and I want to modify those gradients based on the gradient history. Here is the code I'm trying to use:
def process_gradient(gradient, optimizer, name):
    reshaped_gradient = flatten(gradient)

    # keep a running list of this gradient's flattened values
    if gradient.name in optimizer._slots:
        optimizer._slots[gradient.name] += [reshaped_gradient]
    else:
        optimizer._slots[gradient.name] = [reshaped_gradient]

    # stack the accumulated history into a single tensor
    gradients_over_time = tf.stack(optimizer._slots[gradient.name])
    print('gradients_over_time.get_shape()', gradients_over_time.get_shape())

    return gradient
...
grads_and_vars = optimizer.compute_gradients(cost_function)
train_step = optimizer.apply_gradients([(process_gradient(grad, optimizer, str(i)), var) for i, (grad, var) in enumerate(grads_and_vars)])
I also tried keeping a Variable that tracks the rows, concatenating a new row onto it after each step, but I couldn't get that to work.
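For reference, the growing-Variable pattern the question alludes to is usually written along these lines. This is only a hedged sketch with illustrative names (num_params, flat_gradient are hypothetical), not the asker's actual code:

# Sketch only: a history Variable whose first dimension is allowed to grow.
# num_params and flat_gradient are hypothetical placeholders.
history = tf.Variable(tf.zeros((0, num_params)), validate_shape=False,
                      name='gradient_history')
append_row = tf.assign(history,
                       tf.concat(0, [history, tf.reshape(flat_gradient, (1, -1))]),
                       validate_shape=False)
# running append_row once per train step appends the current gradient as a new row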
Answer 0 (score: 2)
I ended up using tf.py_func
to accomplish this. I track the state in a global dictionary of lists that the Python function can access. Here is where the gradients are applied:
# process each individual gradient before applying it
train_step = optimizer.apply_gradients([(process_gradient(grad, str(i)), var) for i, (grad, var) in enumerate(grads_and_vars)])
Here is where I keep track of the state over time and where the accumulated state would be used:
global_gradients_over_time = {}

def construct_processor(name):
    def python_process_gradient(gradient):
        reshaped_gradient = gradient.flatten()

        # append this gradient (a numpy array) to the named history
        if name in global_gradients_over_time:
            global_gradients_over_time[name].append(reshaped_gradient)
        else:
            global_gradients_over_time[name] = [reshaped_gradient]

        # process gradients somehow
        return gradient

    return python_process_gradient

def process_gradient(gradient, name):
    return tf.py_func(construct_processor(name), [gradient], tf.float32)
construct_processor
lets you process one gradient at a time, giving each set of gradients a name so that I can find it in the global dictionary. I believe this approach also keeps the history out of GPU memory.
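One caveat worth noting (my addition, not part of the original answer): tf.py_func returns a tensor whose static shape is unknown, so it can help to re-attach the original shape before passing the result to apply_gradients. A minimal sketch, reusing cost_function from the question:

grads_and_vars = optimizer.compute_gradients(cost_function)
processed = []
for i, (grad, var) in enumerate(grads_and_vars):
    g = process_gradient(grad, str(i))
    g.set_shape(grad.get_shape())  # restore the static shape lost by tf.py_func
    processed.append((g, var))
train_step = optimizer.apply_gradients(processed)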
Answer 1 (score: 1)
Here is an example that uses persistent Tensors to store the gradient history. In the loop below, gradient_history
refers to the concatenation of all gradients seen so far:
import tensorflow as tf

n = 2
x = tf.Variable(tf.zeros((n,)))
x_target = 10 * tf.ones((n,))
loss = tf.reduce_sum(tf.square(x - x_target))
optimizer = tf.train.GradientDescentOptimizer(0.1)
gradient = tf.gradients(loss, [x])[0]
train_op = optimizer.apply_gradients([[gradient, x]])

# initialize history with first gradient
sess = tf.Session()
sess.run(tf.global_variables_initializer())
gradient_history0 = sess.run(tf.get_session_handle(tf.stack([gradient])))

# graph that reads the previous history (fed by handle) and appends the new gradient
previous_gradients_in, previous_gradients = tf.get_session_tensor(gradient_history0.handle, dtype=tf.float32)
gradient_history = tf.concat(0, [previous_gradients, [gradient]])  # pre-1.0 tf.concat argument order
gradient_history_out = tf.get_session_handle(gradient_history)

sess.run(tf.global_variables_initializer())
for i in range(10):
    [gradient_history0, _, loss0, gradient0] = sess.run(
        [gradient_history_out, train_op, loss, gradient],
        feed_dict={previous_gradients_in: gradient_history0.handle})
    print(loss0, gradient0)
When you run it, you will see something like this:
200.0 [-20. -20.]
128.0 [-16. -16.]
81.92 [-12.80000019 -12.80000019]
52.4288 [-10.23999977 -10.23999977]
33.5544 [-8.19199944 -8.19199944]
21.4748 [-6.55359936 -6.55359936]
13.7439 [-5.24287987 -5.24287987]
8.79609 [-4.19430351 -4.19430351]
5.6295 [-3.35544205 -3.35544205]
3.60288 [-2.68435287 -2.68435287]
Note that on each step of the computation, gradient_history
is a Tensor object that refers to the gradient history, while gradient_history0
is a TensorHandle object that refers to the saved history persisted between session.run
calls. You can feed that value back into the graph with feed_dict={...: gradient_history0.handle}
, but unlike feeding a numpy array, you are feeding a "pointer" to the data, and the data itself never leaves the TensorFlow runtime. Because the handle persists between session.run
calls, you can also evaluate it directly:
In [10]: gradient_history0.eval()
Out[10]:
array([[-20. , -20. ],
[-20. , -20. ],
[-16. , -16. ],
[-12.80000019, -12.80000019],
[-10.23999977, -10.23999977],
[ -8.19199944, -8.19199944],
[ -6.55359936, -6.55359936],
[ -5.24287987, -5.24287987],
[ -4.19430351, -4.19430351],
[ -3.35544205, -3.35544205],
[ -2.68435287, -2.68435287]], dtype=float32)
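To isolate the mechanism this answer relies on, here is a minimal sketch of the persistent-tensor round trip on its own, using the same TF 0.12-era API as above (the names are illustrative and this is not part of the original answer):

import tensorflow as tf

sess = tf.Session()

# store a value inside the TensorFlow runtime; the result is a TensorHandle
value_handle = sess.run(tf.get_session_handle(tf.constant([1.0, 2.0])))

# build a graph that consumes the persisted value through a handle placeholder
handle_in, persisted = tf.get_session_tensor(value_handle.handle, dtype=tf.float32)
doubled = 2.0 * persisted

# feed the handle string rather than the data itself
print(sess.run(doubled, feed_dict={handle_in: value_handle.handle}))  # [2. 4.]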