Tensorflow: Graphs

Time: 2017-06-08 20:23:54

Tags: graph tensorflow reinforcement-learning gradients

I'm working on a DDPG implementation, which requires computing the gradients of one network's output (below: critic) with respect to another network's output (below: actor). Most of my code already uses queues instead of feed dicts, but I haven't done that for this particular part yet:

import tensorflow as tf
tf.reset_default_graph()

states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))

actor = states * 1
critic = states * 1 + actions

grads_indirect = tf.gradients(critic, actions)  # gradient w.r.t. the fed-in actions
grads_direct = tf.gradients(critic, actor)      # gradient w.r.t. the actor's output

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    act = sess.run(actor, {states: [1.]})
    print(act)  # -> [1.]
    cri = sess.run(critic, {states: [1.], actions: [2.]})
    print(cri)  # -> [3.]
    grad1 = sess.run(grads_indirect, {states: [1.], actions: act})
    print(grad1)  # -> [[1.]]
    grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]})
    print(grad2)  # -> TypeError: Fetch argument has invalid type 'NoneType'

grad1 here computes the gradient w.r.t. the fed-in actions, which were previously computed by the actor. grad2 should do the same thing, but directly inside the graph, without re-feeding the actions, by evaluating the actor directly instead. The problem is that grads_direct is None:

print(grads_direct)  # [None]

How can I achieve this? Is there some dedicated "evaluate this tensor" operation I could make use of? Thanks!

1 Answer:

Answer 0 (score: 1)

In your example you never use actor to compute critic, so the gradient is None: there is simply no path from critic back to actor in the graph.

You should do something like this instead:

actor = states * 1
critic = actor + actions  # change here

grads_indirect = tf.gradients(critic, actions)
grads_direct = tf.gradients(critic, actor)
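
With that change, the minimal example from the question evaluates the gradient as expected. A quick sketch (reusing the same placeholders as above; the [1.] result follows from d(critic)/d(actor) = 1 for critic = actor + actions):

import tensorflow as tf
tf.reset_default_graph()

states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))

actor = states * 1
critic = actor + actions  # critic now depends on actor

# Gradient of the critic w.r.t. the actor's output, computed inside the graph
grads_direct = tf.gradients(critic, actor)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]})
    print(grad2)  # -> [array([1.], dtype=float32)]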