I'm working on a DDPG implementation, which requires computing the gradient of one network (below: critic) with respect to the output of another network (below: actor). My code already uses queues instead of feed dicts for most parts, but I haven't done so for this specific part yet:
import tensorflow as tf
tf.reset_default_graph()
states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))
actor = states * 1
critic = states * 1 + actions
grads_indirect = tf.gradients(critic, actions)
grads_direct = tf.gradients(critic, actor)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    act = sess.run(actor, {states: [1.]})
    print(act)  # -> [1.]
    cri = sess.run(critic, {states: [1.], actions: [2.]})
    print(cri)  # -> [3.]
    grad1 = sess.run(grads_indirect, {states: [1.], actions: act})
    print(grad1)  # -> [[1.]]
    grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]})
    print(grad2)  # -> TypeError: Fetch argument has invalid type 'NoneType'
grad1 here computes the gradients w.r.t. the fed-in actions, which were previously computed by the actor. grad2 is supposed to do the same thing, but directly inside the graph, without re-feeding the actions and instead evaluating actor directly. The problem is that grads_direct is None:
print(grads_direct) # [None]
How can I accomplish this? Is there a dedicated "evaluate this tensor" operation I could make use of? Thanks!
Answer 0 (score: 1)
In your example you don't use actor to compute critic, so the gradient is None.
You should do this instead:
actor = states * 1
critic = actor + actions # change here
grads_indirect = tf.gradients(critic, actions)
grads_direct = tf.gradients(critic, actor)
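To see the fix end to end, here is a minimal sketch (assuming the same TF 1.x session-based API as in the question; the variable names g_actions and g_actor are just for illustration) that rebuilds the graph with critic = actor + actions and evaluates both gradients. Since critic now depends on actor with coefficient 1, grads_direct is no longer None and evaluates to [1.]:

import tensorflow as tf

tf.reset_default_graph()
states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))

actor = states * 1
critic = actor + actions                         # critic built from the actor output
grads_indirect = tf.gradients(critic, actions)   # d(critic)/d(actions)
grads_direct = tf.gradients(critic, actor)       # d(critic)/d(actor), no longer None

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g_actions, g_actor = sess.run([grads_indirect, grads_direct],
                                  {states: [1.], actions: [2.]})
    print(g_actions)  # -> [1.]
    print(g_actor)    # -> [1.]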