Question

I am trying to learn A3C by running an open-source implementation of the A3C reinforcement learning algorithm (A3C code).

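However, I ran into several errors, and I was able to fix all of them except one. In the code, ref() is used as a member function of tf.Variable (1, 2), but in the recent TensorFlow version 0.12rc that function appears to have been deprecated. I don't know the best way to replace it (and I don't understand why the author used ref() in the first place). When I simply changed it to the variable itself (for example, v.ref() to v), there was no error, but the reward did not change. The agent does not seem to learn, and I suspect this is because the variables are not being updated properly.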

Please tell me the correct way to modify the code.

Answer 1

The new method tf.Variable.read_value() is the replacement for tf.Variable.ref() in TensorFlow 0.12 and later.

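The use case for this method is a little tricky to explain. It is motivated by a caching behavior: when a remote variable is used several times on a different device, those uses may all see a cached value. Suppose you have the following code: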

with tf.device("/cpu:0"):
  v = tf.Variable([[1.]])

with tf.device("/gpu:0"):
  # The value of `v` will be captured at this point and cached until `m2`
  # is computed.
  m1 = tf.matmul(v, ...)

with tf.control_dependencies([m1]):
  # The assign happens (on the GPU) after `m1`, but before `m2` is computed.
  assign_op = v.assign([[2.]])

with tf.control_dependencies([assign_op]):
  with tf.device("/gpu:0"):
    # The initially read value of `v` (i.e. [[1.]]) will be used here,
    # even though `m2` is computed after the assign.
    m2 = tf.matmul(v, ...)

sess.run(m2)

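You can use tf.Variable.read_value() to force TensorFlow to read the variable again at a later point, and that read will be subject to whatever control dependencies are in place. So if you want to see the result of the assign when computing m2, you would modify the last block of the program as follows: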

with tf.control_dependencies([assign_op]):
  with tf.device("/gpu:0"):
    # The `read_value()` call will cause TensorFlow to transfer the
    # new value of `v` from the CPU to the GPU before computing `m2`.
    m2 = tf.matmul(v.read_value(), ...)

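(Note that, currently, if all of the ops were on the same device, you would not need to use read_value(), because TensorFlow does not make a copy of the variable when it is used as the input to an op on the same device. This can cause a lot of confusion, for example when you enqueue a variable to a queue, and it is one of the reasons why we are working on enhancing the memory model for variables.)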
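The caching behavior described in this answer can be mimicked with a toy model in plain Python. This is only an illustration, not TensorFlow code: the RemoteVariable class and its cached_read() method are hypothetical names made up for the sketch, and it assumes a single remote consumer that takes one snapshot of the variable on first use.

```python
# Toy model only: plain Python, no TensorFlow. All names are hypothetical.

class RemoteVariable:
    """A variable owned by one 'device' and read from another."""

    def __init__(self, value):
        self._value = value          # authoritative value on the owning device
        self._snapshot = None        # copy held by the remote consumer
        self._has_snapshot = False

    def assign(self, value):
        # Updates the authoritative value; any existing snapshot is untouched.
        self._value = value

    def cached_read(self):
        # Mimics using `v` directly: the first read takes a snapshot,
        # and later reads reuse it.
        if not self._has_snapshot:
            self._snapshot = self._value
            self._has_snapshot = True
        return self._snapshot

    def read_value(self):
        # Mimics `v.read_value()`: always fetches the current value.
        return self._value


v = RemoteVariable(1.0)
m1 = v.cached_read() * 10.0        # snapshot of 1.0 is taken here
v.assign(2.0)                      # happens after m1
m2_stale = v.cached_read() * 10.0  # still sees the snapshot
m2_fresh = v.read_value() * 10.0   # re-reads after the assign

print(m1, m2_stale, m2_fresh)
```

In this model, m2_stale corresponds to the first program above and m2_fresh to the version that calls read_value().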

What is the alternative to tf.Variable.ref() in TensorFlow version 0.12?
