Question

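I'm confused about the mechanism TensorFlow uses to assign different Ops to CPUs or GPUs.

Take the pseudocode below as an example. Can we say that, as long as SimpleOp is created within the context of with tf.device('/gpu:0'), it will definitely run on the GPU (assuming a GPU implementation of SimpleOp is available), regardless of whether its input variables (in_1 and in_2) were created on the CPU or the GPU?
```
with tf.device('/gpu:0'):
    out = tf.SimpleOp(in_1, in_2, name='Simple')
```
I understand that by creating a session with log_device_placement=True, TensorFlow prints the device placement of all variables/Ops. However, is there a way to check the device assignment of just one Op?

Thanks in advance!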

Answer 1

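TLDR; an op created inside with tf.device("/gpu:0") will always run on the GPU. If you pin its inputs to the CPU, they will be placed on the CPU. If you omit device specifications for the inputs, they will be placed on the GPU to be closer to your op. You can use run_metadata to get a Python object containing all device assignments and look up your op there.

Placement is done by the misleadingly named simple_placer.cc. Its comments describe the mechanism, but some bugs are still being worked out (i.e. here), so the best approach is to check the behavior in practice.

When you say that variables are created on the GPU, there are actually two kinds of placement: explicit, when you create the relevant op inside a with tf.device block, and implicit, when you create it outside such a block. Creating ops outside of with tf.device is equivalent to creating them inside a with tf.device(None) block.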
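
The difference between explicit and implicit placement can be seen before anything runs by reading the device each op requested at graph-construction time. The snippet below is a minimal sketch rather than part of the original answer. It assumes a TensorFlow build that provides the tensorflow.compat.v1 module, and it only inspects the requested device, which the placer may still refine at run time.

```python
# Assumption: a TensorFlow build that ships the tensorflow.compat.v1 module.
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    # Implicit placement: created outside any tf.device block.
    a = tf.ones((4,), name="A")
    # Explicit placement: created inside a tf.device block.
    with tf.device("/gpu:0"):
        c = tf.add(a, a, name="C")

# op.device holds the device requested when the graph was built,
# not necessarily the device the op finally runs on.
print("A requested device:", repr(a.op.device))
print("C requested device:", repr(c.op.device))
```

An empty string for A means no device was requested, which is what leaves the placer free to put it next to C.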

So here is a simple experiment:

import tensorflow as tf  # TensorFlow 1.x graph-mode API

n = 10**6
def inputs_cpu():
    tf.reset_default_graph()
    with tf.device("/cpu:0"):
        a = tf.ones((n,), name="A")
        b = tf.ones((n,), name="B")
    with tf.device("/gpu:0"):
        c = tf.add(a, b, name="C")
    return c

def inputs_none():
    tf.reset_default_graph()
    a = tf.ones((n,), name="A")
    b = tf.ones((n,), name="B")
    with tf.device("/gpu:0"):
        c = tf.add(a, b, name="C")
    return c

def run_and_summarize(target):
    # Turn off graph-rewriting optimizations.
    optimizer_options = tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)
    graph_options = tf.GraphOptions(optimizer_options=optimizer_options)
    sess = tf.Session(config=tf.ConfigProto(graph_options=graph_options))
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(target, options=run_options, run_metadata=run_metadata)

    for device in run_metadata.step_stats.dev_stats:
        device_name = device.device
        if not (device_name.endswith("/cpu:0") or device_name.endswith("/gpu:0")):
            continue
        print(device.device)
        for node in device.node_stats:
            print("   ", node.node_name)

Now you can run

run_and_summarize(inputs_cpu())

This runs with the inputs pinned to the CPU, and you will see that this placement is respected:

/job:localhost/replica:0/task:0/gpu:0
    _SOURCE
    C
/job:localhost/replica:0/task:0/cpu:0
    _SOURCE
    A
    B

On the other hand, when no device is specified for the inputs

run_and_summarize(inputs_none())

you can see that all ops are now placed on the GPU:

/job:localhost/replica:0/task:0/cpu:0
    _SOURCE
/job:localhost/replica:0/task:0/gpu:0
    _SOURCE
    A
    B
    C
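
To check a single op instead of printing every device, the same run_metadata structure can be searched by name. What follows is a minimal sketch, not part of the original experiment: devices_for_op is a hypothetical helper, and the SimpleNamespace objects are stand-ins that only mimic the device, node_stats and node_name fields used in run_and_summarize. It assumes node_name equals the op name exactly, which may not hold in every TensorFlow version.

```python
from types import SimpleNamespace


def devices_for_op(step_stats, op_name):
    """Return the names of all devices whose node_stats contain op_name."""
    return [
        dev.device
        for dev in step_stats.dev_stats
        if any(node.node_name == op_name for node in dev.node_stats)
    ]


def _fake_device(name, op_names):
    # Hypothetical stand-in for one entry of step_stats.dev_stats.
    nodes = [SimpleNamespace(node_name=op) for op in op_names]
    return SimpleNamespace(device=name, node_stats=nodes)


# Stand-in data copied from the first output listing above.
fake_step_stats = SimpleNamespace(dev_stats=[
    _fake_device("/job:localhost/replica:0/task:0/gpu:0", ["_SOURCE", "C"]),
    _fake_device("/job:localhost/replica:0/task:0/cpu:0", ["_SOURCE", "A", "B"]),
])

print(devices_for_op(fake_step_stats, "C"))
```

With a real run, run_metadata.step_stats from run_and_summarize could be passed in place of the stand-in object.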

Answer 2

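Yes. In fact, if no kernel is available for the specified device, it will fail. But two things should be considered:
- This can be overridden by setting allow_soft_placement=True in the session config.
- tf.device context managers can be nested, so if SimpleOp is not so simple, it may place some of its internal ops elsewhere using its own with tf.device("/cpu:0"): block.

As for checking only one Op's device assignment: not that I know of (comments welcome). If you are on *nix, you can always grep the script output: python script.py | grep your_op_name. The drawback is that you need to run your script twice: once with log_device_placement=True and grep, and then again without them.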
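
Where grep is not available, the same filtering can be done in Python on captured output. This is only a sketch under stated assumptions: placement_lines is a hypothetical helper, and the sample text is made up for illustration, mimicking a general name: (OpType): device shape. The real log format depends on the TensorFlow version.

```python
def placement_lines(log_text, op_name):
    """Keep only the lines that start with '<op_name>:'."""
    prefix = op_name + ":"
    return [
        line
        for line in log_text.splitlines()
        if line.strip().startswith(prefix)
    ]


# Made-up sample text, not real TensorFlow output.
sample_log = """\
other_op: (SomeOp): /job:localhost/replica:0/task:0/cpu:0
your_op_name: (SomeOp): /job:localhost/replica:0/task:0/gpu:0
your_op_name_2: (SomeOp): /job:localhost/replica:0/task:0/cpu:0
"""

for line in placement_lines(sample_log, "your_op_name"):
    print(line)
```

Matching on the name followed by a colon avoids picking up ops whose names merely start with the same text, which a plain grep your_op_name would also return.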

How does TensorFlow assign Ops to run on the GPU?
