Question

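I am simply following the instructions, but whenever I try to read the input value of my op on the GPU, I get a segmentation fault. If I run the same code on the CPU (with a different REGISTER_KERNEL_BUILDER), it works as expected. Unfortunately, the gdb backtrace does not give me any more information, even when I build the custom op with bazel's debug flags.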

Here is my code:

Interface.cc

REGISTER_OP("Interface")
    .Input("pointer_to_grid: int32")
    .Output("current_grid_data: float32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
      shape_inference::ShapeHandle input_shape;
      // Accept only a scalar (rank 0): a pointer address stored in an integer
      TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 0, &input_shape));
      return Status::OK();
    });

class InterfaceGPU : public OpKernel {
 public:
  explicit InterfaceGPU(OpKernelConstruction* context) : OpKernel(context) {}

  void Compute(OpKernelContext* context) override {
    // Grab the input tensor
    const Tensor& input_tensor = context->input(0);
    const auto input = input_tensor.flat<int32>();

    printf("This works %d \n", input);
    printf("This does not %d \n", input(0)); // Segmentation fault is here

    //...

  }
};

REGISTER_KERNEL_BUILDER(Name("GridPointerInterface").Device(DEVICE_GPU), InterfaceGPU);

runme.py

import tensorflow as tf
import numpy as np
import sys
op_interface = tf.load_op_library('~/tensorflow/bazel-bin/tensorflow/core/user_ops/interface.so')
with tf.device("/gpu:0"):
  with tf.Session() as sess:
    sess.run(op_interface.interface_gpu(12))

I tested this with TF 1.6 and 1.7. It looks to me as if TF is skipping the memory allocation, and unfortunately I do not know how to force it.

Thanks for any suggestions.

Answer 1

This is expected: you are trying to read, from CPU code, a value that is stored in GPU memory (in order to print it).

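The way to manipulate values on the GPU is through Eigen. If you look at the implementations of other kernels in TensorFlow, you will see code like {{1}}. This tells Eigen to create a CUDA kernel for you.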

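If you want to manipulate the GPU values directly yourself, you need to synchronize the GPU stream and copy them to CPU memory, which is fairly complicated.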
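For a small scalar input like the one in the question, one option that may avoid the manual copy is to ask TensorFlow to keep that input in host memory. The sketch below is not from the answer above. It assumes the op is registered under the name "Interface" with an input called "pointer_to_grid", as in the question's REGISTER_OP, and it uses the `HostMemory()` annotation of the kernel builder. It is a fragment that builds only inside a TensorFlow 1.x source tree, and the rest of `Compute` is left elided as in the question.

```cpp
// Sketch only: compiles as part of a TensorFlow build, not standalone.
#include <cstdio>

#include "tensorflow/core/framework/op_kernel.h"

using namespace tensorflow;

class InterfaceGPU : public OpKernel {
 public:
  explicit InterfaceGPU(OpKernelConstruction* context) : OpKernel(context) {}

  void Compute(OpKernelContext* context) override {
    const Tensor& input_tensor = context->input(0);
    OP_REQUIRES(context, TensorShapeUtils::IsScalar(input_tensor.shape()),
                errors::InvalidArgument("pointer_to_grid must be a scalar"));

    // Reading the value on the CPU is valid here only because the
    // registration below pins this input to host memory.
    const int32 grid_handle = input_tensor.scalar<int32>()();
    printf("Input value: %d\n", grid_handle);

    // ...
  }
};

// Assumption: the kernel name must match the name given to REGISTER_OP.
REGISTER_KERNEL_BUILDER(Name("Interface")
                            .Device(DEVICE_GPU)
                            .HostMemory("pointer_to_grid"),
                        InterfaceGPU);
```

With this registration the kernel still runs as a GPU kernel, but TensorFlow hands it the annotated input as a host tensor, so `scalar<int32>()()` dereferences CPU memory. Larger inputs that really live on the device would still need an Eigen expression or an explicit device-to-host copy.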
