Question

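I am writing a custom kernel. Execution works correctly (I checked this with printf calls inside the CUDA kernel), but as soon as I try to print the same values again on the TensorFlow side, I get a segfault. It looks to me as if the data is never transferred from the GPU to the CPU, and I'm not sure whether that happens at all or how to make TF do it.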
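In plain CUDA, data written by a kernel to memory allocated with cudaMalloc stays in device memory. Nothing copies it to the host implicitly, and dereferencing such a device pointer from host code typically crashes. The following is a minimal standalone sketch, independent of TensorFlow, showing the explicit copy. FillKernel and FillOnGpuAndCopyBack are hypothetical names used only for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes one value into device (GPU) memory.
__global__ void FillKernel(float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 0.5f * i;
}

// Hypothetical helper: runs FillKernel on its own stream, then copies the result
// into host memory. Returns an empty vector if n <= 0 or any CUDA call fails.
std::vector<float> FillOnGpuAndCopyBack(int n) {
  if (n <= 0) return {};
  const size_t bytes = static_cast<size_t>(n) * sizeof(float);

  cudaStream_t stream;
  if (cudaStreamCreate(&stream) != cudaSuccess) return {};
  float* device = nullptr;  // device pointer: dereferencing it on the host is invalid
  if (cudaMalloc(&device, bytes) != cudaSuccess) {
    cudaStreamDestroy(stream);
    return {};
  }

  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  FillKernel<<<blocks, threads, 0, stream>>>(device, n);

  std::vector<float> host(n);
  cudaError_t err = cudaGetLastError();  // catches launch-configuration errors
  if (err == cudaSuccess) {
    // Enqueued on the kernel's stream, so it runs only after the kernel finishes.
    err = cudaMemcpyAsync(host.data(), device, bytes, cudaMemcpyDeviceToHost, stream);
  }
  if (err == cudaSuccess) {
    err = cudaStreamSynchronize(stream);  // host must wait before reading `host`
  }

  cudaFree(device);
  cudaStreamDestroy(stream);
  if (err != cudaSuccess) return {};
  return host;
}
```

Host code may only read the returned vector. The raw `device` pointer is meaningful only to kernels and CUDA API calls. Putting the copy on the same stream as the kernel is what guarantees that the copy sees the kernel's results.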

Here is how I launch the kernel and access the values:

Tensor* access_output = new Tensor(DT_FLOAT,TensorShape({250,250}));
PersistentTensor* per_tensor = new PersistentTensor(*proc_tensor);
OP_REQUIRES_OK(context, context->allocate_persistent(DT_FLOAT,access_tensor->shape(),per_tensor,&access_output));   
CallGPUKernel()(context->eigen_device<GPUDevice>(),input_tensor.flat<int64>().data(),access_output->flat<float>().data());

// It crashes if I use
auto output = access_output->tensor<float,2>();
printf("%f\n", output(0,0)); // crash

// But also if I use
float* output_ptr = (float*) malloc(sizeof(float)*250*250);
output_ptr = access_output->flat<float>().data();
printf("%f\n", output_ptr[0]); // crash

Changing allocate_persistent to allocate_temp or allocate_output (with the arguments adjusted accordingly) makes no difference.

Does anyone know how to copy the data produced in the kernel back into access_output so that it can be accessed from CPU space?
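One possible TensorFlow-side approach is sketched below. It is a minimal sketch under these assumptions: the op is built with GPU support, so EIGEN_USE_GPU and Eigen::GpuDevice are available; CallGPUKernel launches its kernel on `context->eigen_device<GPUDevice>().stream()`; and `access_output` holds DT_FLOAT data in GPU memory. CopyFloatTensorToHost is a hypothetical helper name. It uses the OpKernelContext::allocate_temp overload that takes AllocatorAttributes to obtain a host-resident tensor. It then uses Eigen::GpuDevice::memcpyDeviceToHost and synchronize() to move the data on the op's compute stream.

```cuda
#define EIGEN_USE_GPU  // must precede the TensorFlow/Eigen includes
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor.h"

namespace tf = tensorflow;
using GPUDevice = Eigen::GpuDevice;

// Hypothetical helper: copies a DT_FLOAT tensor that lives in GPU memory into a
// host-resident temporary tensor so CPU code can read it.
tf::Status CopyFloatTensorToHost(tf::OpKernelContext* context,
                                 const tf::Tensor& device_tensor,
                                 tf::Tensor* host_tensor) {
  tf::AllocatorAttributes host_attr;
  host_attr.set_on_host(true);         // allocate the temporary in CPU memory
  host_attr.set_gpu_compatible(true);  // request pinned memory suitable for DMA
  tf::Status s = context->allocate_temp(tf::DT_FLOAT, device_tensor.shape(),
                                        host_tensor, host_attr);
  if (!s.ok()) return s;

  const GPUDevice& d = context->eigen_device<GPUDevice>();
  // Enqueued on the op's compute stream, i.e. after any kernel launched on d.stream().
  d.memcpyDeviceToHost(host_tensor->flat<float>().data(),
                       device_tensor.flat<float>().data(),
                       device_tensor.NumElements() * sizeof(float));
  d.synchronize();  // block the host thread until the copy has completed
  return s;
}
```

After CallGPUKernel, it could be called as `tensorflow::Tensor host_copy; OP_REQUIRES_OK(context, CopyFloatTensorToHost(context, *access_output, &host_copy));`. After that call, `host_copy.matrix<float>()(0, 0)` can be read and printed on the CPU. The blocking synchronize() stalls the host until the GPU stream drains. That is acceptable for debugging, but ops usually avoid it in production code.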

Segmentation fault when accessing the output of a custom TensorFlow GPU op
