
I added a new GPU kernel op (MyGpuOp). I use the op in Python and specify the device with tf.device (for example '/gpu:0'):

with tf.device('/gpu:0'):
    out1 = my_gpu_op(input1)
with tf.device('/gpu:1'):
    out2 = my_gpu_op(input2)

Then I call sess.run([out1, out2]), but the two ops do not seem to run concurrently on the different GPU devices: sess.run([out1, out2]) takes twice as long as sess.run(out1).
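For reference, a minimal sketch of how such a timing comparison could be made. The helper name `average_seconds` and its `repeats` and `warmup` parameters are hypothetical and not part of TensorFlow. Warm-up calls are excluded on the assumption that the first sess.run calls include one-time setup cost that would distort the comparison.

```python
import time


def average_seconds(fn, repeats=10, warmup=2):
    """Return the mean wall-clock time of fn() over `repeats` calls.

    Hypothetical helper: `warmup` untimed calls are made first so that
    one-time setup cost is not counted in the measurement.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Intended use with the session and tensors from the question
# (assumes sess, out1 and out2 already exist):
#   t_single = average_seconds(lambda: sess.run(out1))
#   t_both = average_seconds(lambda: sess.run([out1, out2]))
#   print(t_both / t_single)
```

If the two ops overlapped on separate GPUs, the printed ratio would be expected to be close to 1; a ratio near 2 matches the behaviour described above.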

In the C++ op wrapper (MyGpuOp), I launch the CUDA kernel by passing the stream (cudaStream_t), like this:

    // ctx is the OpKernelContext*
    const GPUDevice& d = ctx->eigen_device<GPUDevice>();
    cudaStream_t stream = d.stream();
    MyKernelName<<<grid, block, 0, stream>>>(....);

TensorFlow, adding a new op: how to get the device ID in an OpKernel

0 answers: