From gpu_device.cc:
// NOTE(tucker): We need to discriminate between Eigen GPU
// operations and all others. If an operation is Eigen
// implemented (or otherwise tries to launch a cuda kernel
// directly), we need to establish a stacked-scoped environment
// that directs it to execute on the proper device. Otherwise we
// expect the Op to use StreamExecutor directly and correctly. The
// way we make this discrimination is quite hacky: At the moment
// the only non-Eigen GPU Op is the recv-op, which is known to be
// asynchronous.
and gpu_device only waits when the context differs (sync_every_op is false).
But in argmax_op.h, for example,
template <typename Device, typename T>
struct ArgMin {
#define DECLARE_COMPUTE_SPEC(Dims)                                    \
  EIGEN_ALWAYS_INLINE static void Reduce##Dims(                       \
      const Device& d, typename TTypes<T, Dims>::ConstTensor input,   \
      const int32 dimension,                                          \
      typename TTypes<int64, Dims - 1>::Tensor output) {              \
    output.device(d) = input.argmin(dimension).template cast<int64>(); \
  }
the computation uses the device directly. Is that correct?
Answer (score: 1):
I was missing something: the CUDA stream is passed to the Eigen device, so there is no problem.
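To make that concrete, here is a minimal, self-contained sketch (not TensorFlow's actual plumbing) of how a CUDA stream ends up inside an Eigen::GpuDevice, so that an expression like output.device(d) = input.argmin(dim) is launched on that stream rather than on the default stream. It assumes a recent Eigen where the stream wrapper is named Eigen::GpuStreamDevice (older releases call it Eigen::CudaStreamDevice) and that the file is compiled with nvcc:

// Minimal sketch: route an Eigen tensor expression onto a specific CUDA stream.
#define EIGEN_USE_GPU
#include <cuda_runtime.h>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Wrap the raw stream; Eigen launches its generated kernels on it.
  // (Older Eigen versions name this class Eigen::CudaStreamDevice.)
  Eigen::GpuStreamDevice stream_device(&stream, /*gpu_id=*/0);
  Eigen::GpuDevice d(&stream_device);

  const int rows = 4, cols = 8;
  float* d_in = nullptr;
  int64_t* d_out = nullptr;
  cudaMalloc(&d_in, rows * cols * sizeof(float));
  cudaMalloc(&d_out, rows * sizeof(int64_t));
  // (Device buffers left uninitialized; this only illustrates the dispatch path.)

  Eigen::TensorMap<Eigen::Tensor<float, 2>> input(d_in, rows, cols);
  Eigen::TensorMap<Eigen::Tensor<int64_t, 1>> output(d_out, rows);

  // Same pattern as the argmax_op.h excerpt above: the reduction kernel is
  // enqueued on `stream`, because `d` carries that stream.
  output.device(d) = input.argmin(/*dimension=*/1).cast<int64_t>();

  cudaStreamSynchronize(stream);
  cudaFree(d_out);
  cudaFree(d_in);
  cudaStreamDestroy(stream);
  return 0;
}

In TensorFlow itself, gpu_device.cc builds the per-op Eigen::GpuDevice around the compute stream managed by StreamExecutor, which is why the direct output.device(d) = ... in argmax_op.h is already dispatched to the right stream.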