From gpu_device.cc:
// NOTE(tucker): We need to discriminate between Eigen GPU
// operations and all others. If an operation is Eigen
// implemented (or otherwise tries to launch a cuda kernel
// directly), we need to establish a stacked-scoped environment
// that directs it to execute on the proper device. Otherwise we
// expect the Op to use StreamExecutor directly and correctly. The
// way we make this discrimination is quite hacky: At the moment
// the only non-Eigen GPU Op is the recv-op, which is known to be
// asynchronous.
and gpu_device only waits when the context differs (sync_every_op is false).
But in argmax_op.h, for example,
template <typename Device, typename T>
struct ArgMin {
#define DECLARE_COMPUTE_SPEC(Dims)                                    \
  EIGEN_ALWAYS_INLINE static void Reduce##Dims(                       \
      const Device& d, typename TTypes<T, Dims>::ConstTensor input,   \
      const int32 dimension,                                          \
      typename TTypes<int64, Dims - 1>::Tensor output) {              \
    output.device(d) = input.argmin(dimension).template cast<int64>(); \
  }
the computation uses the device directly. Is that correct?
Answer (score: 1):
I was missing something: the CUDA stream is passed to the Eigen device, so there is no problem.
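To make that concrete, here is a minimal, self-contained sketch (not TensorFlow's actual plumbing) of how a CUDA stream ends up inside an Eigen::GpuDevice, so that an expression like output.device(d) = input.argmin(dim) is launched on that stream rather than on the default stream. It assumes a recent Eigen where the stream wrapper is named Eigen::GpuStreamDevice (older releases call it Eigen::CudaStreamDevice) and that the file is compiled with nvcc:

// Minimal sketch: route an Eigen tensor expression onto a specific CUDA stream.
#define EIGEN_USE_GPU
#include <cuda_runtime.h>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Wrap the raw stream; Eigen launches its generated kernels on it.
  // (Older Eigen versions name this class Eigen::CudaStreamDevice.)
  Eigen::GpuStreamDevice stream_device(&stream, /*gpu_id=*/0);
  Eigen::GpuDevice d(&stream_device);

  const int rows = 4, cols = 8;
  float* d_in = nullptr;
  int64_t* d_out = nullptr;
  cudaMalloc(&d_in, rows * cols * sizeof(float));
  cudaMalloc(&d_out, rows * sizeof(int64_t));
  // (Device buffers left uninitialized; this only illustrates the dispatch path.)

  Eigen::TensorMap<Eigen::Tensor<float, 2>> input(d_in, rows, cols);
  Eigen::TensorMap<Eigen::Tensor<int64_t, 1>> output(d_out, rows);

  // Same pattern as the argmax_op.h excerpt above: the reduction kernel is
  // enqueued on `stream`, because `d` carries that stream.
  output.device(d) = input.argmin(/*dimension=*/1).cast<int64_t>();

  cudaStreamSynchronize(stream);
  cudaFree(d_out);
  cudaFree(d_in);
  cudaStreamDestroy(stream);
  return 0;
}

In TensorFlow itself, gpu_device.cc builds the per-op Eigen::GpuDevice around the compute stream managed by StreamExecutor, which is why the direct output.device(d) = ... in argmax_op.h is already dispatched to the right stream.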