I am writing an Op in C++ and CUDA for TensorFlow that has shared custom function code. Usually, when sharing code between CPU and CUDA implementations, one defines a macro that inserts the __device__
specifier into the function signature when compiling for CUDA. Is there a built-in way to share code in this manner in TensorFlow?
How does one define utility functions (usually inlined) that can run on both the CPU and the GPU?
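For reference, the hand-rolled pattern I mean looks roughly like this (MY_HOST_DEVICE is an illustrative name, not from any library):

#ifdef __CUDACC__
#define MY_HOST_DEVICE __host__ __device__  // callable from host and device under nvcc
#else
#define MY_HOST_DEVICE                      // plain host function otherwise
#endif

MY_HOST_DEVICE inline float square(float x) { return x * x; }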
Answer 0 (score: 0)
It turns out that the following macros in TensorFlow do what I described.
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"  // defines EIGEN_DEVICE_FUNC / EIGEN_STRONG_INLINE

namespace tensorflow {

// EIGEN_DEVICE_FUNC expands to __host__ __device__ under nvcc and to
// nothing in a CPU-only build, so foo() can run on both CPU and GPU.
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
void foo() {
  // shared CPU/GPU utility code
}

}  // namespace tensorflow
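EIGEN_DEVICE_FUNC expands to __host__ __device__ when the file is compiled by nvcc and to nothing otherwise, while EIGEN_STRONG_INLINE forces inlining, so a function annotated this way is callable from both host code and CUDA kernels. A minimal usage sketch, assuming foo() above is in scope (the kernel and function names here are hypothetical):

// Hypothetical call sites for the shared helper above.
__global__ void MyKernel() {
  tensorflow::foo();  // device-side call, valid when compiled with nvcc
}

void CallOnHost() {
  tensorflow::foo();  // ordinary host-side call, valid in any build
}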