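I am developing a custom op for TensorFlow that needs GPU support, following the guide in the TensorFlow documentation. While tracking down an error in my own code, I went back to the example in the documentation and tried to compile the referenced code example: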
#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"

__global__ void AddOneKernel(const int* in, const int N, int* out) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
       i += blockDim.x * gridDim.x) {
    out[i] = in[i] + 1;
  }
}

void AddOneKernelLauncher(const int* in, const int N, int* out) {
  AddOneKernel<<<32, 256>>>(in, N, out);
}

#endif
using the command suggested in the documentation:
nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
with $TF_INC properly replaced by the TensorFlow include path. Unfortunately, this produces a lot of errors:
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1294): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1300): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1306): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1312): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1318): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1324): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1330): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1336): error: expression must have arithmetic, unscoped enum, or pointer type
and many more like these.
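I found that this may be related to an unsupported nvcc / gcc / OS combination. I did not set up the machine myself (and in fact have no sudo rights). I have nvcc version 7.5.17 and gcc version 4.9.3 on Ubuntu 16.04.2. Ubuntu 16.04.2 is not listed among the systems supported by CUDA 7.5. That might be the problem, but I have found many people claiming that it works on 16.04. Besides, I successfully compiled TensorFlow with GPU support on this machine.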
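Moreover, the errors are related to the Tensor #include in the code; the code compiles successfully without that line. I have not tried whether the demo op works without this include, but my own op fails with: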
2017-06-01 09:36:14.679685: E tensorflow/stream_executor/cuda/cuda_driver.cc:1067] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED :: No stack trace available
2017-06-01 09:36:14.679777: F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed
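One way to separate the indexing logic from the nvcc/toolchain problem is to emulate the kernel's grid-stride loop on the host. The following is only a minimal sketch under stated assumptions: AddOneOnHost is a hypothetical helper (not part of TensorFlow or the documentation example), and its grid_dim and block_dim parameters stand in for the <<<32, 256>>> launch configuration. It says nothing about why the GPU launch fails. It only checks that the index arithmetic touches every element exactly once and never reads or writes past N.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-only emulation of the grid-stride loop in AddOneKernel.
// grid_dim and block_dim are assumptions mirroring the <<<32, 256>>> launch.
std::vector<int> AddOneOnHost(const std::vector<int>& in,
                              int grid_dim = 32, int block_dim = 256) {
  assert(grid_dim > 0 && block_dim > 0);
  const int n = static_cast<int>(in.size());
  std::vector<int> out(in.size(), 0);
  // Iterate over every (block, thread) pair the GPU would launch.
  for (int block = 0; block < grid_dim; ++block) {
    for (int thread = 0; thread < block_dim; ++thread) {
      // Same index arithmetic as the kernel: start at the global thread
      // index and advance by the total number of threads in the grid.
      for (int i = block * block_dim + thread; i < n;
           i += block_dim * grid_dim) {
        out[i] = in[i] + 1;
      }
    }
  }
  return out;
}
```

Calling AddOneOnHost on a vector longer than 32 * 256 elements exercises the stride step as well as the first pass. The same pattern could be applied to the indexing of a custom op before running it on the GPU.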
Two questions: