I'm building as follows in order to debug the TensorFlow 1.5 backend:
bazel build -c opt --config cuda -c dbg --strip=never //tensorflow/tools/pip_package:build_pip_package -s
This gives:
INFO: From Compiling tensorflow/contrib/nccl/kernels/nccl_manager.cc:
/usr/include/c++/6/bits/stl_pair.h(327): error: calling a __host__ function("std::_Rb_tree_const_iterator< ::tensorflow::NcclManager::NcclStream *> ::_Rb_tree_const_iterator") from a __device__ function("std::pair< ::std::_Rb_tree_const_iterator< ::tensorflow::NcclManager::NcclStream *> , bool> ::pair< ::std::_Rb_tree_iterator< ::tensorflow::NcclManager::NcclStream *> &, bool &, (bool)1> ") is not allowed
/usr/include/c++/6/bits/stl_pair.h(327): error: identifier "std::_Rb_tree_const_iterator< ::tensorflow::NcclManager::NcclStream *> ::_Rb_tree_const_iterator" is undefined in device code
/usr/include/c++/6/bits/stl_algobase.h(1009): error: calling a __host__ function("__builtin_clzl") from a __device__ function("std::__lg") is not allowed
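Could we work around this by adding some inlining flags, so that these functions get folded correctly, without keeping the other optimizations?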