Building TensorFlow: importing MPI headers from outside the Bazel root

Date: 2019-06-19 14:18:32

Tags: tensorflow build bazel openmpi

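I want to build TensorFlow 1.3 (not 1.13) on Ubuntu 16.04 with MPI support (instead of the default gRPC). I installed the libopenmpi-dev package from the Ubuntu repositories. When running the configure script, I gave /usr/lib/openmpi as the MPI toolkit directory.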

I start the build with this command:

$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

But there is a header inclusion problem:

  1. The file tensorflow/contrib/mpi/mpi_utils.cc includes tensorflow/contrib/mpi/mpi_utils.h
  2. mpi_utils.h includes third_party/mpi/mpi.h
  3. mpi.h is a symlink to /usr/lib/openmpi/include/mpi.h
  4. The actual mpi.h contains the following line:
     #include "openmpi/ompi/mpi/cxx/mpicxx.h"
  5. mpicxx.h is located in the folder /usr/lib/openmpi/include/openmpi/ompi/mpi/cxx/, which is not in the include path.

I "fixed" this by creating a symlink to the right folder:

$ ln -s /usr/lib/openmpi/include/openmpi third_party/mpi/openmpi

Now mpicxx.h is found, but it in turn wants to include mpi.h, which fails because mpi.h is not in the same folder:

$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
WARNING: /home/arno/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:exporter': No longer supported. Switch to SavedModel immediately.
WARNING: /home/arno/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:gc': No longer supported. Switch to SavedModel immediately.
INFO: Found 1 target...
ERROR: /home/arno/tensorflow/tensorflow/contrib/mpi/BUILD:60:1: C++ compilation of rule '//tensorflow/contrib/mpi:mpi_rendezvous_mgr' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 151 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from ./third_party/mpi/mpi.h:2673:0,
                 from ./tensorflow/contrib/mpi/mpi_utils.h:27,
                 from ./tensorflow/contrib/mpi/mpi_rendezvous_mgr.h:33,
                 from tensorflow/contrib/mpi/mpi_rendezvous_mgr.cc:18:
./third_party/mpi/openmpi/ompi/mpi/cxx/mpicxx.h:35:17: fatal error: mpi.h: No such file or directory
compilation terminated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/arno/tensorflow/tensorflow/tools/pip_package/BUILD:134:1 C++ compilation of rule '//tensorflow/contrib/mpi:mpi_rendezvous_mgr' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 151 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
INFO: Elapsed time: 6.668s, Critical Path: 4.98s
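
The two failures can be imitated without a compiler. What follows is only a sketch: it assumes that a quoted #include is first looked up in the directory of the including file as that file was named (symlinks not resolved), which is what the paths in the log above suggest, and it passes the workspace root as the only extra search directory. The function name resolve_include and the temporary mock tree are made up for illustration; nothing here touches the real /usr/lib/openmpi.

```shell
#!/bin/sh
# Emulate a quoted #include lookup: try the directory of the including file
# (as named, symlinks NOT resolved), then each extra search directory.
# Usage: resolve_include INCLUDING_FILE INCLUDE_NAME [SEARCH_DIR...]
resolve_include() {
    including_file=$1
    include_name=$2
    shift 2
    for dir in "$(dirname "$including_file")" "$@"; do
        if [ -e "$dir/$include_name" ]; then
            printf '%s\n' "$dir/$include_name"
            return 0
        fi
    done
    return 1
}

# Mock tree standing in for the real paths (hypothetical, created under /tmp).
root=$(mktemp -d)
sys="$root/usr/lib/openmpi/include"
ws="$root/tensorflow"
mkdir -p "$sys/openmpi/ompi/mpi/cxx" "$ws/third_party/mpi"
: > "$sys/mpi.h"
: > "$sys/openmpi/ompi/mpi/cxx/mpicxx.h"
ln -s "$sys/mpi.h" "$ws/third_party/mpi/mpi.h"

# Step 1: mpi.h, seen through the symlink, asks for mpicxx.h and misses it.
resolve_include "$ws/third_party/mpi/mpi.h" "openmpi/ompi/mpi/cxx/mpicxx.h" "$ws" \
    || echo "step 1: mpicxx.h not found"

# The extra symlink from the question makes step 1 succeed.
ln -s "$sys/openmpi" "$ws/third_party/mpi/openmpi"
found=$(resolve_include "$ws/third_party/mpi/mpi.h" "openmpi/ompi/mpi/cxx/mpicxx.h" "$ws")
echo "step 1 after symlink: $found"

# Step 2: mpicxx.h asks for "mpi.h", which is neither next to it nor in $ws.
resolve_include "$found" "mpi.h" "$ws" || echo "step 2: mpi.h not found"
```

The real compile line carries many more search directories (the log mentions 151 skipped arguments), so this only mirrors the part of the lookup that matters for these two errors.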

I tried adding the header path to GCC's include path manually with this command:

$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package --copt='-I/usr/lib/openmpi/include'

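...but then I get errors because the headers included from /usr/lib/openmpi/include/openmpi/ompi/mpi/cxx are not declared in Bazel's configuration files. And I can't declare them to Bazel, because it doesn't accept absolute paths.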
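
A possible angle that is not part of the original thread: Open MPI's mpi.h normally wraps the mpicxx.h include in a preprocessor guard on the macro OMPI_SKIP_MPICXX. The sketch below only checks whether the installed header has that guard. The function name show_cxx_guard and the MPI_HEADER override variable are hypothetical, and whether TensorFlow 1.3 then builds with --copt=-DOMPI_SKIP_MPICXX is an untested assumption.

```shell
#!/bin/sh
# Print the lines of a header that mention the OMPI_SKIP_MPICXX guard.
# Returns non-zero when the macro does not appear in the file.
# Usage: show_cxx_guard HEADER
show_cxx_guard() {
    grep -n 'OMPI_SKIP_MPICXX' "$1"
}

# Path from the question; MPI_HEADER is a hypothetical override for other installs.
header=${MPI_HEADER:-/usr/lib/openmpi/include/mpi.h}

if [ -r "$header" ] && show_cxx_guard "$header"; then
    echo "guard present: --copt=-DOMPI_SKIP_MPICXX might keep mpicxx.h out of the build"
else
    echo "guard not found, or header not readable: $header"
fi
```

If the guard is there, defining the macro would stop mpi.h from pulling in the C++ bindings at all, so neither the extra symlink nor the absolute include path would be needed for that header.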

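I can't find a way to make this build work. I'm new to Bazel; from what I've read, the build directory is supposed to be "self-contained", i.e. contain all the required header and source files, yet the TensorFlow repository breaks this by adding symlinks in third_party/mpi that point into /usr/lib/.... Changing the TensorFlow version is not an option.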

How can I build TensorFlow 1.3 with OpenMPI support?

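Edit: adding the -s option to the Bazel build command, as suggested in the comments, gives more verbose output, but I can't tell which compiler is being used. I think these are the relevant lines: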

1 answer:

Answer 0 (score: 1)

One workaround is to build and install MVAPICH from source (the MPI toolkit path is then /usr/local). The problem only exists with OpenMPI.
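
A likely reason the problem is specific to OpenMPI is the header layout. The sketch below tells the two layouts apart. It assumes that an MPICH-derived install such as MVAPICH puts mpicxx.h directly next to mpi.h, while the nested path is the one quoted in the question. The function name classify_mpi_layout and the MPI_HOME variable are illustrative names only.

```shell
#!/bin/sh
# Report where mpicxx.h sits relative to the include directory of an MPI install.
# Usage: classify_mpi_layout MPI_TOOLKIT_DIR  -> prints nested, flat, or unknown
classify_mpi_layout() {
    inc="$1/include"
    if [ -e "$inc/openmpi/ompi/mpi/cxx/mpicxx.h" ]; then
        echo nested     # layout described in the question (OpenMPI)
    elif [ -e "$inc/mpicxx.h" ]; then
        echo flat       # assumed MPICH/MVAPICH layout: next to mpi.h
    else
        echo unknown
    fi
}

# /usr/local is the toolkit path mentioned in the answer.
classify_mpi_layout "${MPI_HOME:-/usr/local}"
```

With a flat layout, a header symlinked next to mpi.h in third_party/mpi is found in the same directory, so the nested lookup that fails for OpenMPI never happens.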