Compiling TensorFlow with CUDA 9 for the new p3 instances

Time: 2017-10-28 17:49:01

Tags: amazon-ec2 tensorflow tensorflow-serving

I was able to recompile TensorFlow from Amazon's modified sources (provided in the new Deep Learning AMI).

I am now trying to compile TensorFlow Serving against that TensorFlow "fork", but I get this error:

ERROR: /root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/contrib/nccl/BUILD:68:1: undeclared inclusion(s) in rule '@org_tensorflow//tensorflow/contrib/nccl:nccl_kernels':
this rule is missing dependency declarations for the following files included by 'external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_rewrite.cc':
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/optimization_registry.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/device_set.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/device.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/types.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/costmodel.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/node_builder.h'
INFO: Elapsed time: 20.377s, Critical Path: 19.47s
FAILED: Build did NOT complete successfully

More information: I am using the master branch of TensorFlow Serving (commit 7a349752c2cbbe741edb91c6c6be1c571e91a5fb) and Bazel release 0.7.0.

I also made a small change to tools/bazel.rc to work around another compile error.

Any idea what is missing?

1 answer:

Answer 0: (score: 1)

I usually disable NCCL, because it never seems to build correctly:

https://github.com/PipelineAI/pipeline/blob/6261c4f31105e40ab8b24ccc7834f9181f4e5aaf/package/tensorflow/16d39e9-d690fdd/Dockerfile.full-gpu#L160

RUN \
  cd $TENSORFLOW_SERVING_HOME \
  # Remove NCCL since it isn't building properly
  && sed -i.bak '/nccl/d' tensorflow/tensorflow/contrib/BUILD \
  && bazel build -c opt --config=cuda \
      --verbose_failures \
      --spawn_strategy=standalone --genrule_strategy=standalone \
      --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 \
      --crosstool_top=@local_config_cuda//crosstool:toolchain \
       tensorflow_serving/... \
  && chmod a+x bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
  && cp bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /usr/local/bin/ \
  && bazel clean --expunge
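
The sed step above deletes every line of the contrib BUILD file that contains the string nccl, so it may be worth checking what it removes before running it against the real tree. The following is only a rough sketch: the BUILD content is a made-up stand-in (the dependency labels are hypothetical, not copied from the TensorFlow sources), and the temporary directory is there so that nothing in a real checkout is touched.

```shell
#!/bin/sh
# Sketch only: a hypothetical stand-in for tensorflow/contrib/BUILD,
# written to a temporary directory instead of a real checkout.
tmpdir=$(mktemp -d)
build_file="$tmpdir/BUILD"

cat > "$build_file" <<'EOF'
deps = [
    "//tensorflow/contrib/layers:layers_py",
    "//tensorflow/contrib/nccl:nccl_py",
    "//tensorflow/contrib/nccl:nccl_kernels",
    "//tensorflow/contrib/slim",
]
EOF

# Same command as in the answer: delete every line matching "nccl",
# keeping the untouched original next to it as BUILD.bak.
sed -i.bak '/nccl/d' "$build_file"

# Count the nccl references left after the edit (grep exits non-zero
# when nothing matches, hence the "|| true").
remaining=$(grep -c nccl "$build_file" || true)
echo "nccl lines remaining: $remaining"

# Show exactly which lines were dropped (diff exits non-zero when the
# files differ, hence the "|| true").
diff "$build_file.bak" "$build_file" || true
```

Because the pattern is a plain substring match, it also removes any unrelated line that happens to mention nccl, such as a comment. Comparing against the .bak copy as above shows whether only the intended dependency entries were dropped.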