I have a TensorFlow application that runs fine on Ubuntu 16.04, but when I try to run it in the tensorflow/tensorflow Docker image with nvidia-docker, it gets to this point and then freezes:
2017-07-12 22:06:10.917255: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-12 22:06:10.917289: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-12 22:06:11.023765: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-12 22:06:11.024133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Quadro M4000
major: 5 minor: 2 memoryClockRate (GHz) 0.7725
pciBusID 0000:00:05.0
Total memory: 7.93GiB
Free memory: 7.87GiB
2017-07-12 22:06:11.024159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-07-12 22:06:11.024168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-07-12 22:06:11.024190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M4000, pci bus id: 0000:00:05.0)
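Since it prints no error message, I don't know where to start. Any suggestions on what I might be missing, or on further troubleshooting steps?
I have confirmed that my nvidia-docker installation works correctly.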
Answer 0 (score: 1)
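It turns out the application was running all along. It only appeared frozen because the output of a Python application running in a Docker container tends to get stuck in a buffer and never shows up in the Docker logs. To fix this, I passed -u to python. I can now see my application's output in the Docker logs, and everything works fine.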
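If changing the command line is not convenient, the same effect can probably be had from inside the script by flushing each line explicitly. The sketch below is only an illustration, not the code from my application: the helper name emit and the messages are made up for the example. It relies on the flush argument of print, which exists in Python 3.3 and later.

```python
import sys


def emit(message, stream=None):
    """Write one line and flush it right away so it does not sit in a buffer.

    `emit` is a hypothetical helper for this example, not part of any library.
    """
    stream = sys.stdout if stream is None else stream
    # flush=True pushes the text out immediately instead of waiting for the
    # buffer to fill, which is what hides output when stdout is not a terminal.
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    # Placeholder progress messages; a real application would log its own.
    for step in range(3):
        emit("finished step {}".format(step))
```

Setting the PYTHONUNBUFFERED environment variable to a non-empty value in the container is another option and is documented as equivalent to passing -u.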