Question

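I am running the AWS Machine Learning AMI on an EC2 instance. I have confirmed that, from the terminal, both python and jupyter can run

import tensorflow as tf

and that running

python pytest.py

from the terminal (where pytest.py contains the tensorflow import above) works without problems.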

I am now trying to automate my script using DataPipeline and TaskRunner. The bash command in DataPipeline is again:

python pytest.py

However, I immediately get the following error:

Traceback (most recent call last):
  File "pytest.py", line 1, in <module>
    import tensorflow as tf
  File "/usr/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 72, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 61, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/usr/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: libcudart.so.7.5: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
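The last ImportError is raised by the dynamic loader rather than by TensorFlow itself, so the failure can probably be reproduced without importing TensorFlow at all. A minimal sketch, assuming only the library name from the traceback; can_load is a hypothetical helper name, and the idea is to run the same file once from the ssh session and once as the pipeline command:

```python
from __future__ import print_function  # keeps print() usable on Python 2.7

import ctypes


def can_load(library_name):
    """Ask the dynamic loader to open library_name.

    Returns (True, None) on success, or (False, <loader message>) if the
    loader cannot find or open the library.
    """
    try:
        ctypes.CDLL(library_name)
        return True, None
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    # Library name copied from the traceback; nothing else is assumed.
    ok, error = can_load("libcudart.so.7.5")
    print("libcudart.so.7.5 loadable:", ok)
    if not ok:
        print("loader said:", error)
```

If the probe succeeds over ssh but fails under TaskRunner, that would point at the process environment rather than at the TensorFlow install.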

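It seems that AWS DataPipeline (or TaskRunner?) uses a different environment setup, because, again, running the script on the instance through an ssh terminal works without problems. I found some posts suggesting adding cuda to LD_LIBRARY_PATH, but the AMI instance already has it: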

echo $LD_LIBRARY_PATH 
/home/ec2-user/src/torch/install/lib:/home/ec2-user/src/cntk/bindings/python/cntk/libs:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/mpi/lib:/home/ec2-user/src/mxnet/mklml_lnx_2017.0.1.20161005/lib:

which clearly includes the cuda library paths that tensorflow needs.
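One caveat: the echo was run in the interactive ssh session, so it shows what the login shell exports, which is not necessarily what the TaskRunner process hands to the commands it starts. A sketch for checking this from inside the job, and for launching the script with an explicit search path if the variable turns out to be missing there. The helper names dirs_containing and run_with_library_path are hypothetical, and /usr/local/cuda/lib64 is simply the CUDA entry from the output above, not a verified location of libcudart.so.7.5:

```python
from __future__ import print_function  # keeps print() usable on Python 2.7

import os
import subprocess
import sys


def dirs_containing(library_name, search_path):
    """Return the directories in a colon-separated search_path that
    contain a file named library_name."""
    hits = []
    for directory in search_path.split(":"):
        if directory and os.path.isfile(os.path.join(directory, library_name)):
            hits.append(directory)
    return hits


def run_with_library_path(command, extra_dirs):
    """Run command in a child process with extra_dirs prepended to
    LD_LIBRARY_PATH, and return the child's exit code.

    The loader reads LD_LIBRARY_PATH when a process starts, so the
    variable is set for the child, not for the current interpreter.
    """
    env = dict(os.environ)
    current = env.get("LD_LIBRARY_PATH", "")
    parts = list(extra_dirs) + ([current] if current else [])
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return subprocess.call(command, env=env)


# Example use inside the pipeline job (shown as comments, not executed here):
#   seen = os.environ.get("LD_LIBRARY_PATH", "")
#   print("LD_LIBRARY_PATH seen by the job:", repr(seen))
#   print(dirs_containing("libcudart.so.7.5", seen))
#   sys.exit(run_with_library_path(["python", "pytest.py"],
#                                  ["/usr/local/cuda/lib64"]))
```

If the job prints an empty or shorter LD_LIBRARY_PATH than the ssh session does, the difference in environment would be confirmed, and the wrapper (or an equivalent export in the pipeline's bash command) might be enough to get past the import error.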

AWS DataPipeline Machine Learning AMI tensorflow issue

0 answers: