I have a test.py file:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.externals import joblib
import tqdm
import time
print("Successful import")
I created a standalone zip of all the dependencies using this approach:
pip install -t dependencies -r requirements.txt
cd dependencies
zip -r ../dependencies.zip .
This produces the following tree structure (dependencies.zip):
dependencies.zip
->pandas
->numpy
->........
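As a sanity check on the archive, here is a minimal sketch for listing the compiled extension modules it contains. This matters because Python's zipimport mechanism can only load pure-Python modules from a zip, not shared libraries, so packages such as numpy that ship native code may fail to import from it. The function name `find_native_modules` and the suffix list are my own assumptions for illustration, not part of any library.

```python
import zipfile

# Suffixes that usually indicate compiled extension modules.
# This list is an assumption for illustration and is not exhaustive.
NATIVE_SUFFIXES = (".so", ".pyd", ".dylib", ".dll")


def find_native_modules(zip_path):
    """Return the archive members that look like compiled extension modules."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith(NATIVE_SUFFIXES)
        ]
```

Calling `find_native_modules("dependencies.zip")` on the archive built above should list entries under numpy, pandas, and the other packages if they include native code. A non-empty result would suggest that those packages cannot be imported straight from the zip.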
When I run
spark-submit --py-files /home/ion/Documents/dependencies.zip /home/ion/Documents/sentiment_analysis/test.py
I get the following error:
2018-05-16 07:36:21 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/home/ion/Documents/sentiment_analysis/test.py", line 2, in <module>
    from encoder import Model
  File "/home/ion/Documents/sentiment_analysis/encoder.py", line 2, in <module>
    import numpy as np
  File "/home/ion/Documents/dependencies.zip/numpy/__init__.py", line 142, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/add_newdocs.py", line 13, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/lib/__init__.py", line 8, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/lib/type_check.py", line 11, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/core/__init__.py", line 26, in <module>
ImportError:
Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control). Otherwise reinstall numpy.
Original error was: cannot import name multiarray
2018-05-16 07:36:21 INFO ShutdownHookManager:54 - Shutdown hook called
2018-05-16 07:36:21 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-a3c2ec75-6c12-4ac2-ae2c-b36412209889
Is there a way to run this Python script as a Spark job without rewriting the code for PySpark, or with minimal code changes?