I installed Spark, ran the sbt assembly, and can open bin/pyspark with no problem. However, I'm having trouble loading the pyspark module into ipython. I get the following error:
In [1]: import pyspark
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-c15ae3402d12> in <module>()
----> 1 import pyspark
/usr/local/spark/python/pyspark/__init__.py in <module>()
61
62 from pyspark.conf import SparkConf
---> 63 from pyspark.context import SparkContext
64 from pyspark.sql import SQLContext
65 from pyspark.rdd import RDD
/usr/local/spark/python/pyspark/context.py in <module>()
28 from pyspark.conf import SparkConf
29 from pyspark.files import SparkFiles
---> 30 from pyspark.java_gateway import launch_gateway
31 from pyspark.serializers import PickleSerializer, BatchedSerializer, UTF8Deserializer, \
32 PairDeserializer, CompressedSerializer
/usr/local/spark/python/pyspark/java_gateway.py in <module>()
24 from subprocess import Popen, PIPE
25 from threading import Thread
---> 26 from py4j.java_gateway import java_import, JavaGateway, GatewayClient
27
28
ImportError: No module named py4j.java_gateway
Answer 0 (Score: 63)
In my environment (using Docker and the image sequenceiq/spark:1.1.0-ubuntu), I ran into this. If you look at the pyspark shell script, you'll see that you need to add a couple of things to your PYTHONPATH:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
That worked for me in ipython.
Update: As noted in the comments, the name of the py4j zip file changes with each Spark release, so look for the right name.
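If you'd rather not hard-code the py4j version, a minimal sketch like the following can locate the zip at runtime. It assumes SPARK_HOME is already set in the environment; the glob pattern is an assumption about the Spark file layout, not something from the answer above.
import glob
import os
import sys

# Assumes SPARK_HOME points at the Spark installation directory.
spark_home = os.environ["SPARK_HOME"]
sys.path.append(os.path.join(spark_home, "python"))

# Pick up whichever py4j-*-src.zip ships with this Spark release.
py4j_zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
if py4j_zips:
    sys.path.append(py4j_zips[0])

import pyspark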
Answer 1 (Score: 23)
I solved this problem by adding a few paths in .bashrc:
export SPARK_HOME=/home/a141890/apps/spark
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
After this, it never raised "ImportError: No module named py4j.java_gateway" again.
Answer 2 (Score: 4)
In PyCharm, before running the above script, make sure you have unzipped the py4j*.zip file and added its path to the script with sys.path.append("path to spark*/python/lib").
It worked for me.
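As a rough sketch of what that looks like in a PyCharm script (the paths below are placeholders, not the answerer's actual layout; adjust them to your own Spark installation and to wherever you unzipped py4j):
import sys

# Placeholder path to Spark's Python bindings.
sys.path.append("/usr/local/spark/python")
# Directory into which py4j-*.zip was unzipped, so the py4j package sits directly under it.
sys.path.append("/usr/local/spark/python/lib")

from pyspark import SparkContext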
Answer 3 (Score: 4)
#/home/shubham/spark-1.6.2
import os
import sys

# Set the path for the Spark installation
# (this is the path where you have built Spark using sbt/sbt assembly)
os.environ['SPARK_HOME'] = "/home/shubham/spark-1.6.2"
# os.environ['SPARK_HOME'] = "/home/jie/d2/spark-0.9.1"

# Append to PYTHONPATH so that pyspark can be found
sys.path.append("/home/shubham/spark-1.6.2/python")
sys.path.append("/home/shubham/spark-1.6.2/python/lib")
# sys.path.append("/home/jie/d2/spark-0.9.1/python")

# Now we are ready to import Spark modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("Hey nice")
except ImportError as e:
    print("Error importing Spark Modules", e)
    sys.exit(1)
Answer 4 (Score: 4)
Install the pip module 'py4j'.
pip install py4j
I ran into this problem with Spark 2.1.1 and Python 2.7.x. I'm not sure whether Spark stopped bundling this package in its latest distributions, but installing the py4j module solved the problem for me.
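A quick sanity check that the fix took effect (just a sketch, assuming pyspark itself is already on your path):
# Minimal check that the previously missing module can now be imported.
from py4j.java_gateway import JavaGateway  # this was the import that failed
import pyspark
print("py4j and pyspark imported successfully")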