Cannot import SparkContext

Date: 2017-03-30 19:13:27

Tags: python apache-spark pyspark mapr

I am using CentOS. I have set $SPARK_HOME and added the path to its bin directory to $PATH.

I can run pyspark from anywhere.

But when I create a Python file and use this statement:

from pyspark import SparkConf, SparkContext

it throws the following error:

python pysparktask.py
Traceback (most recent call last):
  File "pysparktask.py", line 1, in <module>
    from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'

I tried reinstalling it using pip:

pip install pyspark

and it also gives this error:

Could not find a version that satisfies the requirement pyspark (from versions: )
No matching distribution found for pyspark

Edit:

Following the answer below, I updated the code.

The error now is:

Traceback (most recent call last):
  File "pysparktask.py", line 6, in <module>
    from pyspark import SparkConf, SparkContext
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/context.py", line 33, in <module>
    from pyspark.java_gateway import launch_gateway
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
ModuleNotFoundError: No module named 'py4j'

2 Answers:

Answer 0 (score: 3)

Add the following environment variable, and append Spark's Python lib path to sys.path:

import os
import sys

# Point SPARK_HOME at the Spark installation and put its Python package on sys.path.
os.environ['SPARK_HOME'] = "/usr/lib/spark/"
sys.path.append("/usr/lib/spark/python/")

from pyspark import SparkConf, SparkContext  # the import can now be resolved
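
With only the steps above, the question's edit then fails with "ModuleNotFoundError: No module named 'py4j'", because the Py4J library bundled with Spark is not on sys.path either. Below is a minimal sketch that also appends the Py4J zip shipped under SPARK_HOME/python/lib; the MapR path and the glob pattern for the zip name are assumptions, so adjust them to your installation.

import glob
import os
import sys

spark_home = "/opt/mapr/spark/spark-2.0.1"  # assumption: adjust to your Spark installation
os.environ['SPARK_HOME'] = spark_home

# Make the pyspark package importable.
sys.path.append(os.path.join(spark_home, "python"))

# Spark ships its own Py4J as a zip under python/lib; add it as well,
# otherwise importing pyspark raises "No module named 'py4j'".
for py4j_zip in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")):
    sys.path.append(py4j_zip)

from pyspark import SparkConf, SparkContext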

Answer 1 (score: 1)

pip install -e /spark-directory/python/.

This installation should solve your problem. You also have to edit your bash_profile:

export SPARK_HOME="/spark-directory"
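
A quick way to confirm that either setup works is a small smoke-test script that creates a SparkContext and runs a trivial job; the application name and the local master used here are illustrative choices, not something from the original answers.

from pyspark import SparkConf, SparkContext

# Minimal smoke test: build a local context and run a trivial job.
conf = SparkConf().setAppName("import-check").setMaster("local[*]")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())  # should print 45
sc.stop()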