Python 2.7, Apache Spark 2.1.0, Ubuntu 14.04. In the pyspark shell I get the following error:
>>> from pyspark.mllib.stat import Statistics
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named stat
How can I fix this?
The same thing happens with another mllib import:
>>> from pyspark.mllib.linalg import SparseVector
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named linalg
I have numpy installed, and this is my sys.path:
>>> sys.path
['', u'/tmp/spark-2d5ea25c-e2e7-490a-b5be-815e320cdee0/userFiles-2f177853-e261-46f9-97e5-01ac8b7c4987', '/usr/local/lib/python2.7/dist-packages/setuptools-18.1-py2.7.egg', '/usr/local/lib/python2.7/dist-packages/pyspark-2.1.0+hadoop2.7-py2.7.egg', '/usr/local/lib/python2.7/dist-packages/py4j-0.10.4-py2.7.egg', '/home/d066537/spark/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip', '/home/d066537/spark/spark-2.1.0-bin-hadoop2.7/python', '/home/d066537', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PILcompat', '/usr/lib/python2.7/dist-packages/gst-0.10', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/python2.7/dist-packages/ubuntu-sso-client']
Answer 0 (score: 1)
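Remove the pip-installed copy of pyspark: sudo -H pip uninstall pyspark. Your sys.path lists /usr/local/lib/python2.7/dist-packages/pyspark-2.1.0+hadoop2.7-py2.7.egg ahead of /home/d066537/spark/spark-2.1.0-bin-hadoop2.7/python, so the shell most likely imports the egg instead of the copy bundled with your Spark download, and the egg appears to be missing the mllib submodules.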
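If you want to confirm that two copies of pyspark are competing before uninstalling anything, a rough sketch like the one below can help. The function name find_package_locations is made up for illustration, and it only checks whether a sys.path entry contains a directory (or a zip/egg member) with the package's name; it does not prove that the copy is importable.

```python
import os
import sys
import zipfile


def find_package_locations(package, search_path=None):
    """Return the path entries that contain a top-level `package`, in order.

    Hypothetical helper: it looks for a directory named `package` inside
    each entry, or for members under `package/` in a zip or zipped egg.
    """
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        entry = entry or os.getcwd()  # '' on sys.path means the current directory
        if os.path.isdir(os.path.join(entry, package)):
            hits.append(entry)
        elif os.path.isfile(entry) and zipfile.is_zipfile(entry):
            with zipfile.ZipFile(entry) as archive:
                prefix = package + "/"
                if any(name.startswith(prefix) for name in archive.namelist()):
                    hits.append(entry)
    return hits


if __name__ == "__main__":
    # The first location printed is the one Python will import from.
    for location in find_package_locations("pyspark"):
        print(location)
```

If more than one location is printed, the first one wins. In the question's case that would be the egg under dist-packages, which is what the uninstall command removes.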
Answer 1 (score: 0)
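I had the same problem. The Python file stat.py does not seem to be present in Spark 2.1.x, but it is in Spark 2.2.x. So it looks like you need to upgrade Spark to a version with a newer pyspark (note, though, that Zeppelin 0.7.x does not seem to work with Spark 2.2.x).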
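One way to check what your installed pyspark actually ships, rather than guessing from the version number, is to list the submodules of a package. This is only a sketch: list_submodules is a name I made up, it uses the standard importlib and pkgutil modules, and it raises ImportError if the package itself cannot be imported.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the sorted names of the direct submodules of a package.

    Hypothetical helper; `package_name` must name a package (something
    with a __path__), not a single-file module.
    """
    package = importlib.import_module(package_name)
    return sorted(name for _, name, _ in pkgutil.iter_modules(package.__path__))


if __name__ == "__main__":
    try:
        # If "stat" and "linalg" are missing here, the imports in the
        # question cannot work with this installation.
        print(list_submodules("pyspark.mllib"))
    except ImportError as error:
        print("pyspark.mllib could not be imported: %s" % error)
```

Run it from the same interpreter the pyspark shell uses, so that it sees the same sys.path.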