PySpark startup problem on Windows 10 with Python 3.6

Asked: 2017-03-17 15:10:57

Tags: python pyspark

After installing Python 3.x with Anaconda, I am unable to launch PySpark on Windows. The error is below:

Python 3.6.0 |Anaconda 4.3.0 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "C:\Users\prudra\Desktop\Udemy\spark-2.1.0-bin-hadoop2.7\bin\..\python\pyspark\shell.py", line 30, in <module>
    import pyspark
  File "C:\Users\prudra\Desktop\Udemy\spark-2.1.0-bin-hadoop2.7\python\pyspark\__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "C:\Users\prudra\Desktop\Udemy\spark-2.1.0-bin-hadoop2.7\python\pyspark\context.py", line 36, in <module>
    from pyspark.java_gateway import launch_gateway
  File "C:\Users\prudra\Desktop\Udemy\spark-2.1.0-bin-hadoop2.7\python\pyspark\java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File "C:\Users\prudra\Desktop\Udemy\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 18, in <module>
  File "C:\Users\prudra\AppData\Local\Continuum\Anaconda3\lib\pydoc.py", line 62, in <module>
    import pkgutil
  File "C:\Users\prudra\AppData\Local\Continuum\Anaconda3\lib\pkgutil.py", line 22, in <module>
    ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
  File "C:\Users\prudra\Desktop\Udemy\spark-2.1.0-bin-hadoop2.7\python\pyspark\serializers.py", line 393, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

Please tell me how to fix this.

2 Answers:

Answer 0 (score: 0)

PySpark 2.1 does not currently work with Python 3.6.0. The issue has been reported here. It was resolved on January 17, 2017, but as of today (March 17, 2017) the fix has not yet been released. However, looking at the committed changes, you should be able to fix this yourself by downloading the following two Python files:

https://github.com/apache/spark/blob/master/python/pyspark/serializers.py
https://github.com/apache/spark/blob/master/python/pyspark/cloudpickle.py

and saving them to the following location (overwriting the existing files):

C:\Users\prudra\Desktop\Udemy\spark-2.1.0-bin-hadoop2.7\python\pyspark

Or, more generally, the files should be saved to the python\pyspark subfolder of your Spark installation.
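For what it's worth, the TypeError comes from the way PySpark 2.1.0 wraps collections.namedtuple at import time: serializers.py re-creates the original namedtuple function with types.FunctionType, and that copy does not carry over the keyword-only defaults (verbose, rename, module) that namedtuple gained in Python 3.6. A minimal reproduction of that behaviour outside Spark (make_record is just a stand-in function; only the copying pattern matters):

import types

# Stand-in for Python 3.6's namedtuple, whose extra arguments are keyword-only.
def make_record(typename, field_names, *, verbose=False, rename=False, module=None):
    return (typename, field_names, verbose, rename, module)

# Copying the function this way drops __kwdefaults__, so the keyword-only
# arguments lose their default values.
copied = types.FunctionType(make_record.__code__, make_record.__globals__,
                            make_record.__name__, make_record.__defaults__,
                            make_record.__closure__)

try:
    copied("ModuleInfo", "module_finder name ispkg")
except TypeError as exc:
    # Prints: make_record() missing 3 required keyword-only arguments:
    # 'verbose', 'rename', and 'module' -- the same failure pyspark hits.
    print(exc)

The patched serializers.py linked above carries those defaults across, which, as far as I can tell, is why replacing the two files is enough. If you would rather script the replacement than download the files by hand, something along these lines should work (the raw.githubusercontent.com addresses are just the raw form of the blob links above; adjust the Spark path to wherever your installation lives):

import os
import urllib.request

# Example path -- point this at your own Spark 2.1.0 installation.
spark_home = r"C:\Users\prudra\Desktop\Udemy\spark-2.1.0-bin-hadoop2.7"
target_dir = os.path.join(spark_home, "python", "pyspark")

base = "https://raw.githubusercontent.com/apache/spark/master/python/pyspark/"
for name in ("serializers.py", "cloudpickle.py"):
    # Overwrite the bundled copy with the fixed version from the Spark repo.
    urllib.request.urlretrieve(base + name, os.path.join(target_dir, name))
    print("updated", name)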

Answer 1 (score: 0)

Spark 2.1.1 was just released on May 4. It now works with Python 3.6; you can see the release notes here.
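If you want to double-check after upgrading, a quick sanity test (assuming the Spark 2.1.1 python directory is on your PYTHONPATH) is:

import sys
import pyspark

# Python 3.6.x and PySpark 2.1.1 should now work together.
print(sys.version_info[:3])   # e.g. (3, 6, 0)
print(pyspark.__version__)    # expected: '2.1.1'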