我已经提到this tut开始使用windows上的pyspark。这些是我遵循的步骤:
%SPARK_HOME%
的目录%SPARK_HOME%\bin
%HADOOP_HOME%
设置为与%SPARK_HOME%
%PYSPARK_DRIVER_PYTHON%
设为ipython %PYSPARK_DRIVER_PYTHON_OPTS%
设置为笔记本;%SPARK_HOME%\bin
添加到%PATH%
但是当我跑步时
> pyspark --master local[2]
我收到以下错误:
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
Traceback (most recent call last):
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\runpy.py", line 170, in _run_module_as_main
"__main__", mod_spec)
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\mahesh\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\Scripts\ipython.exe\__main__.py", line 9, in <module>
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\IPython\__init__.py", line 125, in start_ipython
return launch_new_instance(argv=argv, **kwargs)
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\traitlets\config\application.py", line 657, in launch_instance
app.initialize(argv)
File "<decorator-gen-113>", line 2, in initialize
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\IPython\terminal\ipapp.py", line 308, in initialize
super(TerminalIPythonApp, self).initialize(argv)
File "<decorator-gen-7>", line 2, in initialize
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\IPython\core\application.py", line 450, in initialize
self.parse_command_line(argv)
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\IPython\terminal\ipapp.py", line 303, in parse_command_line
return super(TerminalIPythonApp, self).parse_command_line(argv)
File "<decorator-gen-4>", line 2, in parse_command_line
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\traitlets\config\application.py", line 514, in parse_command_line
return self.initialize_subcommand(subc, subargv)
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\IPython\core\application.py", line 243, in initialize_subcommand
return super(BaseIPythonApplication, self).initialize_subcommand(subc, argv)
File "<decorator-gen-3>", line 2, in initialize_subcommand
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\traitlets\config\application.py", line 445, in initialize_subcommand
subapp = import_item(subapp)
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\ipython_genutils\importstring.py", line 31, in import_item
module = __import__(package, fromlist=[obj])
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\notebook\notebookapp.py", line 31, in <module>
from zmq.eventloop import ioloop
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\zmq\eventloop\__init__.py", line 3, in <module>
from zmq.eventloop.ioloop import IOLoop
File "d:\mahesh\softwares\python\winpython-64bit-3.4.4.4qt5\python-3.4.4.amd64\lib\site-packages\zmq\eventloop\ioloop.py", line 21, in <module>
from zmq import (
ImportError: cannot import name 'Poller'
我能够使用>spark-shell
命令正确运行spark scala shell。
正如你在堆栈跟踪中看到的那样,我已经在路径
安装了win-pythonD:\mahesh\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64
因此,我的%PYTHON_HOME%
是D:\mahesh\Softwares\python\WinPython-64bit-3.4.4.4Qt5
。
但我的%SPARK_HOME%
是D:\mahesh\Programs\spark-2.3.0-bin-hadoop2.7
。
运行where pyspark
命令提供以下输出:
D:\mahesh\Programs\spark-2.3.0-bin-hadoop2.7\bin\pyspark
D:\mahesh\Programs\spark-2.3.0-bin-hadoop2.7\bin\pyspark.cmd
D:\mahesh\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\Scripts\pyspark
D:\mahesh\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\Scripts\pyspark.cmd
我相信我的问题是我的Windows Spark环境有些错误配置。这就是我给出上述所有信息的原因。那么这里出了什么问题?
请注意,我按照tut中的建议不使用Anaconda和GOW(Windows上的Gnu)执行了这些步骤。
答案 0 :(得分:0)
将您的%PYSPARK_DRIVER_PYTHON%
指向包含'Poller'
所有依赖项的虚拟环境,然后进行检查。
否则你可以尝试在ipython环境中安装'Poller'
(我坦率地不知道怎么做!)