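How do I install and configure Apache Toree for Jupyter Notebook on Windows 10?

Time: 2017-12-19 18:08:50

Tags: apache-spark windows-10 jupyter-notebook apache-toree

Can someone help me install and configure Apache Toree for Jupyter Notebook on Windows 10? I tried, but without success. The error I get is shown below.

Failed to start the kernel

Unknown server error.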

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\notebook\base\handlers.py", line 516, in wrapper
    result = yield gen.maybe_future(method(self, *args, **kwargs))
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1055, in run
    value = future.result()
  File "C:\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Anaconda3\lib\site-packages\notebook\services\sessions\handlers.py", line 75, in post
    type=mtype))
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1055, in run
    value = future.result()
  File "C:\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Anaconda3\lib\site-packages\notebook\services\sessions\sessionmanager.py", line 79, in create_session
    kernel_id = yield self.start_kernel_for_session(session_id, path, name, type, kernel_name)
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1055, in run
    value = future.result()
  File "C:\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Anaconda3\lib\site-packages\notebook\services\sessions\sessionmanager.py", line 92, in start_kernel_for_session
    self.kernel_manager.start_kernel(path=kernel_path, kernel_name=kernel_name)
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1055, in run
    value = future.result()
  File "C:\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 307, in wrapper
    yielded = next(result)
  File "C:\Anaconda3\lib\site-packages\notebook\services\kernels\kernelmanager.py", line 94, in start_kernel
    super(MappingKernelManager, self).start_kernel(**kwargs)
  File "C:\Anaconda3\lib\site-packages\jupyter_client\multikernelmanager.py", line 110, in start_kernel
    km.start_kernel(**kwargs)
  File "C:\Anaconda3\lib\site-packages\jupyter_client\manager.py", line 243, in start_kernel
    **kw)
  File "C:\Anaconda3\lib\site-packages\jupyter_client\manager.py", line 189, in _launch_kernel
    return launch_kernel(kernel_cmd, **kw)
  File "C:\Anaconda3\lib\site-packages\jupyter_client\launcher.py", line 123, in launch_kernel
    proc = Popen(cmd, **kwargs)
  File "C:\Anaconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Anaconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
OSError: [WinError 193] %1 is not a valid Win32 application

2 Answers:

Answer 0 (score: 0)

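Apache Toree launches the kernel with the run.sh shell script in %PROG_HOME%\bin.

On Windows, PROG_HOME is typically C:\Users\{Account_Name}\AppData\Roaming\jupyter\kernels\apache_toree_scala.

Because Windows cannot run shell scripts, an OS error is raised:

[WinError 193] %1 is not a valid Win32 application.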
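To check whether a kernel spec is affected, one can look at the first argv entry that Jupyter hands to Popen. The following is a minimal sketch, not part of Toree or Jupyter: is_windows_launchable is a hypothetical helper, the extension set is an illustrative assumption rather than an exhaustive list, and the example argv only imitates what a Toree kernel.json might contain.

```python
import ntpath

# Extensions Windows will typically start directly.
# Illustrative assumption, not an exhaustive list.
WINDOWS_LAUNCHABLE = {".exe", ".com", ".bat", ".cmd"}


def is_windows_launchable(kernel_spec: dict) -> bool:
    """Return True if the first argv entry looks launchable on Windows."""
    argv = kernel_spec.get("argv", [])
    if not argv:
        return False
    # ntpath handles Windows-style paths on any platform.
    ext = ntpath.splitext(argv[0])[1].lower()
    # No extension: assume a program resolved via PATH, e.g. "python".
    return ext == "" or ext in WINDOWS_LAUNCHABLE


if __name__ == "__main__":
    # Hypothetical spec resembling a Toree install that still points at run.sh.
    spec = {"argv": [r"C:\kernels\apache_toree_scala\bin\run.sh",
                     "--profile", "{connection_file}"]}
    print(is_windows_launchable(spec))
```

A spec whose launcher ends in .sh fails this check, which matches the WinError 193 in the traceback.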

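You need to perform the following steps:

  1. Download a Spark version that is compatible with Scala 2.11 and set the SPARK_HOME environment variable. Note that Apache Toree kernel version 0.3.0-incubating uses Scala 2.11.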

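  2. Create a Windows batch file (run.bat) or a Windows command script (run.cmd) in %PROG_HOME%\bin, that is, C:\Users\{Account_Name}\AppData\Roaming\jupyter\kernels\apache_toree_scala\bin. Like run.sh, it launches the kernel through the SparkSubmit class (SparkSubmit.scala) with the following command:

"%JAVA_HOME%\bin\java" -cp "%SPARK_HOME%\jars\*;%PROG_HOME%\lib\toree-assembly-0.3.0-incubating.jar;." -Dscala.usejavacp=true org.apache.spark.deploy.SparkSubmit %SPARK_OPTS% --class org.apache.toree.Main "%PROG_HOME%\lib\toree-assembly-0.3.0-incubating.jar" %TOREE_OPTS% %*

  3. In the kernel.json file in the C:\Users\{Account_Name}\AppData\Roaming\jupyter\kernels\apache_toree_scala folder, change the argv value from run.sh to run.cmd.

  4. Start the Anaconda Prompt and launch Jupyter Notebook from it (for example, with the jupyter notebook command). In the browser, select the "Apache Toree - Scala" kernel. You can watch the kernel connection status in the Anaconda Prompt.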
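The kernel.json change can also be scripted instead of edited by hand. This is a minimal sketch under stated assumptions: point_kernel_at_cmd is a hypothetical helper, it assumes kernel.json is UTF-8 JSON with an argv list, and it only rewrites entries that end in run.sh.

```python
import json
from pathlib import Path


def point_kernel_at_cmd(kernel_json: Path) -> list:
    """Rewrite argv entries ending in run.sh to end in run.cmd.

    Returns the updated argv list. Other keys in the file are preserved.
    """
    spec = json.loads(kernel_json.read_text(encoding="utf-8"))
    suffix = "run.sh"
    spec["argv"] = [
        arg[: -len(suffix)] + "run.cmd" if arg.endswith(suffix) else arg
        for arg in spec.get("argv", [])
    ]
    kernel_json.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return spec["argv"]
```

It would be called with the path to the kernel's kernel.json, for example Path(r"C:\Users\{Account_Name}\AppData\Roaming\jupyter\kernels\apache_toree_scala\kernel.json") with the account name filled in. Keeping a backup copy of the file first is a sensible precaution.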

Answer 1 (score: 0)

I'd like to add to @UmeshD's answer above. If you are using toree-assembly-0.3.0-incubating, create the file run.cmd in C:\Users\{Account_Name}\AppData\Roaming\jupyter\kernels\apache_toree_scala\bin\ and paste in the following code:

@echo off
pushd "%~dp0\..\"
set PROG_HOME=%cd%
popd

if not defined SPARK_HOME (
    echo "SPARK_HOME must be set to the location of a Spark distribution!"
    rem exit with a non-zero code so the caller can see the failure
    exit /b 1
)

echo "Starting Spark Kernel with SPARK_HOME=%SPARK_HOME%"


rem for /f %%i in ('dir /B toree-assembly-*.jar') do set KERNEL_ASSEMBLY=%%i popd

rem disable randomized hash for string in Python 3.3+
set PYTHONHASHSEED=0
rem set TOREE_ASSEMBLY=%PROG_HOME%/lib/%KERNEL_ASSEMBLY%
rem The SPARK_OPTS values during installation are stored in __TOREE_SPARK_OPTS__. This allows values to be specified during
rem install, but also during runtime. The runtime options take precedence over the install options.
if not defined SPARK_OPTS (
    if defined __TOREE_SPARK_OPTS__ (
        set SPARK_OPTS=%__TOREE_SPARK_OPTS__%
    )
)

rem Same fallback for TOREE_OPTS: use the install-time value stored in __TOREE_OPTS__.
if not defined TOREE_OPTS (
    if defined __TOREE_OPTS__ (
        set TOREE_OPTS=%__TOREE_OPTS__%
    )
)

rem Quote the paths so that JAVA_HOME or PROG_HOME may contain spaces.
"%JAVA_HOME%\bin\java" -cp "%SPARK_HOME%\jars\*;%PROG_HOME%\lib\toree-assembly-0.3.0-incubating.jar;." -Dscala.usejavacp=true org.apache.spark.deploy.SparkSubmit %SPARK_OPTS% --class org.apache.toree.Main "%PROG_HOME%\lib\toree-assembly-0.3.0-incubating.jar" %TOREE_OPTS% %*
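
The final line of the script is dense, so it may help to see how its parts fit together. The sketch below rebuilds the same argument list in Python purely as an illustration. build_toree_command and its parameters are hypothetical names, and the example paths are made up. Only the class names, flags, and jar name come from the script above.

```python
import ntpath

# Jar name taken from the script above (Toree 0.3.0-incubating).
TOREE_JAR = "toree-assembly-0.3.0-incubating.jar"


def build_toree_command(java_home, spark_home, prog_home,
                        spark_opts=(), toree_opts=(), extra=()):
    """Assemble the java argument list that the last line of run.cmd runs."""
    jar = ntpath.join(prog_home, "lib", TOREE_JAR)
    # Classpath: all Spark jars, the Toree assembly, and the current directory.
    classpath = ";".join([ntpath.join(spark_home, "jars", "*"), jar, "."])
    return [
        ntpath.join(java_home, "bin", "java"),
        "-cp", classpath,
        "-Dscala.usejavacp=true",
        "org.apache.spark.deploy.SparkSubmit",  # entry point on the classpath
        *spark_opts,                            # corresponds to %SPARK_OPTS%
        "--class", "org.apache.toree.Main",     # class SparkSubmit should run
        jar,                                    # application jar for SparkSubmit
        *toree_opts,                            # corresponds to %TOREE_OPTS%
        *extra,                                 # corresponds to %* (args from Jupyter)
    ]


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    for part in build_toree_command(r"C:\Java\jdk8", r"C:\spark", r"C:\toree",
                                    extra=["--profile", "connection.json"]):
        print(part)
```

Everything before --class configures SparkSubmit itself, and everything after the jar is passed through to Toree.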