I am running Jupyter (v4.2.1) with the Apache Toree PySpark kernel. When I call plotly's init_notebook_mode function, I get the following error:
import numpy as np
import pandas as pd
import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode()
Error:
Name: org.apache.toree.interpreter.broker.BrokerException
Message: Traceback (most recent call last):
File "/tmp/kernel-PySpark-6415c581-01c4-4c90-b4d9-81773c2bc03f/pyspark_runner.py", line 134, in <module>
eval(compiled_code)
File "<string>", line 7, in <module>
File "/usr/local/lib/python3.4/dist-packages/plotly/offline/offline.py", line 151, in init_notebook_mode
display(HTML(script_inject))
File "/usr/local/lib/python3.4/dist-packages/IPython/core/display.py", line 158, in display
format = InteractiveShell.instance().display_formatter.format
File "/usr/local/lib/python3.4/dist-packages/traitlets/config/configurable.py", line 412, in instance
inst = cls(*args, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 499, in __init__
self.init_io()
File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 658, in init_io
io.stdout = io.IOStream(sys.stdout)
File "/usr/local/lib/python3.4/dist-packages/IPython/utils/io.py", line 34, in __init__
raise ValueError("fallback required, but not specified")
ValueError: fallback required, but not specified
StackTrace: org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:140)
org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:140)
scala.Option.foreach(Option.scala:236)
org.apache.toree.interpreter.broker.BrokerState.markFailure(BrokerState.scala:139)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
py4j.Gateway.invoke(Gateway.java:259)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.GatewayConnection.run(GatewayConnection.java:209)
java.lang.Thread.run(Thread.java:745)
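I could not find anything about this online. When I dug into the code that fails, io.py in IPython's utils package, I saw that the stream passed in must have both a write and a flush attribute. For some reason, the stream passed in this case, sys.stdout, has only the write attribute and no flush attribute.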
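To make the check described above concrete, here is a minimal sketch of it. This is a simplified imitation, not IPython's actual source; WriteOnlyStream, check_stream, and missing_stream_methods are hypothetical names introduced only for illustration.

```python
import io


class WriteOnlyStream:
    """Hypothetical stand-in for a stdout replacement that lacks flush()."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def missing_stream_methods(stream):
    """Return which of the two required attributes the stream lacks."""
    return [name for name in ("write", "flush") if not hasattr(stream, name)]


def check_stream(stream, fallback=None):
    """Imitate the check: accept the stream only if it has write and flush."""
    if not missing_stream_methods(stream):
        return stream
    if fallback is not None:
        return fallback
    # Same message as in the traceback above.
    raise ValueError("fallback required, but not specified")


good = io.StringIO()
print(missing_stream_methods(good))               # a normal stream has both
print(missing_stream_methods(WriteOnlyStream()))  # this one lacks flush
```

Running missing_stream_methods(sys.stdout) in a cell of the affected kernel should show which attribute is absent there.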
Answer 0 (score: 0)
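I believe this happens because plotly's notebook mode assumes it is running inside an IPython Jupyter kernel, which handles the communication with the notebook; you can see in the stack trace that it tries to call into the IPython package.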
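Toree, however, is a different Jupyter kernel with its own protocol handling for talking to the notebook server. Even when you use Toree to run the PySpark interpreter, you get a "plain" PySpark (just as if you had started it from a shell), and Toree drives that interpreter's input and output. So none of the IPython machinery is set up, and calling init_notebook_mode() in that environment fails, just as it would in a PySpark started directly from a shell, which knows nothing about notebooks.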
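If you want to confirm this in your own session, something like the following should tell you whether an IPython shell is active. It is only a sketch: in_ipython_kernel is a hypothetical helper name, and it relies on IPython's get_ipython() returning None when no interactive shell has been started.

```python
def in_ipython_kernel():
    """Return True only if an IPython interactive shell is active."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed at all.
        return False
    return get_ipython() is not None


if in_ipython_kernel():
    print("IPython shell found; notebook display calls should work")
else:
    print("no IPython shell; init_notebook_mode() is likely to fail here")
```

Under Toree's PySpark interpreter I would expect the second branch, which matches the traceback in the question.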
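As far as I can tell, there is currently no way to get plot output out of a PySpark session run through Toree; we recently ran into the same problem. Instead of running Python through Toree, you need to run an IPython kernel, import the PySpark libraries there, and connect to your Spark cluster. For a Dockerized example, see https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook.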
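As a rough sketch of that setup, the cell below could be run in a regular IPython (Python 3) kernel. It assumes Spark 2.0 or later with pyspark importable from that kernel; spark_settings and connect are hypothetical helper names, and "plotly-notebook" is just a placeholder application name.

```python
def spark_settings(master_url, app_name="plotly-notebook"):
    """Collect the Spark settings used to connect (placeholder defaults)."""
    return {"spark.master": master_url, "spark.app.name": app_name}


def connect(master_url, app_name="plotly-notebook"):
    """Create or reuse a SparkSession from inside an IPython kernel."""
    # Imported lazily so this cell still loads where pyspark is missing.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in spark_settings(master_url, app_name).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage in the IPython kernel (replace local[*] with your cluster's master URL):
#   spark = connect("local[*]")
#   from plotly.offline import init_notebook_mode
#   init_notebook_mode()
```

Because the kernel here is IPython itself, init_notebook_mode() finds the display machinery it expects, and Spark is used as an ordinary library.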