For a while now I have been running into a problem with Zeppelin, which doesn't seem to be able to start IPython. I followed this guide and this one. The PySpark interpreter is set up correctly with the right python path, and IPython is enabled by default. However, when I try to run any of the examples from the guides, for instance:
%ipyspark
import pandas as pd
df = pd.DataFrame({'name':['a','b','c'], 'count':[12,24,18]})
z.show(df)
I get the following error messages in the logs:
INFO [2018-11-30 15:17:08,653] ({pool-3-thread-2} IPythonInterpreter.java[setAdditionalPythonPath]:103) - setAdditionalPythonPath: /usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python
INFO [2018-11-30 15:17:08,654] ({pool-3-thread-2} IPythonInterpreter.java[open]:135) - Python Exec: python3
INFO [2018-11-30 15:17:09,189] ({pool-3-thread-2} IPythonInterpreter.java[checkIPythonPrerequisite]:195) - IPython prerequisite is meet
INFO [2018-11-30 15:17:09,191] ({pool-3-thread-2} IPythonInterpreter.java[open]:146) - Launching IPython Kernel at port: 39753
INFO [2018-11-30 15:17:09,191] ({pool-3-thread-2} IPythonInterpreter.java[open]:147) - Launching JVM Gateway at port: 36511
INFO [2018-11-30 15:17:09,402] ({pool-3-thread-2} IPythonInterpreter.java[setupIPythonEnv]:315) - PYTHONPATH:/usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python:/usr/hdp/current/spark2-client//python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client//python/:/usr/hdp/current/spark2-client//python:/usr/hdp/current/spark2-client//python/lib/py4j-0.8.2.1-src.zip
INFO [2018-11-30 15:17:09,743] ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO [2018-11-30 15:17:09,844] ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
WARN [2018-11-30 15:17:09,926] ({Exec Default Executor} IPythonInterpreter.java[onProcessFailed]:394) - Exception happens in Python Process
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
at java.lang.Thread.run(Thread.java:745)
INFO [2018-11-30 15:17:09,944] ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO [2018-11-30 15:17:10,044] ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO [2018-11-30 15:17:39,465] ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
WARN [2018-11-30 15:17:39,466] ({pool-3-thread-2} PySparkInterpreter.java[open]:134) - Fail to open IPySparkInterpreter
java.lang.RuntimeException: Fail to open IPythonInterpreter
at org.apache.zeppelin.python.IPythonInterpreter.open(IPythonInterpreter.java:157)
at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:66)
at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:129)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Fail to launch IPython Kernel in 30 seconds
at org.apache.zeppelin.python.IPythonInterpreter.launchIPythonKernel(IPythonInterpreter.java:297)
at org.apache.zeppelin.python.IPythonInterpreter.open(IPythonInterpreter.java:154)
at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:66)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
INFO [2018-11-30 15:17:39,466] ({pool-3-thread-2} PySparkInterpreter.java[open]:140) - IPython is not available, use the native PySparkInterpreter
INFO [2018-11-30 15:17:39,533] ({pool-3-thread-2} PySparkInterpreter.java[createPythonScript]:118) - File /tmp/zeppelin_pyspark-5362368451576072994.py created
INFO [2018-11-30 15:17:39,534] ({pool-3-thread-2} Py4JUtils.java[createGatewayServer]:44) - Launching GatewayServer at 127.0.0.1:34508
INFO [2018-11-30 15:17:39,565] ({pool-3-thread-2} PySparkInterpreter.java[createGatewayServerAndStartScript]:265) - pythonExec: python3
INFO [2018-11-30 15:17:39,567] ({pool-3-thread-2} PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH: /usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python:/usr/hdp/current/spark2-client//python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client//python/:/usr/hdp/current/spark2-client//python:/usr/hdp/current/spark2-client//python/lib/py4j-0.8.2.1-src.zip
INFO [2018-11-30 15:17:41,953] ({pool-3-thread-2} SchedulerFactory.java[jobFinished]:115) - Job 20181129-172919_2135817500 finished by scheduler interpreter_131607019
I am using HDP 3.0.1, which ships with Zeppelin 0.8.0. All nodes have Python 3.7.1 installed, along with the latest versions of jupyter and grpcio. In a Zeppelin notebook I checked the ipython and python versions:
%pyspark
import sys
import IPython
print(IPython.__version__)
print(sys.version)
7.2.0
3.7.1 (default, Nov 29 2018, 17:37:37)
I can start IPython from any node without any problem, and Zeppelin picks up the IPython version correctly. I also tried to find logs other than the one where Zeppelin reports the error, but couldn't find anything.
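For reference, a quick way to cross-check those dependencies against the exact python3 binary Zeppelin invokes would be a manual import test on one of the nodes (only a rough sanity check, not Zeppelin's own startup code; adjust the interpreter name if yours differs):

# Import the packages Zeppelin's IPython support relies on (jupyter_client,
# ipykernel, grpcio); a failure here would also show up inside Zeppelin.
python3 -c "import jupyter_client, ipykernel, grpc; print('imports ok')"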
Any idea what could be preventing Zeppelin from starting the IPython kernel?
Answer 0 (score: 1)
pip install --upgrade setuptools pip
or potentially
pip install --upgrade ipython
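If several Python installations are present on the nodes, it may be safer to run the upgrades through the exact interpreter Zeppelin launches (python3 according to the logs above), for example:

# Run pip via the interpreter Zeppelin uses so the upgraded packages land in
# the right environment (the interpreter name is assumed from the logs above).
python3 -m pip install --upgrade setuptools pip
python3 -m pip install --upgrade ipython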
There are also some other quick things to try here: github.com/jupyter/notebook/issues/270
Answer 1 (score: 0)
I also ran into this issue while running zeppelin-0.9.0-preview2. In my case, the cause was that zeppelin did not recognize conda-installed packages in the pip freeze output.
For example, I had installed jupyter-client with conda, so pip freeze looked like this:
➜ pip freeze | grep jupyter-client
jupyter-client @ file:///tmp/build/80754af9/jupyter_client_1616770841739/work
Note that jupyter-client does not follow the package==version format. The solution was to remove the conda-installed version of jupyter and install it with pip:
➜ conda uninstall jupyter
➜ pip install jupyter
➜ pip freeze | grep jupyter-client
jupyter-client==6.1.12
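To illustrate why the conda-style entry trips such a check (a rough sketch, not zeppelin's actual detection code), a strict name==version match simply never sees the @ file:///... form:

# Matches "jupyter-client==6.1.12" but not "jupyter-client @ file:///...";
# purely illustrative of the format difference described above.
➜  pip freeze | grep -E '^jupyter-client==' || echo "jupyter-client not detected"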
It seems zeppelin should support both cases soon, if it doesn't already.
Also keep in mind that zeppelin-0.9.0 does not yet support Python 3.8.