I have several jupyter notebooks using the pyspark kernel that worked for months, but recently stopped working. The pyspark kernel itself is loading: it shows the blue message:
Kernel Loaded
..and we can see that the kernel is available:
But I noticed this in the jupyter logs:
[IPKernelApp] WARNING | Unknown error in handling PYTHONSTARTUP file /shared/spark/python/pyspark/shell.py:
When attempting to do any work in spark we get:
---> 18 df = spark.read.parquet(path)
19 if count: p(tname + ": count="+str(df.count()))
20 df.createOrReplaceTempView(tname)
NameError: name 'spark' is not defined
No further information.
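For context, /shared/spark/python/pyspark/shell.py is the PYTHONSTARTUP file from the warning above, and it is what normally defines spark in the first place. A simplified sketch of its relevant effect (not the actual file): if it errors out during kernel startup, the spark name is never created, which is exactly the NameError we see.

from pyspark.sql import SparkSession

# pyspark/shell.py (roughly) builds the session at kernel startup and
# publishes these names into the interactive namespace; when the
# startup file fails, neither name ever gets defined
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext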
Note: the spark kernel using scala (toree) is able to read the same files via parquet successfully (and in fact using the same code). So what is happening with the toree kernel?
Answer 0: (score: 0)
Figured it out! I had upgraded spark, and the pyspark kernel knew nothing about it.
First: the installed kernels:
$jupyter kernelspec list
Available kernels:
python2 /Users/sboesch/Library/Python/2.7/lib/python/site-packages/ipykernel/resources
ir /Users/sboesch/Library/Jupyter/kernels/ir
julia-1.0 /Users/sboesch/Library/Jupyter/kernels/julia-1.0
scala /Users/sboesch/Library/Jupyter/kernels/scala
scijava /Users/sboesch/Library/Jupyter/kernels/scijava
pyspark /usr/local/share/jupyter/kernels/pyspark
spark_scala /usr/local/share/jupyter/kernels/spark_scala
Let's examine the pyspark kernel:
sudo vim /usr/local/share/jupyter/kernels/pyspark/kernel.json
Of particular interest are the spark python library paths:
PYTHONPATH="/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"
Are those paths available?
$ll "/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"
ls: /shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip: No such file or directory
No, they are not: so let's update that path:
$ll /shared/spark/python/lib/py4j*
-rw-r--r--@ 1 sboesch wheel 42437 Jun 1 13:49 /shared/spark/python/lib/py4j-0.10.7-src.zip
PYTHONPATH="/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.7-src.zip"
After that I restarted jupyter and the pyspark kernel is working.
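And a quick check in a fresh notebook cell - with path being the same parquet location as in the question - confirms the session object is back:

spark.version                    # the spark session exists again
df = spark.read.parquet(path)    # and the original read now succeeds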