pyspark kernel on Jupyter generates "spark not found" error

Date: 2018-11-17 23:31:06

Tags: apache-spark pyspark jupyter-notebook

I have several jupyter notebooks using the pyspark kernel that have worked for months, but they recently stopped working. The pyspark kernel itself is loading: it shows the blue message:

    Kernel Loaded

..and we can see that the kernel is available.


But I noticed this in the jupyter log:

    [IPKernelApp] WARNING | Unknown error in handling PYTHONSTARTUP file /shared/spark/python/pyspark/shell.py:

When trying to do some work in spark, we get:

---> 18     df = spark.read.parquet(path)
     19     if count: p(tname + ": count="+str(df.count()))
     20     df.createOrReplaceTempView(tname)

NameError: name 'spark' is not defined

With no further information.
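
The NameError makes sense given the log warning: the pyspark kernel normally runs pyspark/shell.py as a PYTHONSTARTUP script, and that script is what creates the global spark (a SparkSession) and sc. If the startup script errors out, those names are simply never defined. A minimal plain-Python guard (my own sketch, no pyspark required) turns the late NameError into an early, explicit message:

```python
# Sketch: the pyspark kernel's PYTHONSTARTUP (pyspark/shell.py) normally
# defines a global `spark`. If that script failed, `spark` never exists.
def spark_session_available():
    """Return True if the startup script defined a global `spark`."""
    return "spark" in globals()

if not spark_session_available():
    print("SparkSession 'spark' is missing - check the jupyter log "
          "for PYTHONSTARTUP errors from pyspark/shell.py")
```

Running this as the first cell of a notebook would flag the broken kernel before any spark.read call fails.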

Note: a spark kernel using scala (toree) is able to read the same files via parquet successfully (and in fact with the same code).

So what is going on with the pyspark kernel?

1 Answer:

Answer 0 (score: 0)

Figured it out! I had upgraded spark, and the pyspark kernel knew nothing about it.

First, the installed kernels:

$ jupyter kernelspec list

Available kernels:
  python2        /Users/sboesch/Library/Python/2.7/lib/python/site-packages/ipykernel/resources
  ir             /Users/sboesch/Library/Jupyter/kernels/ir
  julia-1.0      /Users/sboesch/Library/Jupyter/kernels/julia-1.0
  scala          /Users/sboesch/Library/Jupyter/kernels/scala
  scijava        /Users/sboesch/Library/Jupyter/kernels/scijava
  pyspark        /usr/local/share/jupyter/kernels/pyspark
  spark_scala    /usr/local/share/jupyter/kernels/spark_scala
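
Each of those directories holds a kernel.json describing how the kernel is launched. A small stdlib-only sketch (the function name is mine) pulls the env block out of a kernel spec, which is where the PYTHONPATH we care about lives:

```python
import json

def kernel_env(kernel_json_path):
    """Return the `env` dict from a Jupyter kernel.json (empty if none is set)."""
    with open(kernel_json_path) as f:
        spec = json.load(f)
    return spec.get("env", {})

# Example (path from the listing above; assumes it exists on your machine):
# kernel_env("/usr/local/share/jupyter/kernels/pyspark/kernel.json")
```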

Let's take a look at the pyspark kernel:

sudo vim  /usr/local/share/jupyter/kernels/pyspark/kernel.json

Of particular interest are the spark library paths, especially the py4j zip:

PYTHONPATH="/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"

Is that path available?

$ll "/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"
ls: /shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip: No such file or directory

No, it is not - so let's update that path:

 $ll /shared/spark/python/lib/py4j*
-rw-r--r--@ 1 sboesch  wheel  42437 Jun  1 13:49 /shared/spark/python/lib/py4j-0.10.7-src.zip


PYTHONPATH="/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.7-src.zip"
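
Before restarting jupyter, the new value can be sanity-checked entry by entry. Note that quoting the whole colon-joined string for ls (as above) makes the shell treat it as a single filename; a small stdlib-only helper (my own, hypothetical name) splits the PYTHONPATH value instead and reports any entries that do not exist:

```python
import os

def missing_pythonpath_entries(pythonpath):
    """Split a colon-separated PYTHONPATH value and return the entries
    that do not exist on disk (os.path.exists also accepts zip files)."""
    return [p for p in pythonpath.split(os.pathsep) if not os.path.exists(p)]

# e.g. missing_pythonpath_entries(
#     "/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.7-src.zip")
# should return [] once the kernel.json points at the upgraded py4j zip.
```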

After that I restarted jupyter, and the pyspark kernel was working again.