How can I integrate a Jupyter notebook Scala kernel with Apache Spark?

Asked: 2018-03-29 02:58:52

Tags: scala apache-spark jupyter-notebook jupyter-scala

I installed the Scala kernel following this documentation: https://github.com/jupyter-scala/jupyter-scala. The kernel is there:

$ jupyter kernelspec list
Available kernels:
  python3     /usr/local/homebrew/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel/resources
  scala       /Users/bobyfarell/Library/Jupyter/kernels/scala

When I try to use Spark in the notebook, I get this:

val sparkHome = "/opt/spark-2.3.0-bin-hadoop2.7"
val scalaVersion = scala.util.Properties.versionNumberString
import org.apache.spark.ml.Pipeline

Compilation Failed
Main.scala:57: object apache is not a member of package org
 ; import org.apache.spark.ml.Pipeline
              ^

I have tried:

  • setting SPARK_HOME and CLASSPATH to the location of $SPARK_HOME/jars
  • setting the -cp option in kernel.json to point to $SPARK_HOME/jars
  • calling classpath.add before the imports

None of these helped. Note that I do not want to use Toree; I want to use standalone Spark with a Scala kernel in Jupyter. A similar issue is also reported here: https://github.com/jupyter-scala/jupyter-scala/issues/63

1 Answer:

Answer 0 (score: 1)

It looks like you are not following the jupyter-scala directions for using Spark. You have to load Spark into the kernel with its special imports rather than through -cp or CLASSPATH.
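
A minimal sketch of that approach, assuming the Ammonite-based jupyter-scala kernel: the artifact coordinates and the JupyterSparkSession helper below follow the project's README of that era, so check them against the kernel version you installed, and match the Spark version to your installation.

// Pull Spark into the kernel with Ammonite-style "magic" imports
// instead of -cp / CLASSPATH; versions here are illustrative.
import $ivy.`org.apache.spark::spark-sql:2.3.0`
import $ivy.`org.apache.spark::spark-mllib:2.3.0`   // needed for org.apache.spark.ml.Pipeline
// jupyter-scala's Spark helper, which provides a kernel-aware SparkSession builder
import $ivy.`org.jupyter-scala::spark:0.4.2`

import org.apache.spark.sql._
import jupyter.spark.session._

// Build the session through JupyterSparkSession rather than SparkSession.builder(),
// calling .jupyter() right after builder()
val spark = JupyterSparkSession.builder()
  .jupyter()
  .master("local[*]")
  .appName("notebook")
  .getOrCreate()

// Spark classes now resolve in the notebook
import org.apache.spark.ml.Pipeline

Once the $ivy imports resolve, the Spark packages are on the kernel's classpath and the original import compiles.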