Question

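There has been a lot of discussion about Spark 2.0 supporting multiple SparkContexts. The configuration variable intended to support this has existed for much longer, but it has never actually had any effect.

In $SPARK_HOME/conf/spark-defaults.conf: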

spark.driver.allowMultipleContexts true

Let's verify that the property is recognized:

scala> println(s"allowMultiCtx = ${sc.getConf.get("spark.driver.allowMultipleContexts")}")
allowMultiCtx = true
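As a side note on the check above, `SparkConf.get(key)` throws a `NoSuchElementException` when the key is not set. A minimal sketch of a more defensive check follows; the helper name `allowsMultipleContexts` and the `demoConf` value are hypothetical and only for illustration.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper: reads the flag with a default value instead of
// throwing NoSuchElementException when the key is absent.
def allowsMultipleContexts(conf: SparkConf): Boolean =
  conf.getBoolean("spark.driver.allowMultipleContexts", false)

// Stand-alone demonstration with a conf that does not load system defaults.
// In spark-shell one would call allowsMultipleContexts(sc.getConf) instead.
val demoConf = new SparkConf(false).set("spark.driver.allowMultipleContexts", "true")
println(s"allowMultiCtx = ${allowsMultipleContexts(demoConf)}")
```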

Here is a small PoC program:

import org.apache.spark._
import org.apache.spark.streaming._
println(s"allowMultiCtx = ${sc.getConf.get("spark.driver.allowMultipleContexts")}")
// Creates a new SparkContext and StreamingContext for the given directory,
// in addition to the `sc` that spark-shell has already created.
def createAndStartFileStream(dir: String) = {
  val sc = new SparkContext("local[1]",s"Spark-$dir" /*,conf*/)
  val ssc = new StreamingContext(sc, Seconds(4))
  val dstream = ssc.textFileStream(dir)
  val valuesCounts = dstream.countByValue()
  ssc.start
  // Blocks the calling thread until the streaming context is stopped.
  ssc.awaitTermination
}
val dirs = Seq("data10m", "data50m", "dataSmall").map { d =>
  s"/shared/demo/data/$d"
}
// Attempts to start one context per directory.
dirs.foreach{ d =>
  createAndStartFileStream(d)
}

But attempts to use that capability do not succeed:

16/08/14 11:38:55 WARN SparkContext: Multiple running SparkContexts detected 
in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in
this JVM (see SPARK-2243). To ignore this error, 
set spark.driver.allowMultipleContexts = true. 
The currently running SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:814)
org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)

Does anyone have any insight into how to use multiple contexts?

Answer 1

Per @LostInOverflow, this feature will not be fixed. Here is the information from the JIRA:

SPARK-2243 Support multiple SparkContexts in the same JVM

https://issues.apache.org/jira/browse/SPARK-2243

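Sean Owen added a comment - 16/Jan/16 17:35

You say you are concerned about over-utilizing a cluster for steps that do not need many resources. That is what dynamic allocation is for: the number of executors increases and decreases with load. If one context is already using all of the cluster's resources, then yes, that does not help. But then a second context would not help either; the cluster is already fully used. I do not know what overhead you are referring to, but certainly one context running N jobs is busier than N contexts running N jobs. Its overhead is higher, but the total overhead is lower. This is more an effect than a reason to choose one architecture over another. In general, Spark has always assumed one context per JVM and I do not see that changing, which is why I finally closed this. I do not see any support for making this happen.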
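
The "one context running N jobs" approach from the comment above could look roughly like the following for the three directories in the question. This is a minimal sketch, not code from the original thread: the names `countValuesPerDir` and `runSharedContext`, the app name, and the `local[2]` fallback master are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: registers one file stream per directory on a single
// shared StreamingContext and counts the distinct values in each batch.
def countValuesPerDir(ssc: StreamingContext, dirs: Seq[String]): Seq[DStream[(String, Long)]] =
  dirs.map { dir =>
    val counts = ssc.textFileStream(dir).countByValue()
    counts.print() // at least one output operation must be registered before ssc.start()
    counts
  }

// Hypothetical entry point: one context serves all directories.
def runSharedContext(dirs: Seq[String]): Unit = {
  // Assumption: fall back to local[2] only when no master is configured.
  val conf = new SparkConf()
    .setAppName("multi-dir-stream")
    .setIfMissing("spark.master", "local[2]")
  // Returns the already running context (for example the one created by
  // spark-shell) if there is one, instead of constructing a second context.
  val sc = SparkContext.getOrCreate(conf)
  val ssc = new StreamingContext(sc, Seconds(4))
  countValuesPerDir(ssc, dirs)
  ssc.start()
  ssc.awaitTermination()
}
```

Calling `runSharedContext(Seq("data10m", "data50m", "dataSmall").map(d => s"/shared/demo/data/$d"))` from spark-shell would then watch all three directories through the existing context, which avoids the "Only one SparkContext may be running in this JVM" error shown in the question.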

Is the single SparkContext restriction actually lifted in Spark 2.0?
