Question

What is the difference between the following two?

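import org.apache.spark.sql.SparkSession

object Example1 {
    def main(args: Array[String]): Unit = {
        // Created outside the try block so that it is in scope in finally
        val spark = SparkSession.builder.getOrCreate
        try {
            // spark code here
        } finally {
            spark.close
        }
    }
}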

object Example2 {
    val spark = SparkSession.builder.getOrCreate
    def main(args: Array[String]): Unit = {
        // spark code here
    }
}

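I know that SparkSession implements Closeable, which hints that it needs to be closed. However, I can't think of any problem if the SparkSession is just created as in Example2 and never closed explicitly. Whether the Spark application succeeds or fails (and exits the main method), the JVM will terminate and the SparkSession will be gone with it. Is this correct? IMO: the fact that SparkSession is a singleton should not make a big difference either.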

Answer 1

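You should always close your SparkSession when you are done with it (even if the only outcome is that you follow the good practice of giving back what you have been given).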

Closing a SparkSession may trigger the release of cluster resources that could then be given to some other application.

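SparkSession is a session and as such maintains some resources that consume JVM memory. You can have as many SparkSessions as you want (see SparkSession.newSession to create a fresh session), but you don't want them to hold memory they should not be holding when you are not using them, so close the ones you no longer need.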

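SparkSession is Spark SQL's wrapper around Spark Core's SparkContext, so under the covers (as in any Spark application) you have cluster resources, i.e. vcores and memory, assigned to your SparkSession (through SparkContext). That means that as long as your SparkContext is in use (through SparkSession), those cluster resources won't be assigned to other tasks (not necessarily Spark's, but also other non-Spark applications submitted to the cluster). These cluster resources are yours until you say "I'm done", which translates to... close.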

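If, however, you simply exit the Spark application right after the point where you would call close, you don't have to think about calling it, since the resources will be released automatically anyway. The JVMs of the driver and the executors terminate, and so does the (heartbeat) connection to the cluster, so the resources are eventually given back to the cluster manager, which can then offer them to some other application.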
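The "close it when you are done" advice above can be sketched with scala.util.Using, assuming Scala 2.13 or later. DemoSession is a hypothetical stand-in and not a Spark class; it is only there so the sketch runs without Spark on the classpath. Any java.io.Closeable is handled the same way.

```scala
import java.io.Closeable
import scala.util.Using

// Hypothetical stand-in for a session object; NOT part of Spark.
final class DemoSession extends Closeable {
  @volatile var closed = false
  def run(): Int = 42
  override def close(): Unit = { closed = true }
}

object UsingSketch {
  def main(args: Array[String]): Unit = {
    val session = new DemoSession
    // Using.resource calls close() once the body finishes,
    // whether it returns normally or throws.
    val result = Using.resource(session) { s => s.run() }
    println(s"result=$result closed=${session.closed}")
  }
}
```

Since SparkSession implements Closeable, the same shape should apply to it, for example `Using.resource(SparkSession.builder.getOrCreate) { spark => ... }`. This matters mostly when the JVM keeps running after the Spark work is finished.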

Answer 2

Both are the same!

SparkSession's stop / close ultimately calls the SparkContext's stop:

def stop(): Unit = {
  sparkContext.stop()
}

override def close(): Unit = stop()

SparkContext registers a runtime shutdown hook that stops the context before the JVM exits. Below is the Spark code that adds the shutdown hook when the context is created:

_shutdownHookRef = ShutdownHookManager.addShutdownHook(
  ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
  logInfo("Invoking stop() from shutdown hook")
  stop()
}

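So it is called whenever the JVM shuts down in an orderly way, whether main returns normally or fails with an exception (shutdown hooks do not run if the process is killed forcibly, for example with SIGKILL). If you call stop() manually, this shutdown hook is removed to avoid stopping twice: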

def stop(): Unit = {
  if (LiveListenerBus.withinListenerThread.value) {
    throw new SparkException(
      s"Cannot stop SparkContext within listener thread of ${LiveListenerBus.name}")
  }
  // Use the stopping variable to ensure no contention for the stop scenario.
  // Still track the stopped variable for use elsewhere in the code.
  if (!stopped.compareAndSet(false, true)) {
    logInfo("SparkContext already stopped.")
    return
  }
  if (_shutdownHookRef != null) {
    ShutdownHookManager.removeShutdownHook(_shutdownHookRef)
  }
  // ... (the rest of stop() is not shown in this excerpt)
}
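The interplay described in this answer (register a hook at creation, make stop() idempotent, unregister the hook on a manual stop) can be reproduced with the plain JVM API. This is a rough sketch, not Spark's implementation: DemoContext is a hypothetical stand-in for SparkContext, and it uses Runtime.addShutdownHook directly instead of Spark's ShutdownHookManager.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Hypothetical stand-in for SparkContext; NOT part of Spark.
final class DemoContext {
  private val stopped = new AtomicBoolean(false)
  val cleanups = new AtomicInteger(0) // counts how often cleanup really ran

  // Register a JVM shutdown hook at construction time.
  private val hook = new Thread(() => stop())
  Runtime.getRuntime.addShutdownHook(hook)

  def stop(): Unit = {
    // Only the first caller gets past this point.
    if (!stopped.compareAndSet(false, true)) return
    // A manual stop makes the hook redundant, so unregister it.
    // removeShutdownHook throws IllegalStateException once shutdown
    // has begun, which is the case when stop() runs from the hook.
    try Runtime.getRuntime.removeShutdownHook(hook)
    catch { case _: IllegalStateException => () }
    cleanups.incrementAndGet()
  }
}

object ShutdownHookSketch {
  def main(args: Array[String]): Unit = {
    val ctx = new DemoContext
    ctx.stop()
    ctx.stop() // no-op: already stopped
    println(s"cleanup ran ${ctx.cleanups.get} time(s)")
  }
}
```

If the two explicit stop() calls are dropped, the hook performs the cleanup when main returns, which corresponds to the Example2 situation from the question.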

What happens if SparkSession is not closed?
