Cannot use a new SparkSession after stopping the previous one

Date: 2018-04-03 05:10:01

Tags: apache-spark apache-spark-sql

I created a SparkSession (with enableHiveSupport()) and am running locally. I want to execute a set of SQL statements in one SparkSession, stop it, and start another.

But when I stop the SparkSession and get a new one via SparkSession.builder(), I do get a new SparkSession object, yet the SQL fails with "Another instance of Derby may have already booted the database ...".

Since we can only have one SparkContext per JVM, does this mean I cannot getOrCreate a SparkSession, stop it, and repeat?

Is there a way to execute a set of SQL statements in a fresh session each time? (I know about SparkSession.newSession, but I can't stop that session either, since that would stop the underlying shared SparkContext, right?)
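For concreteness, a minimal sketch of the stop-and-recreate pattern described above (the app names and SHOW TABLES statement are illustrative):

import org.apache.spark.sql.SparkSession

object StopAndRecreate {
  def main(args: Array[String]): Unit = {
    val first = SparkSession.builder()
      .appName("first")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()
    first.sql("SHOW TABLES").show()
    first.stop()

    // getOrCreate() now returns a fresh SparkSession on a fresh SparkContext,
    // but the embedded Derby metastore used by the first session may still
    // hold its lock, so the next sql() call fails with "Another instance of
    // Derby may have already booted the database ...".
    val second = SparkSession.builder()
      .appName("second")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()
    second.sql("SHOW TABLES").show()
  }
}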

1 Answer:

Answer 0 (score: 3)

You can use SparkSession.newSession. As the official documentation puts it:

SparkSession.newSession: Start a new session with isolated SQL configurations. Temporary tables and registered functions are isolated, but the underlying SparkContext and cached data are shared.

Note
    Other than the SparkContext, all shared state is initialized lazily. This method will
    force the initialization of the shared state to ensure that parent and child sessions
    are set up with the same shared state. If the underlying catalog implementation is
    Hive, this will initialize the metastore, which may take some time.

Sample code showing how you can use multiple Spark sessions:

import org.apache.spark.sql.SparkSession

object WorkingWithNewSession {
  def main(args: Array[String]): Unit = {

    // Create the first session (and the underlying SparkContext).
    val spark = SparkSession
                .builder()
                .appName("understanding session")
                .master("local[*]")
                .getOrCreate()

    import spark.implicits._
    val df = Seq("name", "apple").toDF()

    df.createOrReplaceTempView("testTable")       // visible only to this session; skip this if other sessions need the view
    df.createOrReplaceGlobalTempView("testTable") // use a global view when multiple sessions share the data

    spark.sql("SELECT * FROM testTable").show()
    // spark.stop()  // do not call this here: it would stop the shared SparkContext

    val newSpark = spark.newSession()
    newSpark.sql("SELECT * FROM global_temp.testTable").show() // global views live in the global_temp database

    spark.stop() // call this once at the end to shut down all sessions
  }
}
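And if the goal from the question is to run each batch of SQL with a clean slate, one pattern is to keep a single SparkContext alive and hand each batch its own child session. A sketch along these lines (the batch contents are illustrative):

import org.apache.spark.sql.SparkSession

object IsolatedSqlBatches {
  def main(args: Array[String]): Unit = {
    val root = SparkSession.builder()
      .appName("isolated sql batches")
      .master("local[*]")
      .getOrCreate()

    // Illustrative batches: each runs in its own child session, so temp views
    // and SQL configs set by one batch are invisible to the others.
    val batches = Seq(
      Seq("CREATE TEMPORARY VIEW t AS SELECT 1 AS x", "SELECT x FROM t"),
      Seq("CREATE TEMPORARY VIEW t AS SELECT 2 AS x", "SELECT x FROM t")
    )

    batches.foreach { batch =>
      val session = root.newSession() // isolated state, shared SparkContext
      batch.foreach(statement => session.sql(statement).show())
      // No stop() here: stopping a child session would stop the shared SparkContext.
    }

    root.stop() // shut everything down once, at the end
  }
}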