Spark session NullPointerException with checkpointing

Asked: 2017-09-13 00:47:58

Tags: scala apache-spark spark-streaming checkpointing

I have enabled checkpointing, saving the checkpoint data to S3. When there are no files in the checkpoint directory, Spark Streaming works fine and I can see log files appearing in the checkpoint directory. Then I kill the Spark Streaming application and restart it. This time I start getting a NullPointerException for the Spark session. In short: with no log files in the checkpoint directory, Spark Streaming runs correctly; but as soon as I restart it with log files already present in the checkpoint directory, I get a NullPointerException on the Spark session. Here is the code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object asf {
  val microBatchInterval = 5
  val sparkSession = SparkSession
    .builder()
    .appName("Streaming")
    .getOrCreate()

  val conf = new SparkConf(true)
  //conf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
  val sparkContext = SparkContext.getOrCreate(conf)

  val checkpointDirectory = "s3a://bucketname/streaming-checkpoint"

  def main(args: Array[String]): Unit = {
    println("Spark session: " + sparkSession)

    // Restore the StreamingContext from the checkpoint if one exists,
    // otherwise create a new one. s3Config.getConfig() supplies the
    // Hadoop configuration and is defined elsewhere.
    val ssc = StreamingContext.getOrCreate(checkpointDirectory,
      () => {
        createStreamingContext(sparkContext, microBatchInterval, checkpointDirectory, sparkSession)
      }, s3Config.getConfig())

    ssc.start()
    ssc.awaitTermination()
  }

  def createStreamingContext(sparkContext: SparkContext, microBatchInterval: Int, checkpointDirectory: String, spark: SparkSession): StreamingContext = {
    println("Spark session inside: " + spark)
    val ssc = new StreamingContext(sparkContext, Seconds(microBatchInterval))
    //TODO: StorageLevel.MEMORY_AND_DISK_SER
    // EventHubClient is a custom receiver defined elsewhere.
    val lines = ssc.receiverStream(new EventHubClient(StorageLevel.MEMORY_AND_DISK_SER))
    lines.foreachRDD { rdd =>
      val df = spark.read.json(rdd)
      df.show()
    }
    ssc.checkpoint(checkpointDirectory)
    ssc
  }
}

Again, the first time I run this code (with no log files in the checkpoint directory), I can see the dataframe being printed. If I run it with log files already in the checkpoint directory, I don't even see

println("Spark session inside: " + spark)

being printed, although it is printed on the first run. The error:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:111)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
    at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:549)
    at org.apache.spark.sql.SparkSession.read(SparkSession.scala:605)

The error occurs at:

val df = spark.read.json(rdd)

Edit: I added this line:

conf.set("spark.streaming.stopGracefullyOnShutdown","true")

It made no difference; I still get the NullPointerException.

2 Answers:

Answer 0 (score: 1)

To answer my own question, this works:


Pass a Spark session built from rdd.sparkContext instead of the one captured from the driver object.
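A minimal sketch of that approach inside foreachRDD (assuming the same lines stream as in the question; the exact original snippet is not reproduced here):

lines.foreachRDD { rdd =>
  // Rebuild (or look up) the SparkSession from the RDD's own SparkContext rather than
  // capturing the session created on the driver object, so it is still valid after the
  // StreamingContext has been restored from the checkpoint.
  val session = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
  val df = session.read.json(rdd)
  df.show()
}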

Answer 1 (score: 0)

Just to make this explicit for the benefit of newcomers: this is an anti-pattern. Creating a Dataset inside a transformation is not allowed!

As Michel mentioned, the executors will not have access to the SparkSession.
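A hypothetical illustration of the distinction (the process helper is made up for this example): the body of foreachRDD runs on the driver, where a SparkSession can be obtained, while functions passed to transformations such as map are serialized and run on the executors, which have no SparkSession:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical helper, only to show where each piece of code executes.
def process(jsonLines: RDD[String]): Unit = {
  // OK: this runs on the driver, so a SparkSession is available here.
  val session = SparkSession.builder.config(jsonLines.sparkContext.getConf).getOrCreate()
  session.read.json(jsonLines).show()

  // Anti-pattern: the function below would be shipped to the executors,
  // where `session` (a driver-side object) does not exist.
  // jsonLines.map(line => session.read.json(...))  // don't do this
}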