mapWithState with initialState reports an error when recovering from a checkpoint

Date: 2019-06-11 11:10:44

Tags: scala spark-streaming

I wrote a Spark Streaming job, "RecoverableStatefulNetworkWordCount", based on StatefulNetworkWordCount.scala and RecoverableNetworkWordCount.scala.

When the driver restarts, it reports this exception: "(2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, see SPARK-13758."

If I remove .initialState(initialRDD), it runs and restarts normally.

import org.apache.spark._
import org.apache.spark.streaming._

object SimpleApp {
    def main(args: Array[String]) {
        val checkpointDir = if(args.length >= 1) "hdfs://10.2.35.117:9000/spark-cp" else "./spark-cp"

        def functionToCreateContext(): StreamingContext = {
            val conf = new SparkConf().setMaster(if(args.length >= 1) args(0) else "local[2]").setAppName("NetworkWordCount")

            val ssc = new StreamingContext(conf, Seconds(2))   // new context
            ssc.checkpoint(checkpointDir)

            val lines = ssc.socketTextStream("10.2.35.117", 9999) // create DStreams     
            val words = lines.flatMap(_.split(" "))
            val pairs = words.map((_, 1))

            val initialRDD = ssc.sparkContext.parallelize(List(("hello", 1), ("world", 1)))

            val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
                val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
                val output = (word, sum)
                state.update(sum)
                output
            }
            // Seeding the state with initialRDD is what triggers the exception on recovery
            val stateCounter = pairs.mapWithState(StateSpec.function(mappingFunc).initialState(initialRDD))

            stateCounter.print()

            ssc
        }
        val ssc = StreamingContext.getOrCreate(checkpointDir, functionToCreateContext _)
        val sc = ssc.sparkContext
        sc.setLogLevel("WARN")

        ssc.start()             
        ssc.awaitTermination()   
    }
}
./spark-submit --class SimpleApp /data/scala-test/hw/target/scala-2.11/hello_2.11-0.0.1-SNAPSHOT.jar

My question is: what is the correct way to initialize the state?
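One possible workaround (a sketch, not a verified fix for SPARK-13758): since the exception is about a reference to an RDD that is not re-created on recovery, the seed counts could be folded into the mapping function itself as a plain driver-side Map, avoiding `.initialState(initialRDD)` altogether. The `SeededState` object and its `seed` map below are hypothetical stand-ins for `initialRDD`'s contents, shown here outside of Spark to illustrate only the seeding logic:

```scala
// Sketch: fold the seed values into the update logic instead of using
// .initialState(initialRDD), so no RDD reference ends up in the checkpoint.
object SeededState {
  // Hypothetical stand-in for initialRDD's contents
  val seed: Map[String, Int] = Map("hello" -> 1, "world" -> 1)

  // Mirrors mappingFunc's shape: (word, new count, previous state) -> output.
  // On a word's first appearance (prev is None), fall back to the seed value.
  def update(word: String, one: Option[Int], prev: Option[Int]): (String, Int) = {
    val base = prev.orElse(seed.get(word)).getOrElse(0)
    val sum  = one.getOrElse(0) + base
    (word, sum)
  }
}
```

Inside the real job, the same fallback would go in `mappingFunc` before calling `state.update(sum)`; whether this is acceptable depends on whether the seed data is small enough to live in the driver as a plain Map.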

0 Answers:

No answers yet